  • Introduction

    Thank you for choosing VocaliD to meet your custom voice needs. To ensure that we can provide you with the highest quality voice, we request that the data you provide to us meet or exceed the following specifications:


    Ideally, we would like 2000+ recorded sentences from the target speaker, totaling at least 2 hours of audio. The audio should contain only the target speaker, with other speakers and sounds removed.

    Each recording should be between 1 and 15 seconds long, though it is best if most recordings are between 3 and 10 seconds. If a sentence is too long, it can be divided into multiple recordings at a logical break, such as a pause between phrases.
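
The length guidelines above can be checked before upload. The following is a minimal sketch, assuming the recordings are uncompressed PCM WAV files (the only kind Python's standard `wave` module reads). The function names and the report keys are hypothetical and not part of any VocaliD tooling.

```python
import wave

MIN_SECONDS = 1.0                    # shortest acceptable recording
MAX_SECONDS = 15.0                   # longest acceptable recording
TARGET_COUNT = 2000                  # "2000+ recorded sentences"
TARGET_TOTAL_SECONDS = 2 * 60 * 60   # "at least 2 hours of audio"


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def summarize_recordings(paths):
    """Summarize a collection of WAV files against the guidelines."""
    durations = {str(p): wav_duration_seconds(p) for p in paths}
    total = sum(durations.values())
    # Files whose length falls outside the 1 to 15 second window.
    out_of_range = sorted(
        name for name, seconds in durations.items()
        if not MIN_SECONDS <= seconds <= MAX_SECONDS
    )
    return {
        "count": len(durations),
        "total_seconds": total,
        "out_of_range": out_of_range,
        "meets_count": len(durations) >= TARGET_COUNT,
        "meets_total": total >= TARGET_TOTAL_SECONDS,
    }
```

Calling `summarize_recordings` on every `.wav` path in a dataset gives a quick view of which files need to be split or dropped and whether the overall totals are close to the ideal.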

    Audio Quality

    The audio should be clean and be free of:


    An accurate transcript should be provided for each audio file in a comma (“,”) separated spreadsheet (i.e. a .csv file) named data_mapping.csv, with the file names in the first column, the associated transcript in the second column, and the source name or link in the third column. The header strings must match those described below.

    (See below for an example of the csv file format and content.)

    Header names and column values

    Column name | Description
    filepath | The path of the audio file, relative to the csv file.
    transcript | The text transcript for the audio file.
    source_identifier | A string identifier that groups audio files from the same source, recorded under the same conditions. For example, a source name, link, or arbitrary string. If all files are from the same source, this value can be left blank in every row. In any case, the header is still required.
    filepath,transcript,source_identifier
    recordings/source_a/audio_sample_001.wav,This is a sample one of a transcript.,A
    recordings/source_a/audio_sample_002.wav,This this is the second example of a transcript and includes a repeated word.,A
    recordings/source_b/audio_sample_003.wav,"Wow, um, so many transcripts.",B
    recordings/source_c/audio_sample_004.wav,"""Thank you"", said Doctor Doolittle.",C
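
The quoting in the raw csv rows above (a transcript containing commas is wrapped in double quotes, and a literal double quote is doubled) is standard CSV escaping. Writing the file with a CSV library rather than by hand takes care of it. Here is a small sketch using Python's standard `csv` module; `build_data_mapping` is a hypothetical helper name.

```python
import csv
import io

# Header strings as listed in the column table.
HEADER = ["filepath", "transcript", "source_identifier"]


def build_data_mapping(rows):
    """Return CSV text for (filepath, transcript, source_identifier) rows."""
    buffer = io.StringIO()
    # The default dialect quotes only the fields that need it.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_data_mapping([
        ("recordings/source_b/audio_sample_003.wav",
         "Wow, um, so many transcripts.", "B"),
        ("recordings/source_c/audio_sample_004.wav",
         '"Thank you", said Doctor Doolittle.', "C"),
    ]))
```

The returned text can be written to data_mapping.csv as is; the transcripts are passed in unescaped and the writer adds the quotes.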

    Example data

    filepath | transcript | source_identifier
    recordings/source_a/audio_sample_001.wav | This is a sample one of a transcript. | A
    recordings/source_a/audio_sample_002.wav | This this is the second example of a transcript and includes a repeated word. | A
    recordings/source_b/audio_sample_003.wav | Wow, um, so many transcripts. | B
    recordings/source_c/audio_sample_004.wav | "Thank you", said Doctor Doolittle. | C
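
A mapping file can be sanity-checked locally against the requirements described here: exact header strings, three columns per row, a transcript for every file, and file paths that resolve relative to the csv file. The sketch below is one possible check, not an official validator. It assumes the csv is UTF-8 encoded, and `validate_data_mapping` is a hypothetical name.

```python
import csv
from pathlib import Path

EXPECTED_HEADER = ["filepath", "transcript", "source_identifier"]


def validate_data_mapping(csv_path):
    """Return a list of problems found in a data_mapping.csv file."""
    csv_path = Path(csv_path)
    problems = []
    # Assumption: the file is UTF-8 encoded.
    with csv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header != EXPECTED_HEADER:
            problems.append(f"header is {header!r}, expected {EXPECTED_HEADER!r}")
            return problems
        # Row 1 is the header, so data rows start at 2.
        for row_number, row in enumerate(reader, start=2):
            if len(row) != 3:
                problems.append(f"row {row_number}: expected 3 columns, got {len(row)}")
                continue
            filepath, transcript, _source_identifier = row
            if not transcript.strip():
                problems.append(f"row {row_number}: empty transcript")
            # Paths are resolved relative to the csv file's own folder.
            if not (csv_path.parent / filepath).is_file():
                problems.append(f"row {row_number}: missing audio file {filepath}")
    return problems
```

An empty list means none of these checks failed. The source_identifier value is deliberately not checked, since it may be left blank.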


    The csv file and the audio files should be uploaded to VocaliD together in a single zip file. The file paths in the csv should be relative to the spreadsheet’s location. The size limit for uploaded zip files is 1 GB.

    ├── data_mapping.csv
    └── recordings
        ├── source_a
        │   ├── audio_sample_001.wav
        │   └── audio_sample_002.wav
        ├── source_b
        │   └── audio_sample_003.wav
        └── source_c
            └── audio_sample_004.wav
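
A folder laid out like the tree above can be packaged with Python's standard `zipfile` module. This is a minimal sketch: `build_upload_zip` is a hypothetical helper, it assumes the audio files end in `.wav` and sit under a `recordings` folder next to data_mapping.csv, and it reads the stated size limit as 10^9 bytes, which is an assumption.

```python
import zipfile
from pathlib import Path

# Assumption: the stated upload limit is read as 10**9 bytes.
MAX_ZIP_BYTES = 10**9


def build_upload_zip(dataset_dir, zip_path):
    """Zip data_mapping.csv and the recordings folder, keeping relative paths.

    Returns (size_in_bytes, within_limit).
    """
    dataset_dir = Path(dataset_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(dataset_dir / "data_mapping.csv", arcname="data_mapping.csv")
        for wav in sorted((dataset_dir / "recordings").rglob("*.wav")):
            # Store each file under its path relative to the csv file.
            archive.write(wav, arcname=wav.relative_to(dataset_dir).as_posix())
    size = zip_path.stat().st_size
    return size, size <= MAX_ZIP_BYTES
```

Storing each entry under its path relative to the dataset folder keeps the archive layout in step with the `filepath` values in the csv.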