Assembling Corpora

Tacotron is data-driven: it learns speech synthesis from examples, so we first had to assemble a corpus on which to train a model.

Technical considerations:

Audio aligned with its text transcription;
No noise;
No background music;
Transcription is time-consuming;
Recordings from one speaker;
Mono, 16-bit, 48 kHz files.
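
The file-format requirement in the last item can be checked automatically before training. The following is a minimal sketch, assuming the recordings are uncompressed WAV files and that the target sample rate is 48 kHz; the function name check_wav and the constants are illustrative, not part of any Tacotron tooling. It uses only Python's standard wave module.

```python
import wave

# Target format from the list above: mono, 16-bit, 48 kHz (assumed values).
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2   # bytes per sample, i.e. 16 bits
EXPECTED_RATE = 48000       # Hz


def check_wav(path):
    """Return a list of format problems; the list is empty if the file conforms."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(f"{wav.getframerate()} Hz, expected {EXPECTED_RATE} Hz")
    return problems
```

Running check_wav over every file in the corpus directory and printing the non-empty results gives a quick list of recordings that need to be converted or re-recorded. It does not detect noise or background music, which still have to be checked by ear.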

Linguistic considerations:

Choose a speech variety;
Compile a textual corpus with homogeneous orthography;
Get in contact with potential ‘speech donors’;
Carry out short interviews to ensure that they fit the desired profile.
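
One way to inspect how homogeneous the orthography of a textual corpus is: count every character it contains and look at the rare ones, which often reveal stray spelling conventions or encoding variants. This is only a sketch under assumptions: the corpus is taken to be an iterable of text lines, the function name character_inventory is hypothetical, and Unicode NFC normalisation is our choice for merging equivalent encodings of the same accented letter.

```python
import unicodedata
from collections import Counter


def character_inventory(lines):
    """Count every character in the corpus after NFC normalisation."""
    counts = Counter()
    for line in lines:
        # NFC merges, e.g., 'e' + combining acute accent into a single 'é'.
        counts.update(unicodedata.normalize("NFC", line))
    return counts
```

Sorting the result by frequency, for example with character_inventory(lines).most_common(), puts the unusual characters at the end of the list, where they can be reviewed by hand against the chosen spelling convention.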

Our own experience