Tacotron is data-driven, so we first needed to assemble a corpus on which to train a model.
Technical considerations:
- Aligned audio and text data;
- No noise;
- No background music;
- Transcription is time-consuming;
- Recordings from one speaker;
- Mono, 16-bit, 48 kHz files.
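
The file-format requirement can be checked automatically before training. The following is a minimal sketch, assuming the recordings are stored as uncompressed WAV files and taking the target format as mono, 16-bit, 48 kHz. The function name `check_wav_format` and the constants are hypothetical and not part of any Tacotron implementation; only Python's standard `wave` module is used.

```python
import wave

# Target format assumed from the requirements list: mono, 16-bit, 48 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_SAMPLE_RATE_HZ = 48_000


def check_wav_format(path):
    """Return a list of format problems for one WAV file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        sample_rate = wav.getframerate()

    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected {EXPECTED_CHANNELS} channel, found {channels}")
    if sample_width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if sample_rate != EXPECTED_SAMPLE_RATE_HZ:
        problems.append(f"expected {EXPECTED_SAMPLE_RATE_HZ} Hz, found {sample_rate} Hz")
    return problems
```

Running this over every file in the corpus and listing the files with a non-empty result would flag recordings that need resampling or downmixing. It does not detect noise or background music, which still have to be checked by listening.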
Linguistic considerations:
- Choose a speech variety;
- Compile a textual corpus with homogeneous orthography;
- Get in contact with potential ‘speech donors’;
- Carry out short interviews to ensure that they fit the desired profile.