Assembling Corpora

Tacotron is data-driven, so we first had to assemble a corpus on which to train a model.

Technical considerations:

  • Audio/Text aligned data;
  • No noise;
  • No background music;
  • Transcription is time-consuming;
  • Recordings from one speaker;
  • Mono 16-bit 48 kHz files.
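
The file-format requirement in the list above (mono, 16-bit, 48 kHz) can be checked automatically before training. The following is only a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` and its default values are illustrative assumptions, not part of the original workflow, and it only handles uncompressed PCM WAV files.

```python
import wave


def check_wav_format(path, channels=1, sample_width_bytes=2, sample_rate=48000):
    """Return a list of format problems for a PCM WAV file (empty if it conforms).

    Hypothetical helper: the defaults encode mono, 16-bit (2 bytes), 48 kHz.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != channels:
            problems.append(f"channels: {wav.getnchannels()} (expected {channels})")
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"sample width: {wav.getsampwidth()} bytes (expected {sample_width_bytes})"
            )
        if wav.getframerate() != sample_rate:
            problems.append(f"sample rate: {wav.getframerate()} Hz (expected {sample_rate})")
    return problems
```

Running this over every recording and listing the files that return a non-empty result gives a quick inventory of what needs converting. It does not detect noise or background music, which still have to be checked by ear or with separate tools.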

Linguistic considerations:

  • Choose a speech variety;
  • Compile a textual corpus with homogeneous orthography;
  • Get in contact with potential ‘speech donors’;
  • Carry out short interviews to ensure that they fit the desired profile.
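
Part of the work of keeping the orthography of the textual corpus homogeneous is mechanical and can be sketched in code. The example below is an illustration under assumptions, not the procedure actually used: it applies Unicode NFC normalization so that visually identical letters share one encoding, collapses whitespace, and reports characters outside an expected alphabet. The names `normalize_line` and `unexpected_characters`, and the idea of a fixed allowed alphabet, are hypothetical.

```python
import unicodedata


def normalize_line(text):
    """Normalize one line of corpus text (hypothetical helper)."""
    # NFC composes a base letter plus combining mark into a single code point
    # where one exists, so that the same letter is always encoded the same way.
    text = unicodedata.normalize("NFC", text)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


def unexpected_characters(text, allowed):
    """Return the set of non-space characters in text that are not in allowed."""
    return {ch for ch in text if not ch.isspace() and ch not in allowed}
```

Characters flagged by `unexpected_characters` would then be reviewed by hand, since deciding between competing spellings is a linguistic choice that depends on the chosen speech variety and cannot be automated this way.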