{"id":67,"date":"2020-08-04T08:40:21","date_gmt":"2020-08-04T08:40:21","guid":{"rendered":"http:\/\/groningserobot.webhosting.rug.nl\/?page_id=67"},"modified":"2020-08-31T15:46:21","modified_gmt":"2020-08-31T13:46:21","slug":"corpus-planning","status":"publish","type":"page","link":"https:\/\/groningserobot.webhosting.rug.nl\/?page_id=67","title":{"rendered":"Assembling Corpora"},"content":{"rendered":"\n<p>The Tacotron algorithm is data-driven. We needed to assemble a corpus to train the algorithm to produce a model. <\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container\">\n<p><strong>Technical considerations:<\/strong><\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container\">\n<ul><li>Audio\/Text aligned data;<\/li><li>No noise;<\/li><li>No background music; <\/li><li>Transcription is time-consuming; <\/li><li>Recordings from one speaker; <\/li><li>Mono 16-bit 48 kHz files. <\/li><\/ul>\n\n\n\n<p><strong>Linguistic considerations:<\/strong><\/p>\n\n\n\n<ul><li>Choose a speech variety; <\/li><li>Compile a textual corpus with homogeneous orthography; <\/li><li>Get in contact with potential &#8216;speech donors&#8217;; <\/li><li>Carry out short interviews to ensure that they fit the desired profile. 
<\/li><\/ul>\n<\/div><\/div>\n<\/div><\/div>\n\n\n\n<figure class=\"wp-block-embed-wordpress wp-block-embed is-type-wp-embed is-provider-a-grunnegs-speaking-robot\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"VTcKuz3Vch\"><a href=\"https:\/\/groningserobot.webhosting.rug.nl\/?page_id=203\">Our own experience<\/a><\/blockquote><iframe class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; clip: rect(1px, 1px, 1px, 1px);\" title=\"&#8220;Our own experience&#8221; &#8212; A Grunnegs-speaking Robot\" src=\"https:\/\/groningserobot.webhosting.rug.nl\/?page_id=203&#038;embed=true#?secret=VTcKuz3Vch\" data-secret=\"VTcKuz3Vch\" width=\"525\" height=\"296\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>The Tacotron algorithm is data-driven. We needed to assemble a corpus to train the algorithm to produce a 
model.<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/groningserobot.webhosting.rug.nl\/index.php?rest_route=\/wp\/v2\/pages\/67"}],"collection":[{"href":"https:\/\/groningserobot.webhosting.rug.nl\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/groningserobot.webhosting.rug.nl\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/groningserobot.webhosting.rug.nl\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/groningserobot.webhosting.rug.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=67"}],"version-history":[{"count":9,"href":"https:\/\/groningserobot.webhosting.rug.nl\/index.php?rest_route=\/wp\/v2\/pages\/67\/revisions"}],"predecessor-version":[{"id":210,"href":"https:\/\/groningserobot.webhosting.rug.nl\/index.php?rest_route=\/wp\/v2\/pages\/67\/revisions\/210"}],"wp:attachment":[{"href":"https:\/\/groningserobot.webhosting.rug.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=67"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}