Provider: 杜成玉
Download: http://www.openslr.org/12/
Overview
Source: https://www.zhihu.com/question/63383992/answer/222718972
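This is an audiobook dataset containing both text and speech: a corpus of roughly 1,000 hours of read English speech sampled at 16 kHz, prepared by Vassil Panayotov. The data is derived from audiobooks read for the LibriVox project and has been carefully segmented and aligned. Recommended applications: natural speech understanding, analysis, and mining.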
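Because the corpus pairs audio with text, a common first step is to iterate over utterances together with their transcripts. The following is a minimal sketch, not an official loader. It assumes the directory layout of the extracted OpenSLR archives as commonly distributed (not described in this entry): each chapter directory holds a `<speaker>-<chapter>.trans.txt` file with one `<utterance_id> <TRANSCRIPT>` entry per line, next to a matching `<utterance_id>.flac` file. The function name `iter_utterances` is hypothetical.

```python
from pathlib import Path


def iter_utterances(root):
    """Yield (utterance_id, audio_path, transcript) for each transcript line.

    Assumed layout (an assumption, not taken from this entry):
      <root>/<speaker>/<chapter>/<speaker>-<chapter>.trans.txt
    where every line reads "<utterance_id> <TRANSCRIPT>" and the audio for
    that utterance is <utterance_id>.flac in the same directory.
    """
    for trans_file in sorted(Path(root).rglob("*.trans.txt")):
        with trans_file.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                # Split on the first space: the id comes first, the rest is text.
                utt_id, _, text = line.partition(" ")
                # The audio path is derived from the id; its existence is not checked.
                yield utt_id, trans_file.parent / f"{utt_id}.flac", text
```

Calling `iter_utterances` on the directory of one extracted subset yields one tuple per transcribed utterance. Decoding the FLAC audio itself is left to an audio library of your choice.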
Related papers
[1] Panayotov V, Chen G, Povey D, et al. Librispeech: an ASR corpus based on public domain audio books[C]//Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015: 5206-5210.
[2] Amodei D, Ananthanarayanan S, Anubhai R, et al. Deep speech 2: End-to-end speech recognition in English and Mandarin[C]//International Conference on Machine Learning. 2016: 173-182.
[3] Ko T, Peddinti V, Povey D, et al. Audio augmentation for speech recognition[C]//Sixteenth Annual Conference of the International Speech Communication Association. 2015.
[4] Soltau H, Liao H, Sak H. Neural speech recognizer: Acoustic-to-word LSTM model for large vocabulary speech recognition[J]. arXiv preprint arXiv:1610.09975, 2016.
[5] Chung Y A, Wu C C, Shen C H, et al. Audio word2vec: Unsupervised learning of audio segment representations using sequence-to-sequence autoencoder[J]. arXiv preprint arXiv:1603.00982, 2016.