SQuAD: The Stanford Question Answering Dataset

Contributor: 卢梦依
Download: https://rajpurkar.github.io/SQuAD-explorer/

Introduction

Dataset overview

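The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset made up of questions posed by crowdworkers on a set of Wikipedia articles.
The answer to every question is a segment of text (a span) taken from the corresponding reading passage.
With 100,000+ question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.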
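
The span property described above can be checked with a short script. The following is a minimal sketch, assuming the JSON layout used by the SQuAD v1.1 release (data -> paragraphs -> qas -> answers, each answer carrying "text" and "answer_start"); this layout is not described in this document, so confirm it against the downloaded files. The names SAMPLE, iter_qa_pairs, is_span and load_squad are hypothetical helpers introduced only for illustration.

```python
import json

# Tiny in-memory sample mimicking the ASSUMED SQuAD v1.1 JSON layout.
# It is illustrative data, not an excerpt from the real dataset.
SAMPLE = {
    "version": "1.1",
    "data": [
        {
            "title": "Example_Article",
            "paragraphs": [
                {
                    "context": "SQuAD was released by Stanford in 2016.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "Who released SQuAD?",
                            "answers": [
                                {"text": "Stanford", "answer_start": 22}
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}


def load_squad(path):
    """Load a SQuAD-style JSON file from disk (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_qa_pairs(squad):
    """Yield (context, question, answer_text, answer_start) for every answer."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield (context, qa["question"],
                           answer["text"], answer["answer_start"])


def is_span(context, answer_text, answer_start):
    """Return True if the answer equals the context slice starting at answer_start."""
    end = answer_start + len(answer_text)
    return context[answer_start:end] == answer_text


if __name__ == "__main__":
    pairs = list(iter_qa_pairs(SAMPLE))
    print("question-answer pairs:", len(pairs))
    print("all answers are spans:", all(is_span(c, a, s) for c, _, a, s in pairs))
```

To run the same check on the real data, pass the result of load_squad on the downloaded training or validation file to iter_qa_pairs in place of SAMPLE.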

File size

Training set: 30 MB
Validation set: 5 MB

Size

100,000+ question-answer pairs on 500+ Wikipedia articles

Related papers

1. Rajpurkar P, Zhang J, Lopyrev K, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text[J]. 2016:2383-2392.
2. Wang Z, Mi H, Hamza W, et al. Multi-Perspective Context Matching for Machine Comprehension[J]. 2016.
3. Kim S, Park D, Choi Y, et al. A Pilot Study of Biomedical Text Comprehension using an Attention-Based Deep Neural Reader: Design and Experimental Analysis[J]. JMIR Medical Informatics, 2018, 6(1):e2.
4. Reutebuch C K, Zein F E, Min K K, et al. Investigating a reading comprehension intervention for high school students with autism spectrum disorder: A pilot study[J]. Research in Autism Spectrum Disorders, 2015, 9:96-111.
5. Yin W, Ebert S, Schütze H. Attention-Based Convolutional Neural Network for Machine Comprehension[J]. 2016.
6. Cui Y, Chen Z, Wei S, et al. Attention-over-Attention Neural Networks for Reading Comprehension[C]// Meeting of the Association for Computational Linguistics. 2017:593-602.