文本分类语料库(复旦)测试数据集

提供者:卢梦依
下载地址:http://www.nlpir.org/download/tc-corpus-answer.rar

简介

数据集概述

由复旦大学李荣陆提供。answer.rar为测试语料,共9833篇文档;train.rar为训练语料,共9804篇文档,分为20个类别。训练语料和测试语料基本按照1:1的比例来划分。收集工作花费了不少人力和物力,所以请大家在使用时尽量注明来源(复旦大学计算机信息与技术系国际数据库中心自然语言处理小组)。

文件

文件较大(训练测试各50多兆)。

相关论文

1.Joachims T. Transductive Inference for Text Classification using Support Vector Machines[C]// Sixteenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc. 1999:200-209.
2.Joulin A, Grave E, Bojanowski P, et al. Bag of Tricks for Efficient Text Classification[J]. 2016:427-431.
3.Zhang Y, Wallace B. A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification[J]. Computer Science, 2015.
4.Ji Y L, Dernoncourt F. Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks[J]. 2016:515-520.
5.Chen G, Ye D, Xing Z, et al. Ensemble application of convolutional and recurrent neural networks for multi-label text categorization[C]// International Joint Conference on Neural Networks. IEEE, 2017:2377-2383.