Estonian Reference Corpus

Provider: 朱述承
Download URL: http://www.cl.ut.ee/korpused/segakorpus/slohtuleht/

Contents

The corpus contains:
Fiction from the year 1990 onwards (5.6 million words);
Daily Postimees (issues 27.11.1995-10.10.2000; 1,760 issues containing 88,600 articles; 32.9 million words);
Weekly Eesti Ekspress (issues 09.08.1996-29.11.2001; 7.5 million words);
Daily Eesti Päevaleht (issues 18.10.1995-31.10.2007; 4,065 issues containing 366,862 articles; 87.9 million words);
Magazine Maaleht (2001-2004; 4.3 million words);
Magazine SL Õhtuleht (1997-2007; 45.5 million words);
Valgamaalane (02.09.2004-31.07.2008; 2.5 million words);
Lääne Elu (04.05.2000-01.11.2008; 1.8 million words);
Magazine Horisont (1996-2003; 260,000 words);
Magazine Luup (1996-2002; 1.9 million words);
Magazine Kroonika (2001-2003; 600,000 words);
Magazine Eesti Arst (2002-2004; ca 0.7 million words);
Magazine Arvutitehnika ja Andmetöötlus (1999-2005; 625,000 words);
Magazine Agraarteadus (2001-2006; 298,000 words);
Various scientific articles (ca 1.3 million words);
Estonian and European legal documents (ca 1.8 million and 10 million words, respectively);
New media (ca 21 million words);
Parliament transcripts 1995-2001 (13 million words);
PhD dissertations (2.3 million words).
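
The sizes listed above can be added up to get a rough idea of the overall corpus size. The following is only a minimal sketch: the figures are copied from the list (several are marked "ca", so the total is approximate), and the dictionary name and labels are chosen here for illustration.

```python
# Word counts in millions, copied from the list above.
# Entries marked "ca" in the list are approximate, so the total is too.
SUBCORPORA = {
    "Fiction": 5.6,
    "Postimees": 32.9,
    "Eesti Ekspress": 7.5,
    "Eesti Päevaleht": 87.9,
    "Maaleht": 4.3,
    "SL Õhtuleht": 45.5,
    "Valgamaalane": 2.5,
    "Lääne Elu": 1.8,
    "Horisont": 0.26,
    "Luup": 1.9,
    "Kroonika": 0.6,
    "Eesti Arst": 0.7,
    "Arvutitehnika ja Andmetöötlus": 0.625,
    "Agraarteadus": 0.298,
    "Scientific articles": 1.3,
    "Legal documents (Estonian)": 1.8,
    "Legal documents (European)": 10.0,
    "New media": 21.0,
    "Parliament transcripts": 13.0,
    "PhD dissertations": 2.3,
}


def total_million_words(sizes):
    """Sum the per-subcorpus sizes (in millions of words)."""
    return sum(sizes.values())


if __name__ == "__main__":
    total = total_million_words(SUBCORPORA)
    print(f"{len(SUBCORPORA)} subcorpora, about {total:.1f} million words")
```

With the figures as listed, the total comes to roughly 241.8 million words.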

Usage

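The corpus is free to use for non-commercial purposes. Users can:
download the compressed texts;
use the Keeleveeb corpus query to retrieve concordances of citation forms, word classes, and grammatical categories, or of their co-occurrences.
The texts can be found through the description of each subcorpus. Some subcorpora cannot be downloaded; these can be used through the corpus query.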
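
Once a compressed subcorpus has been downloaded, a first sanity check could be to count the words per file. The sketch below rests on assumptions that the description above does not confirm: that the download is a zip archive, that the files are UTF-8 text, and that any markup consists of angle-bracket tags. The function name is hypothetical.

```python
import re
import zipfile

# Assumption: markup, if present, consists of <...> tags that should not
# be counted as words.
TAG_RE = re.compile(r"<[^>]+>")


def count_words_in_zip(archive):
    """Return {member name: word count} for every file in a zip archive.

    `archive` may be a file path or a binary file-like object.
    """
    counts = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Assumption: UTF-8 text; undecodable bytes are replaced.
            text = zf.read(info).decode("utf-8", errors="replace")
            text = TAG_RE.sub(" ", text)
            # Words are approximated as whitespace-separated tokens.
            counts[info.filename] = len(text.split())
    return counts
```

For example, `count_words_in_zip("subcorpus.zip")` would return a dictionary of per-file counts, where `subcorpus.zip` is a placeholder for whatever file was actually downloaded. Whitespace tokenization is only an approximation, so the result will not match the published sizes exactly.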