IcePaHC

Contributor: 朱述承

Introduction

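The Icelandic Parsed Historical Corpus (IcePaHC) is a historical corpus containing samples of written Icelandic from every period from the 12th century to the present. It is largely compatible with the historical English corpora developed at UPenn. For the historical texts, the spelling has been modernized to reflect phonemic changes.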

Download

http://www.linguist.is/icelandic_treebank/Download

Contents of version 0.9

Total: 1,002,390 words
1150: Fyrsta málfræðiritgerðin (The First Grammatical Treatise) (4422 words)
1150: Íslensk hómilíubók (Icelandic book of homilies) (40943 words)
1210: Jarteinabók (10328 words)
1210: Þorláks saga helga (10868 words)
1250: Íslendinga saga (22805 words)
1250: Þetubrot Egils Sögu (Theta manuscript of Egils Saga) (3461 words)
1260: Jómsvíkinga saga (21133 words)
1270: Grágás. Lagasafn íslenska þjóðveldisins. (6203 words)
1275: Morkinskinna (25064 words)
1300: Alexanders saga (23356 words)
1310: Grettis saga Ásmundarsonar (20563 words)
1325: Árna saga biskups (19968 words)
1350: Bandamanna saga (Möðruvallabók text) (13618 words)
1350: Finnboga saga ramma (23036 words)
1350: Mörtu saga og Maríu Magdalenu (17241 words)
1400: Gunnars saga Keldugnúpsfífls (8770 words)
1400: Gunnars saga Keldugnúpsfífls - Part 2 (3164 words)
1400: Víglundar saga (13453 words)
1450: Bandamanna saga (Konungsbók text) (11560 words)
1450: Ectors saga (21063 words)
1450: Júditarbók (6562 words)
1450: Vilhjálms saga Sjóðs (23132 words)
1475: Miðaldaævintýri (18084 words)
1480: Jarlmanns saga og Hermanns (14482 words)
1525: Erasmus saga (Reykjahólabók) (8589 words)
1525: Georgíus saga (Reykjahólabók) (20092 words)
1540: Nýja Testamenti Odds Gottskálkssonar (The New Testament of Oddur Gottskálksson), Postulanna Gjörningar (Acts of the Apostles) (16550 words)
1540: Nýja Testamenti Odds Gottskálkssonar (The New Testament of Oddur Gottskálksson), S. Jóhannis Guðspjöll (Gospel of St. John) (20925 words)
1593: Eintal sálarinnar við sjálfa sig (23327 words)
1611: Okur (15481 words)
1628: Reisubók séra Ólafs Egilssonar (17199 words)
1630: Fimmtíu heilagar hugvekjur Meditationes sacrae (12698 words)
1650: Illuga saga Tagldarbana (20921 words)
1659: Píslarsaga séra Jóns Magnússonar (9825 words)
1661: Reisubók Jóns Ólafssonar Indíafara (23031 words)
1675: Móðars þáttur (3845 words)
1675: Söguþáttur af Ármanni og Þorsteini gála (11228 words)
1675: Um ætt Magnúsar Jónssonar (3187 words)
1680: Sögu-þáttur um Skálholts biskupa fyrir og um siðaskiptin. (10281 words)
1720: Vídalínspostilla (23016 words)
1725: Biskupasögur Jóns prófasts Halldórssonar í Hítardal (22297 words)
1745: Nikulás Klím (22038 words)
1790: Fimmbræðra saga (18860 words)
1791: Ævisaga síra Jóns Steingrímssonar (22369 words)
1830: Hellismanna saga (14988 words)
1835: Um eðli og uppruna jarðarinnar (On the Nature and Origin of the Earth) (3257 words)
1850: Piltur og stúlka (17844 words)
1859: Fimtíu hugvekjur út af pínu og dauða Drottins vors Jesú Krists (20530 words)
1861: Sagan af Heljarslóðarorrustu (20336 words)
1882: Brynjólfur Sveinsson biskup (27342 words)
1883: Hans Vöggur (1927 words)
1888: Grímur kaupmaður deyr (7241 words)
1888: Vordraumur (10753 words)
1902: Upp við fossa (20647 words)
1907: Leysing (20613 words)
1908: Ofurefli (20262 words)
1920: Árin og eilífðin. Prédikanir eftir Harald Níelsson (21234 words)
1985: Margsaga (22295 words)
1985: Sagan öll (20980 words)
2008: Ofsi (21144 words)
2008: Segðu mömmu að mér líði vel - saga um ástir - (21958 words)

Usage

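If you use Windows, the easiest route is to download IcePaHC for Windows and follow the on-screen instructions. IcePaHC for Windows runs its queries with CorpusSearch, so please read the CorpusSearch documentation in addition to this page. With IcePaHC for Windows you do not need to type a command to start the program; just click the IcePaHC icon on the desktop. If Java is not installed, the installer will direct you to the Java download page.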

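Because the corpus uses a labeled bracketing format, it is compatible with programs that work with this kind of annotation. We recommend CorpusSearch, developed by Beth Randall at UPenn. If you have copied the corpus to the directory "/home/chomsky/icepahc" and saved the CorpusSearch jar file in "/home/chomsky/corpussearch", you can search the corpus with a query stored in a text file named datsubj.q using the following command: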

java -classpath /home/chomsky/corpussearch/CS_2.002.75.jar csearch/CorpusSearch datsubj.q /home/chomsky/icepahc/*.psd

Let us assume that datsubj.q is a query that picks out all dative subjects. The file could look like this:

node: IP*

query: (IP* idoms NP-SBJ) AND (NP-SBJ idoms *-D)

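If you run the command above with such a file, CorpusSearch returns a file named datsubj.out containing every sentence in the corpus that has a dative subject. Read the CorpusSearch documentation and the annotation guidelines for the corpus to learn how to do more.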
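To make the intent of the query concrete, here is a minimal Python sketch of the condition it describes: a clause node (any label starting with IP) that directly contains an NP-SBJ node, which in turn directly contains a node whose label ends in -D. This is not how CorpusSearch is implemented, and it is not a substitute for it. The function names and the two sample trees are hypothetical, hand-written in a simplified labeled bracketing style, and may not match the corpus annotation exactly.

```python
import fnmatch


def parse(text):
    """Parse one labeled-bracketing tree into (label, children) tuples.

    Leaves (words) are kept as plain strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def dative_subject_clauses(tree):
    """Yield every IP* node that immediately dominates an NP-SBJ
    which itself immediately dominates a node labeled *-D."""
    label, children = tree
    kids = [c for c in children if isinstance(c, tuple)]
    if fnmatch.fnmatchcase(label, "IP*"):
        for sbj in kids:
            has_dative = any(
                isinstance(g, tuple) and fnmatch.fnmatchcase(g[0], "*-D")
                for g in sbj[1]
            )
            if sbj[0] == "NP-SBJ" and has_dative:
                yield tree
                break
    for kid in kids:
        yield from dative_subject_clauses(kid)


# Hypothetical sample trees: one with a dative subject, one with a nominative subject.
with_dative = parse("(IP-MAT (NP-SBJ (PRO-D mér)) (VBPI líkar) (NP-OB1 (N-N bókin)))")
without_dative = parse("(IP-MAT (NP-SBJ (PRO-N ég)) (VBPI les) (NP-OB1 (N-A bókina)))")
```

Calling `list(dative_subject_clauses(with_dative))` yields the IP-MAT node, while the second tree yields nothing because its subject is marked -N rather than -D.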

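Note that there are ways to simplify the command, for example by creating an alias, but these differ between operating systems. Read the getting-started section of the CorpusSearch documentation for more information.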
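As one possible alternative to an operating-system-specific alias, a small Python wrapper can assemble the same command on any platform. The sketch below assumes the example paths used above; the script name icesearch.py and the function name build_command are hypothetical, and nothing here is part of IcePaHC or CorpusSearch.

```python
import subprocess
import sys
from pathlib import Path

# Assumed locations, copied from the example paths above; adjust to your setup.
CS_JAR = Path("/home/chomsky/corpussearch/CS_2.002.75.jar")
CORPUS_DIR = Path("/home/chomsky/icepahc")


def build_command(query_file, corpus_dir=CORPUS_DIR, jar=CS_JAR):
    """Return the argument list for the CorpusSearch call shown above.

    The *.psd wildcard is expanded here because no shell is involved.
    """
    psd_files = sorted(str(p) for p in Path(corpus_dir).glob("*.psd"))
    return [
        "java",
        "-classpath",
        str(jar),
        "csearch/CorpusSearch",
        str(query_file),
        *psd_files,
    ]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Example: python icesearch.py datsubj.q
    sys.exit(subprocess.call(build_command(sys.argv[1])))
```

Saved as icesearch.py, this would be run as `python icesearch.py datsubj.q`. Passing the arguments as a list avoids shell quoting differences between operating systems.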