Commit
1 parent b37d394, commit 3ab46a3
Showing 98 changed files with 1,391 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "XV century New Testament translations (Piętnastowieczne przekłady Nowego Testamentu – elektroniczna konkordancja staropolska)", | ||
"URL": "http://stnt.ijp.pan.pl/", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains Biblical texts from 1380 to 1500.\nThis corpus is available through a dedicated concordancer.", | ||
"Languages": ["pol","lat"], | ||
"License": "", | ||
"Size": ["400,000 tokens"], | ||
"Annotation": ["tokenised"], | ||
"Access": { | ||
"Concordancer": "http://stnt.ijp.pan.pl/idxlac/index" | ||
}, | ||
"Publication": "" | ||
} |
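The records in this commit share one shape: string fields (`Name`, `URL`, `Family`, `Description`, `License`, `Publication`), list fields (`Languages`, `Size`, `Annotation`) and an `Access` object. A minimal Python sketch of a check for that shape follows. The function name `validate_record` and the key-to-type table are assumptions made for illustration; they are not part of the repository.

```python
import json

# Assumed schema, inferred from the records shown in this diff.
EXPECTED_TYPES = {
    "Name": str,
    "URL": str,
    "Family": str,
    "Description": str,
    "Languages": list,
    "License": str,
    "Size": list,
    "Annotation": list,
    "Access": dict,
    "Publication": str,
}


def validate_record(text):
    """Return a list of problems found in one JSON record (empty if none)."""
    try:
        record = json.loads(text)
    except json.JSONDecodeError as exc:
        # A missing comma between two keys ends up here.
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in record:
            problems.append(f"missing key: {key}")
        elif not isinstance(record[key], expected):
            problems.append(f"wrong type for {key}: expected {expected.__name__}")
    return problems
```

Running such a check over every added file before committing would flag syntax slips, for example two `Access` entries written without a separating comma.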
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "The Electronic Corpus of 17th- and 18th-century Polish Texts (Elektroniczny Korpus Tekstów Polskich z XVII i XVIII w.)", | ||
"URL": "https://www.korba.edu.pl/query_corpus/", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains texts from 1601 to 1772.\nThe corpus is available through a dedicated concordancer.\nA manually annotated subset is available <a href=\"https://korba.edu.pl/download\">here</a>.", | ||
"Languages": ["pol"], | ||
"License": "", | ||
"Size": ["13.5 million tokens"], | ||
"Annotation": ["tokenised", "partially PoS-tagged", "structural annotation"], | ||
"Access": { | ||
"Concordancer": "https://korba.edu.pl/query_corpus/" | ||
}, | ||
"Publication": "Gruszczyński et al. (2021)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Corpus of the 19. century Polish (Korpus polszczyzny XIX-wiecznej)", | ||
"URL": "http://korpus19.nlp.ipipan.waw.pl/", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains texts from 1830 to 1918.\nThe corpus is available for download through a dedicated webpage.", | ||
"Languages": ["pol"], | ||
"License": "", | ||
"Size": ["625,000 tokens"], | ||
"Annotation": ["tokenised", "PoS-tagged", "lemmatised", "transliteration", "transcription"], | ||
"Access": { | ||
"Download": "http://korpus19.nlp.ipipan.waw.pl/static/korpus19-TEI-XML-02102018.tar.gz" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "The Morpho-Syntactic Database of Mikael Agricola's Works", | ||
"URL": "http://urn.fi/urn:nbn:fi:lb-20140730170", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains texts from 1544 to 1551 written by the clergyman <a href=\"https://en.wikipedia.org/wiki/Mikael_Agricola\">Mikael Agricola</a>.\nThe corpus is available through the concordancer Korp.", | ||
"Languages": ["fin"], | ||
"License": "CC-BY-ND", | ||
"Size": ["428,300 tokens"], | ||
"Annotation": ["tokenised", "PoS-tagged", "morphological components and syntactic function"], | ||
"Access": { | ||
"Concordancer": "http://urn.fi/urn:nbn:fi:lb-201407166" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Aleksis Kivi Corpus (SKS)", | ||
"URL": "http://urn.fi/urn:nbn:fi:lb-201405274", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains the works by Finnish author Aleksis Kivi from 1855 to 1871.\nThe corpus is available through the concordancer Korp.", | ||
"Languages": ["fin", "swe"], | ||
"License": "CC-BY-NC", | ||
"Size": ["413,700 words"], | ||
"Annotation": ["MSD-tagged", "syntactically parsed"], | ||
"Access": { | ||
"Concordancer": "http://urn.fi/urn:nbn:fi:lb-201405273" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "Open Richly Annotated Cuneiform Corpus, Korp Version", | ||
"URL": "http://urn.fi/urn:nbn:fi:lb-2018071121", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains cuneiform texts from Ancient history.\nThe texts come from the <a href=\"http://oracc.museum.upenn.edu/projectlist.html\">Oracc project</a> and include collections such as the Corpus of Ancient Mesopotamian Scholarship, The Digital Corpus of Cuneiform Lexical Texts, and Royal Inscriptions of Babylonia online.\nThe corpus is available through the concordancer Korp and for download from the repository of FIN-CLARIN.", | ||
"Languages": ["akk"], | ||
"License": "CC-BY-SA", | ||
"Size": ["1,600,563 tokens"], | ||
"Annotation": ["tokenised", "lemmatised", "PoS-tagged", "semantically annotated"], | ||
"Access": { | ||
"Concordancer": "http://urn.fi/urn:nbn:fi:lb-2019060601" | ||
"Download": "http://urn.fi/urn:nbn:fi:lb-2019111602" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Anthology of Middle English texts / Santiago Gonzalez y Fernandez-Corugedo", | ||
"URL": "http://hdl.handle.net/20.500.14106/1398", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains literary texts from 1100 to 1400.\nThe corpus is available for download from the Oxford Text Archive.", | ||
"Languages": ["enm","heb"], | ||
"License": "Oxford Text Archive licence", | ||
"Size": ["4000 words"], | ||
"Annotation": ["no linguistic annotation"], | ||
"Access": { | ||
"Download": "http://hdl.handle.net/20.500.14106/1398" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "ARCHER Corpus", | ||
"URL": "http://www.projects.alc.manchester.ac.uk/archer/", | ||
"Family": "Historical corpora", | ||
"Description": "The corpus contains texts from 1600 to 1999.\nThe corpus is available through the CQPConcordancer. ", | ||
"Languages": ["eng"], | ||
"License": "", | ||
"Size": [], | ||
"Annotation": [], | ||
"Access": { | ||
"Concordancer": "https://cqpweb.lancs.ac.uk/archer_untagged/" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Austrian Baroque Corpus", | ||
"URL": "https://acdh.oeaw.ac.at/abacus/", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains sermons from 1650 to 1750.\nThe corpus is available through a dedicated concordancer.", | ||
"Languages": ["deu"], | ||
"License": "", | ||
"Size": ["200,000 tokens"], | ||
"Annotation": ["tokenised", "PoS-tagged", "lemmatised", "named entities"], | ||
"Access": { | ||
"Concordancer": "https://acdh.oeaw.ac.at/abacus/corpus.html" | ||
}, | ||
"Publication": "Resch et al. (2016)." | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "B4 Historisches Predigtenkorpus zum Nachfeld", | ||
"URL": "http://hdl.handle.net/11022/0000-0000-9B23-A", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains sermons from an Upper German (Balvarian-Alemannic) dialect area.\nThe corpus is available for download from the repository of the University of Hamburg and through the ANNIS environment.", | ||
"Languages": ["gmh"], | ||
"License": "CLARIN ACA", | ||
"Size": ["92,500 tokens"], | ||
"Annotation": ["tokenised", "syntactic and discursive annotation"], | ||
"Access": { | ||
"Concordancer": "http://annis.corpora.uni-hamburg.de:8080/gui/sfb632" | ||
"Download": "http://hdl.handle.net/11022/0000-0000-9B23-A" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "B4 Ludolf", | ||
"URL": "http://hdl.handle.net/11022/0000-0000-9B22-B", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains texts from a journey diary from 1350.\nThe corpus is available for download from the repository of the University of Hamburg and through the ANNIS environment.", | ||
"Languages": ["gmh"], | ||
"License": "CLARIN ACA", | ||
"Size": ["6,690 tokens"], | ||
"Annotation": ["tokenised", "tagged for clause type and grammatical function"], | ||
"Access": { | ||
"Concordancer": "http://annis.corpora.uni-hamburg.de:8080/gui/sfb632" | ||
"Download": "http://hdl.handle.net/11022/0000-0000-9B22-B" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "B4 Tatian Corpus of Deviating Examples 2.1", | ||
"URL": "http://hdl.handle.net/11022/0000-0000-9B1E-1", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains the OHG Tatian, which is one of the largest prose texts from the Old High German period.\nThe corpus is available for download and through a concordancer from the repository of the University of Hamburg.", | ||
"Languages": ["lat", "goh"], | ||
"License": "CC-BY", | ||
"Size": ["11,300 tokens"], | ||
"Annotation": ["tokenised", "MSD-tagged"], | ||
"Access": { | ||
"Concordancer": "http://annis.corpora.uni-hamburg.de:8080/gui/sfb632" | ||
"Download": "http://hdl.handle.net/11022/0000-0000-9B1E-1" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Corpus of biblical text in Scots / John Kirk", | ||
"URL": "http://hdl.handle.net/20.500.14106/1713", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains Biblical texts.\nThe corpus is available for download from the Oxford Text Archive.", | ||
"Languages": ["sco"], | ||
"License": "Oxford Text Archive licence", | ||
"Size": ["35,506 words"], | ||
"Annotation": ["no annotation"], | ||
"Access": { | ||
"Download": "http://hdl.handle.net/20.500.14106/1713" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Brieven als buit (Letters as loot)", | ||
"URL": "http://hdl.handle.net/10032/f6d68fed217ef7364a32c431396ac465", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains 40,000 letters from the 17th to the 19th century.\nThese letters were sent home by sailors and others from abroad but also vice versa by those staying behind who needed to keep in touch with their loved ones. Many letters did not reach their destinations: they were taken as loot by privateers and confiscated by the High Court of Admiralty during the wars fought between The Netherlands and England\nThe corpus is available through a dedicated concordancer.", | ||
"Languages": ["nld"], | ||
"License": "CLARIN PUB", | ||
"Size": ["460,000 words"], | ||
"Annotation": ["lemmatised", "PoS-tagged", "grammatically tagged"], | ||
"Access": { | ||
"Concordancer": "http://brievenalsbuit.inl.nl/zeebrieven/page/search" | ||
}, | ||
"Publication": "Rutten and van der Wal (2014)." | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Bundesblatt/Feuille fédérale/Foglio federale", | ||
"URL": "https://feuille-federale.unige.ch/", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains texts from 1849 to 2014.\nThe corpus is available through the CQPWeb concordancer.", | ||
"Languages": ["deu","fra","ita"], | ||
"License": "", | ||
"Size": ["203,585,806 tokens (German)", "239,125,036 tokens (French)", "85,223,085 tokens (Italian)"], | ||
"Annotation": ["tokenised", "syntactically-parsed"], | ||
"Access": { | ||
"Concordancer": "https://feuille-federale.unige.ch/" | ||
}, | ||
"Publication": "" | ||
} |
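The `Size` values in these records are free-text strings in a few recurring forms, such as `"203,585,806 tokens (German)"`, `"13.5 million tokens"` and `"4000 words"`. As a rough sketch, they could be turned into numbers as follows. The pattern and the helper name `parse_size` are assumptions that cover only the forms visible in this diff.

```python
import re

# Assumed grammar: number, optional "million", unit, optional "(label)".
SIZE_PATTERN = re.compile(
    r"^\s*(?P<number>\d[\d,]*(?:\.\d+)?)\s*"
    r"(?P<scale>million)?\s*"
    r"(?P<unit>tokens|words)\s*"
    r"(?:\((?P<label>[^)]*)\))?\s*$"
)


def parse_size(entry):
    """Parse one Size string into (count, unit, label), or None if unrecognised."""
    match = SIZE_PATTERN.match(entry)
    if match is None:
        return None
    # Drop thousands separators before converting.
    value = float(match.group("number").replace(",", ""))
    if match.group("scale"):
        value *= 1_000_000
    return round(value), match.group("unit"), match.group("label")
```

Returning `None` for unrecognised strings, rather than raising, lets a caller list the entries that need manual review.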
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Carniolan Provincial Assembly corpus Kranjska 1.0", | ||
"URL": "http://hdl.handle.net/11356/1824", | ||
"Family": "Historical corpora", | ||
"Description": "The corpus contains meeting proceedings of 694 sessions of the Carniolan Provincial Assembly from 1861 to 1913.\nThe source data (scanned and OCR processed pdf documents) originally come from <a href=\"http://www.dlib.si\">The Digital Library of Slovenia dLib.si</a> and <a href=\"https://www.sistory.si\">History of Slovenia - SIstory</a> portals. The documents are bilingual, in Slovenian and German, depending on the speaker. German was first typeset in the Gothic script and later on in Latin.\nThe documents were automatically processed and the following data extracted: titles, agenda, attending, start and end of the session, speakers, and comments. Language was detected on the sentence level, roughly 58% sentences are in Slovenian and 42% in German. Linguistic annotation (tokenisation, MSD tagging and lemmatisation) was added using <a href=\"https://github.com/nlp-uoregon/trankit\">Trankit</a> for Slovenian and German, while <a href=\"https://github.com/pemistahl/lingua-py\">Lingua</a> is used for language detection.\nThe documents are in the <a href=\"https://github.com/clarin-eric/parla-clarin\">Parla-CLARIN</a> compliant TEI XML format. Each session in one file.", | ||
"Languages": ["deu", "slv"], | ||
"License": "CC-BY 4.0", | ||
"Size": ["10.9 million words"], | ||
"Annotation": ["tokenised", "MSD-tagged", "lemmatised"], | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1824" | ||
}, | ||
"Publication": "Marolt et al. (2023)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "A Corpus of English Dialogues 1560-1760 (CED)", | ||
"URL": "http://hdl.handle.net/20.500.14106/2507", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains dialogues from literary and didactic works from 1560 to 1760.\n There are five text-types in the CED. The text-types representative of constructed dialogue are drama comedy, didactic works (language manuals and other handbooks) and fiction; the text-types representative of authentic dialogue are trial proceedings and witness depositions. In addition, a small group of miscellaneous dialogic texts is included in the collection.\nThe corpus is available for download from the Oxford Text Archive.", | ||
"Languages": ["eng"], | ||
"License": "Oxford Text Archive licence", | ||
"Size": ["1.2 million words"], | ||
"Annotation": ["no annotation"], | ||
"Access": { | ||
"Download": "http://hdl.handle.net/20.500.14106/2507" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Corpus of Early English Correspondence Sampler (CEECS)", | ||
"URL": "http://hdl.handle.net/20.500.14106/2461", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains 1147 letters from 1418 to 1680.\nThe corpus was created from the larger <a href=\"https://www.helsinki.fi/en/researchgroups/variation-contacts-and-change-in-english/research/corpus-of-early-english-correspondence\">Corpus of Early English Correspondence</a>.\nThe corpus is available for download from the Oxford Text Archive.", | ||
"Languages": ["eng"], | ||
"License": "Oxford Text Archive licence", | ||
"Size": ["450,000 words"], | ||
"Annotation": ["no annotation"], | ||
"Access": { | ||
"Download": "http://hdl.handle.net/20.500.14106/2461" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "ChroniclItaly", | ||
"URL": "https://hdl.handle.net/10.24416/uu01-t4ymow", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains Italian language newspapers published in the United States between 1898 and 1920. The corpus includes seven Italian language newspapers published in California, Massachusetts, Pennsylvania, Vermont, and West Virginia. The collection includes the following titles: L’Italia, Cronaca sovversiva, La libera parola, The patriot, La ragione, La rassegna, and La sentinella del West Virginia.\nThe corpus is available for download from the repository of the University of Utrecht.", | ||
"Languages": ["ita"], | ||
"License": "ODC Attribution License (ODC-By)", | ||
"Size": ["16.6 million words"], | ||
"Annotation": ["unannotated"], | ||
"Access": { | ||
"Download": "http://hdl.handle.net/10.24416/uu01-t4ymow" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Chronopress", | ||
"URL": "http://hdl.handle.net/11321/260", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains newspaper articles from 1945 to 1954.\nThe corpus is available through a dedicated concordancer.", | ||
"Languages": ["pol"], | ||
"License": "CC-BY-SA", | ||
"Size": ["16 million tokens"], | ||
"Annotation": [], | ||
"Access": { | ||
"Concordancer": "http://chronopress.clarin-pl.eu/#!start" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Corpus Informatizado do Português Medieval", | ||
"URL": "http://cipm.fcsh.unl.pt/", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains texts from the 9th to the 16th century.\nThe corpus is available through a dedicated concordancer (restricted access).", | ||
"Languages": ["por"], | ||
"License": "", | ||
"Size": ["2 million tokens"], | ||
"Annotation": ["tokenised", "PoS-tagged"], | ||
"Access": { | ||
"Concordancer": "http://cipm.fcsh.unl.pt/login.jsp" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Classics Library of the National Library of Finland - Kielipankki version", | ||
"URL": "http://urn.fi/urn:nbn:fi:lb-2018051701", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus will contain literary texts from 1549 to 1944.", | ||
"Languages": ["fin", "swe"], | ||
"License": "CC-BY", | ||
"Size": [], | ||
"Annotation": [], | ||
"Access": { | ||
"Download": "" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "DDR-Presseportal (GDR press portal)", | ||
"URL": "https://clarin.bbaw.de/en/corpus/", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains newspaper texts from 1945 to 1994.\nThe corpus is available through a concordancer provided by CLARIN-D.", | ||
"Languages": ["deu"], | ||
"License": "", | ||
"Size": [], | ||
"Annotation": [], | ||
"Access": { | ||
"Concordancer": "http://zefys.staatsbibliothek-berlin.de/ddr-presse/" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "DiaCORIS", | ||
"URL": "http://corpora.dslo.unibo.it/coris_ita.html", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains texts from 1861 to 1945.\nThe corpus is available through a dedicated concordancer.", | ||
"Languages": ["ita"], | ||
"License": "", | ||
"Size": [], | ||
"Annotation": [], | ||
"Access": { | ||
"Concordancer": "http://corpora.dslo.unibo.it/DiaCORIS/" | ||
}, | ||
"Publication": "Rossini Favretti et al. (2011)." | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "DIAKORP v6", | ||
"URL": "http://wiki.korpus.cz/doku.php/en:cnk:diakorp", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains texts from the 14th to the 20th century.\nThe corpus is available through a dedicated concordancer.", | ||
"Languages": ["ces"], | ||
"License": "CC-BY-NC-SA", | ||
"Size": ["4 million tokens"], | ||
"Annotation": ["basic structural markup"], | ||
"Access": { | ||
"Concordancer": "https://www.korpus.cz/" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "Digital library and corpus of historical Slovene IMP 1.1", | ||
"URL": "http://hdl.handle.net/11356/1031", | ||
"Family": "Historical corpora", | ||
"Description": "This corpus contains 658 unique texts from 1584 to 1919.\nThe corpus is available for download from the CLARIN.SI repository and through the concordancer KonText.", | ||
"Languages": ["slv"], | ||
"License": "CC-BY-SA 4.0", | ||
"Size": ["17.7 million tokens"], | ||
"Annotation": ["tokenised", "lemmatised", "PoS-tagged"], | ||
"Access": { | ||
"Concordancer": "https://www.clarin.si/kontext/first_form?corpname=imp" | ||
"Download": "http://hdl.handle.net/11356/1031" | ||
}, | ||
"Publication": "Erjavec (2015)." | ||
} |
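Because every record lists its languages as ISO 639-3 codes in `Languages`, the records can be grouped by language once parsed. The sketch below assumes the records have already been loaded into Python dictionaries; the helper name `index_by_language` is hypothetical.

```python
from collections import defaultdict


def index_by_language(records):
    """Map each language code to the names of the corpora that list it."""
    index = defaultdict(list)
    for record in records:
        # A record with several codes appears under each of them.
        for code in record.get("Languages", []):
            index[code].append(record["Name"])
    return dict(index)


if __name__ == "__main__":
    sample = [
        {"Name": "Chronopress", "Languages": ["pol"]},
        {"Name": "Aleksis Kivi Corpus (SKS)", "Languages": ["fin", "swe"]},
    ]
    print(index_by_language(sample))
```

The same loop could key on the entries of `Access` instead, to separate corpora offered through a concordancer from those offered for download.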