1 parent 459332c, commit 4338d14
Showing 31 changed files with 442 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Corpus of academic Lithuanian", | ||
"URL": "http://coralit.lt/en/node/18", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains textbooks, scientific monographs, journal articles, abstracts, forewords, research reports, and master’s and PhD theses from the following disciplines:\n<ul><li>humanities (architecture, fine art studies, ethnology, folklore studies, philosophy, linguistics, literary theory, librarianship, history, theology),</li><li>social sciences (law, political science,\neconomics, psychology, education, management),</li><li>physical sciences (mathematics, astronomy, physics, chemistry, geography, geology and mineralogy, informatics),</li><li>biomedical sciences (medicine, dental surgery, biology, botany, agronomy, animal husbandry, pharmacy, veterinary science, forestry studies), and</li><li>technological sciences (energy studies, chemical technology, materials science, mechanics, metrology, building construction, transport technology, agricultural and\nenvironmental sciences, management and informatics).</li></ul>The materials were published between 1999 and 2009. The corpus is encoded in TEI 5.\nThe corpus is available for online querying through a dedicated website.", | ||
"Languages": ["lit"], | ||
"License": "", | ||
"Size": ["9 million words"], | ||
"Annotation": ["no linguistic annotation"], | ||
"Access": { | ||
"Concordancer": "http://coralit.lt/en/node/18" | ||
}, | ||
"Publication": "Usonienė and Linkevičienė (2009)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "Academic texts - humanities", | ||
"URL": "http://hdl.handle.net/10794/49", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains academic texts from humanities disciplines published between 1997 and 2012. The corpus data are in the XML format and plain text.\nThe corpus is available for download from the SWECLARIN repository and for online querying through the concordancer Korp (SWECLARIN distribution).", | ||
"Languages": ["swe"], | ||
"License": "CC BY", | ||
"Size": ["14.5 million tokens"], | ||
"Annotation": [], | ||
"Access": { | ||
"Concordancer": "https://spraakbanken.gu.se/korp/?corpus=sweachum", | ||
"Download": "http://hdl.handle.net/10794/49" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "Academic texts - social science", | ||
"URL": "http://hdl.handle.net/10794/50", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains academic texts from social sciences disciplines published between 1997 and 2012. The corpus data are in the XML format and plain text.\nThe corpus is available for download from the SWECLARIN repository and for online querying through the concordancer Korp (SWECLARIN distribution).", | ||
"Languages": ["swe"], | ||
"License": "CC BY", | ||
"Size": ["10.8 million tokens"], | ||
"Annotation": ["sentence segmentation"], | ||
"Access": { | ||
"Concordancer": "https://spraakbanken.gu.se/korp/?corpus=sweacsam", | ||
"Download": "http://hdl.handle.net/10794/50" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "ACL Anthology Reference Corpus", | ||
"URL": "https://hdl.handle.net/10.35111/rfeg-z495", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains research papers in computational linguistics published between 1979 and 2015. The corpus data are in the XML format.#SEPThe corpus is available for online querying through the Sketch Engine (log-in required) and for download from a dedicated website.", | ||
"Languages": ["eng"], | ||
"License": "CC BY SA", | ||
"Size": ["75 million tokens"], | ||
"Annotation": ["PoS-tagged", "lemmatised", "author/text metadata"], | ||
"Access": { | ||
"Concordancer": "https://www.sketchengine.eu/acl-anthology-reference-corpus-arc/", | ||
"Download": "https://doi.org/10.35111/rfeg-z495" | ||
}, | ||
"Publication":"https://www.zotero.org/groups/562080/items/RL6RA4ZE" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Academic Corpus", | ||
"URL": "https://www.wgtn.ac.nz/lals/resources/academicwordlist/information/corpus", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains journal articles, book chapters, course workbooks, laboratory manuals, and course notes from the following disciplines: arts, commerce, law, and biology.\nThis corpus is not available.", | ||
"Languages": ["eng"], | ||
"License": "", | ||
"Size": ["3.5 million words"], | ||
"Annotation": [], | ||
"Access": { | ||
"Download": "" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Chambers-Le Baron Corpus of Research Articles", | ||
"URL": "http://hdl.handle.net/20.500.14106/2527", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains research papers in the following disciplines:\n<ul><li>media/culture,</li><li>literature,</li><li>linguistics and language learning,</li><li>social anthropology,</li><li>law, economics,</li><li>sociology and social sciences,</li><li>philosophy,</li><li>history, and</li><li>communication.</li></ul>\nThe research papers were published between 1998 and 2006. This is a plain text corpus.\nThe corpus is available for download from the Oxford Text Archive.", | ||
"Languages": ["fra"], | ||
"License": "Oxford Text Archive licence (academic use)", | ||
"Size": ["1 million words"], | ||
"Annotation": ["No annotation"], | ||
"Access": { | ||
"Download": "http://hdl.handle.net/20.500.14106/2527" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Czech Sociological Review", | ||
"URL": "https://hdl.handle.net/11372/LRT-2703", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains research papers in sociology published between 1993 and 2016. The corpus data are in the TSV format.#SEPThe corpus is available for download from the LINDAT repository.", | ||
"Languages": ["ces"], | ||
"License": "MIT", | ||
"Size": ["3 million words"], | ||
"Annotation": [], | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11372/LRT-2703" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "English Scientific Text Corpus", | ||
"URL": "http://hdl.handle.net/11858/00-246C-0000-0023-8CF9-6", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains journal articles in the following disciplines:\n<ul><li>computer science,</li><li>computational linguistics,</li><li>informatics,</li><li>digital construction,</li><li>microelectronics,</li><li>linguistics,</li><li>biology,</li><li>mechanical engineering, and</li><li>electrical engineering.</li></ul>\nThe articles were published in the 1970s, 1980s and the 2000s.\nThe corpus is available for online querying through CQPweb (CLARIN-D distribution).", | ||
"Languages": ["eng"], | ||
"License": "restricted", | ||
"Size": ["35 million tokens"], | ||
"Annotation": ["PoS-tagged", "lemmatised", "author/text metadata", "document structure"], | ||
"Access": { | ||
"Concordancer": "https://hdl.handle.net/11858/00-246C-0000-0023-8CF9-6" | ||
}, | ||
"Publication":"https://www.zotero.org/groups/562080/items/CRFN3M3V" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Corpus of Estonian scientific texts", | ||
"URL": "http://hdl.handle.net/11297/1-00-0000-0000-0000-0002-4", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains scientific articles and PhD theses. The corpus data are in the P5 format.", | ||
"Languages": ["est"], | ||
"License": "CLARIN ACA-NC", | ||
"Size": ["5 million words"], | ||
"Annotation": [], | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11297/1-00-0000-0000-0000-0002-4" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "GENIA corpus", | ||
"URL": "http://www.geniaproject.org/genia-corpus", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains journal paper abstracts in biomedicine. The corpus data are in various formats, e.g., PTB.\nThe corpus is available for download from PORTULAN.", | ||
"Languages": ["eng"], | ||
"License": "free but unspecified", | ||
"Size": ["437,000 words"], | ||
"Annotation": ["PoS-tagged", "syntactically parsed", "annotated for terms, events, semantic relations and coreference", "text metadata"], | ||
"Access": { | ||
"Download": "http://www.geniaproject.org/genia-corpus" | ||
}, | ||
"Publication":"Su et al. 2008" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Corpus of Slovene linguistic scientific writing JezKor", | ||
"URL": "http://hdl.handle.net/11356/1755", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains a collection of linguistic scientific writing in the Slovenian language. It consists of 43 monographs published between 2009 and 2022 by the Fran Ramovš institute of Slovenian language and Založba ZRC, 267 papers published in the journal \"Jezikoslovni zapiski\" and 28 papers published in the journal \"Slovenski jezik\". Note that the texts were obtained directly from PDFs, so they contain various types of noise.\nThe corpus is linguistically annotated with the CLASSLA pipeline (https://github.com/clarinsi/classla) at the levels of lemmatisation, MULTEXT-East Version 6 morphosyntactic descriptions, Universal Dependencies part-of-speech and morphological features, and named entities. It is distributed in CoNLL-U and vertical file formats, one file for each text. Text metadata consists of the author(s), title and year of publication.\nThe corpus is available for download from the CLARIN.SI repository as well as for online browsing through the noSketch Engine and KonText concordancers.", | ||
"Languages": ["slv"], | ||
"License": "CC BY", | ||
"Size": ["9.3 million tokens"], | ||
"Annotation": ["PoS-tagged (UD)", "MSD-tagged (UD & MULTEXT-East)", "lemmatised", "annotated for named entities and author/text metadata"], | ||
"Access": { | ||
"Concordancer (noSketchEngine)": "https://www.clarin.si/ske/#dashboard?corpname=jezkor", | ||
"Concordancer (KonText)": "https://www.clarin.si/kontext/query?corpname=jezkor", | ||
"Download": "http://hdl.handle.net/11356/1755" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Corpus of Academic Slovene KAS 2.0", | ||
"URL": "http://hdl.handle.net/11356/1448", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains BA, MA, and PhD theses in humanities, social sciences, and natural sciences published between 2000 and 2018. The corpus data are in the TEI format.\nThe corpus is available for download from CLARIN.SI. Version 1.0 is also available for online querying through <a href=\"https://www.clarin.si/noske/run.cgi/corp_info?corpname=kas&struct_attr_stats=1&subcorpora=1\">noSketch Engine</a> and <a href=\"https://www.clarin.si/kontext/first_form?corpname=kas\">KonText</a> (CLARIN.SI distribution).", | ||
"Languages": ["slv"], | ||
"License": "CLARIN.SI Licence ACA ID-BY-NC-INF-NORED 1.0", | ||
"Size": ["1.5 billion tokens"], | ||
"Annotation": ["MSD-tagged", "lemmatised", "marked for bilingual and monolingual term candidates"], | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1448" | ||
}, | ||
"Publication": "Erjavec et al. 2020" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "The KIAP corpus", | ||
"URL": "http://hdl.handle.net/11495/D989-605B-8F10-5", | ||
"Family": "Academic corpora", | ||
"Description": "This comparable corpus contains research articles in economics, linguistics, and medicine published between 1992 and 2003.\nThe corpus is available for online browsing through the concordancer Corpuscle (CLARINO distribution).", | ||
"Languages": ["eng","fra","nor"], | ||
"License": "CC-BY 4.0", | ||
"Size": ["3.9 million tokens"], | ||
"Annotation": ["PoS-tagged"], | ||
"Access": { | ||
"Concordancer": "http://clarino.uib.no/korpuskel/landing-page?identifier=kiap&view=short" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "The Language of Literature and the Language of Translation (collected scientific papers)", | ||
"URL": "http://hdl.grnet.gr/11500/KEG-0000-0000-24F2-6", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains journal articles in literary and translation studies. This is a plain text corpus.\nThe corpus is available for download from the CLARIN:EL repository.", | ||
"Languages": ["ell"], | ||
"License": "CC-BY-SA", | ||
"Size": ["48,300 words"], | ||
"Annotation": [], | ||
"Access": { | ||
"Download": "http://hdl.grnet.gr/11500/KEG-0000-0000-24F2-6" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Modern Greek Dialects: scientific papers", | ||
"URL": "http://hdl.grnet.gr/11500/KEG-0000-0000-2502-4", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains scientific texts in linguistics and dialectology. This is a plain text corpus.\nThe corpus is available for download from the CLARIN:EL repository.", | ||
"Languages": ["ell"], | ||
"License": "CC-BY-SA", | ||
"Size": ["113,000 words"], | ||
"Annotation": [], | ||
"Access": { | ||
"Download": "http://hdl.grnet.gr/11500/KEG-0000-0000-2502-4" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "MuchMore Springer Bilingual Corpus", | ||
"URL": "http://muchmore.dfki.de/resources1.htm", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains journal paper abstracts from medical disciplines. The corpus is encoded in MuchMore XML.\nThe corpus is available for download from a dedicated website.", | ||
"Languages": ["eng","deu"], | ||
"License": "free but unspecified", | ||
"Size": ["1 million tokens"], | ||
"Annotation": ["PoS/MSD-tagged", "phrase chunking", "semantic class and relations", "document structure"], | ||
"Access": { | ||
"Download": "http://muchmore.dfki.de/resources1.htm" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Corpus of scientific texts from the Open Science Slovenia portal OSS 1.0", | ||
"URL": "http://hdl.handle.net/11356/1774", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains a large collection of scientific writing in the Slovenian language gathered from the <a href=\"https://openscience.si\">Open Science Slovenia portal</a>. It consists of over 150 thousand monographs, articles, diploma, master's and doctoral theses, advanced textbooks, reviews etc. mostly published between 2000 and 2022 by Slovenian universities, research institutions, etc. Texts are accompanied by metadata, i.e. author, supervisor (for theses), year of publication, publisher (mostly faculties of the various universities), type of publication (according to SICRIS classification), keywords, and CERIF and UDC codes. The texts were obtained directly from PDFs, so it should be noted that they can contain various types of character noise. The texts are linguistically annotated with the <a href=\"https://github.com/clarinsi/classla\">CLASSLA pipeline</a> at the levels of lemmatisation, MULTEXT-East Version 6 morphosyntactic descriptions, Universal Dependencies part-of-speech and morphological features, and named entities. The corpus is distributed in CoNLL-U and vertical file formats, one file for each text. The text metadata is given as a TSV file.\nNote that there exist similar, but older and smaller corpora <a href=\"http://hdl.handle.net/11356/1448\">KAS 2.0</a> and <a href=\"http://hdl.handle.net/11356/1244\">KAS 1.0</a>. These contain only theses and only up to 2018, but are cleaner and with more metadata. The repository also archives a number of KAS-derived datasets; please search for \"KAS\" to find them.\nThe corpus is available for download from the CLARIN.SI repository as well as for online browsing through the noSketch Engine and KonText concordancers.", | ||
"Languages": ["slv"], | ||
"License": "CC BY-SA", | ||
"Size": ["326 million tokens"], | ||
"Annotation": ["PoS-tagged (UD)", "MSD-tagged (UD & MULTEXT-East)", "lemmatised", "annotated for named entities and author/text metadata"], | ||
"Access": { | ||
"Concordancer (noSketchEngine)": "https://www.clarin.si/ske/#dashboard?corpname=oss10", | ||
"Concordancer (KonText)": "https://www.clarin.si/kontext/query?corpname=oss10", | ||
"Download": "http://hdl.handle.net/11356/1774" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "OROSSIMO Corpus", | ||
"URL": "http://hdl.grnet.gr/11500/ATHENA-0000-0000-2410-5", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains academic texts in the following disciplines:\n<ul><li>social sciences,</li><li>computer science,</li><li>economics,</li><li>linguistics,</li><li>photography,</li><li>law,</li><li>engineering,</li><li>history,</li><li>astronomy,</li><li>earth sciences and geology,</li><li>medicine and health, and</li><li>biology.</li></ul>\nThe corpus is encoded in XML (XCES).\nThe corpus is available for download from the CLARIN:EL repository.", | ||
"Languages": ["ell"], | ||
"License": "CC-BY", | ||
"Size": ["2.5 million tokens"], | ||
"Annotation": ["marked for term candidates", "mixed structural annotation"], | ||
"Access": { | ||
"Download": "http://hdl.grnet.gr/11500/ATHENA-0000-0000-2410-5" | ||
}, | ||
"Publication": "Mantzari et al. 1999" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Reading Academic Text corpus", | ||
"URL": "http://www.reading.ac.uk/internal/appling/corpus.htm", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains PhD theses from the following disciplines: agriculture, psychology, food science, technology, meteorology, and history. The data are encoded in ASCII and HTML.\nThe corpus is not available because it is restricted at present to staff and researchers at the University of Reading, and it is only available 'on-site'. However, it is possible for people outside the University to make use of the corpus on a Research Attachment arrangement.", | ||
"Languages": ["eng"], | ||
"License": "restricted", | ||
"Size": [], | ||
"Annotation": [], | ||
"Access": { | ||
"Download": "" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"Name": "Corpus of Romanian Academic Genres – ROGER (bilingual, student papers)", | ||
"URL": "https://roger-corpus.org/", | ||
"Family": "Academic corpora", | ||
"Description": "The corpus contains academic papers from eight disciplines, written by Romanian students in Romanian (L1) and English (L2).\nThe corpus was collected over a three-year period (2018–2021) with the help of 27 collaborators from nine Romanian universities.\nThe corpus is available for online querying through a <a href=\"https://roger-corpus.org/index.php\">dedicated platform</a> developed at the <a href=\"https://codhus.projects.uvt.ro/\">CODHUS</a> research centre from the West University of Timisoara.", | ||
"Languages": ["eng","ron"], | ||
"License": "CC BY-NC-ND", | ||
"Size": ["3.3 million words"], | ||
"Annotation": [], | ||
"Access": { | ||
"Concordancer": "https://roger-corpus.org/login.php" | ||
}, | ||
"Publication" :"Striletchi et al. (2022)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "The Royal Society Corpus", | ||
"URL": "http://hdl.handle.net/21.11119/0000-0001-7E8B-6", | ||
"Family": "Academic corpora", | ||
"Description": "This corpus contains journal articles published in <a href=\"http://rstl.royalsocietypublishing.org/\">Philosophical Transactions of the Royal Society of London</a> between 1665 and 1869.\nThe corpus is available for online querying through CQPweb and for download from the CLARIN-D repository of the University of Saarland.", | ||
"Languages": ["English (late and early modern)"], | ||
"License": "CC BY", | ||
"Size": ["32 million tokens"], | ||
"Annotation": ["PoS-tagged", "lemmatised", "normalised", "author and document metadata"], | ||
"Access": { | ||
"Concordancer": "http://fedora.clarin-d.uni-saarland.de/rsc_v4/access.html#cqpweb", | ||
"Download": "http://fedora.clarin-d.uni-saarland.de/rsc_v4/access.html#download" | ||
}, | ||
"Publication": "https://www.zotero.org/groups/562080/items/FWYERQ4A" | ||
} |
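
Review note: several records in this commit need a comma between entries of the "Access" object, which a parse check before merging would catch. The following is a minimal sketch of such a check, not part of the commit. The `check_record` helper and the `REQUIRED_KEYS` set are hypothetical; the key list is inferred from the records shown in this diff, not from a published schema.

```python
import json

# Keys that every record in this diff appears to carry.
# Assumption: inferred from the records above, not from an official schema.
REQUIRED_KEYS = {
    "Name", "URL", "Family", "Description", "Languages",
    "License", "Size", "Annotation", "Access", "Publication",
}


def check_record(text):
    """Return a list of problems found in one JSON record given as a string."""
    try:
        record = json.loads(text)
    except json.JSONDecodeError as err:
        # Missing commas and unescaped quotes end up here.
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]

    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(record))]
    # These fields are arrays in every record of the diff.
    for key in ("Languages", "Size", "Annotation"):
        if key in record and not isinstance(record[key], list):
            problems.append(f"{key} should be a list")
    if "Access" in record and not isinstance(record["Access"], dict):
        problems.append("Access should be an object")
    return problems
```

Running `check_record` on the text of each added file and failing on any non-empty result would flag a record before it reaches the repository; an empty list means the record parsed and has the expected shape.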