Commit
1 parent 9014fb2, commit 66706bb
Showing 34 changed files with 557 additions and 0 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Annotated Corpus of Czech Case Law for Reference Recognition Tasks", | ||
"URL": "http://hdl.handle.net/11234/1-3008", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus consists of 350 manually annotated decisions at Czech top-tier courts (Supreme Court, Supreme Administrative Court, Constitutional Court). Each decision has been manually annotated by two trained annotators; the corpus is primarily developed as training and testing materials for reference recognition tasks. See also the variant of this corpus annotated for <a href=\"https://hdl.handle.net/11372/LRT-2901\">segmentation tasks</a>.\nThe corpus is available for download from LINDAT.", | ||
"Languages": ["ces"], | ||
"License": "CC BY 4.0", | ||
"Size": [], | ||
"Annotation": ["legal references (identifier of court decision; author of law book or article, etc.)"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11234/1-3008" | ||
}, | ||
"Publication": "Harašta et al. (2018)" | ||
} |
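Every entry added in this commit uses the same set of keys, so a small check can catch malformed files before they are merged. The following is a minimal sketch, not part of the commit: the function name `check_entry` is hypothetical, and treating all twelve keys as required (with the types shown) is an assumption inferred from the entries in this diff.

```python
import json

# Keys and value types observed in the entries of this commit.
# Treating every key as required is an assumption, not a documented schema.
EXPECTED_TYPES = {
    "Name": str, "URL": str, "Family": str, "Description": str,
    "Languages": list, "License": str, "Size": list, "Annotation": list,
    "Infrastructure": str, "Group": str, "Access": dict, "Publication": str,
}


def check_entry(text):
    """Return a list of problems found in one JSON entry (empty if none)."""
    try:
        entry = json.loads(text)
    except json.JSONDecodeError as exc:
        # Unquoted array items, for example, are reported here.
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(entry, dict):
        return ["top-level value is not an object"]

    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in entry:
            problems.append(f"missing key: {key}")
        elif not isinstance(entry[key], expected):
            problems.append(f"{key} should be of type {expected.__name__}")

    # Collect the top-level URL and all access links, then flag stray whitespace.
    urls = {"URL": entry.get("URL")}
    access = entry.get("Access")
    if isinstance(access, dict):
        urls.update({f"Access.{label}": link for label, link in access.items()})
    for label, link in urls.items():
        if isinstance(link, str) and link != link.strip():
            problems.append(f"{label} has leading or trailing whitespace")
    return problems
```

Calling `check_entry` on the raw text of each file and failing the build when the returned list is non-empty would be one way to use it.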
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
{ | ||
"Name": "CABank English SCOTUS Oral Arguments Corpus", | ||
"URL": "https://hdl.handle.net/10.21415/T5Z315", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus consists of transcripts and recordings of oral arguments at the Supreme Court of the United States.\nThe transcripts and audio recordings are aligned at the utterance level; the utterances are annotated based on speaker role (the primary one being Justice) and name, as well as gender.\nThe corpus is part of the CABank collection and available for download from and online browsing through TalkBank.", | ||
"Languages": ["eng"], | ||
"License": "CC BY-NC-SA 3.0", | ||
"Size": [], | ||
"Annotation": ["speaker segmentation", "sociolinguistic annotation"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Browse": "https://sla.talkbank.org/TBB/ca/SCOTUS/OralArguments", | ||
"Download": "https://ca.talkbank.org/data/SCOTUS/OralArguments.zip" | ||
}, | ||
"Publication": "Johnson and Goldman (2009)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "COVID-19 EUR-LEX dataset. Multilingual (CEF languages)", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000D-FE69-0", | ||
"Family": "Legal corpora", | ||
"Description": "This is a multilingual corpus of the <a href=\"https://eur-lex.europa.eu/homepage.html\">European Union Law</a> pertaining to the COVID-19 period.\nThe corpus is available for download from the PORTULAN repository.", | ||
"Languages": ["mlt", "hun", "lit", "lav", "pol", "por", "eng", "slv", "ell", "Spanish (Castilian)", "ron", "slk", "Moldavian", "swe", "bul", "ita", "deu", "hrv", "fra", "Dutch (Flemish)", "ces", "fin", "dan", "Irish", "est"], | ||
"License": "CC BY", | ||
"Size": ["475,931 translation pairs"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Multilingual corpora", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000D-FE69-0" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "COVID-19 EUR-LEX dataset. Bilingual (EN-PT)", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000D-FE66-3", | ||
"Family": "Legal corpora", | ||
"Description": "This is a parallel corpus of the <a href=\"https://eur-lex.europa.eu/homepage.html\">European Union Law</a> pertaining to the COVID-19 period.\nThe corpus is available for download from the PORTULAN repository.", | ||
"Languages": ["eng", "por"], | ||
"License": "CC BY", | ||
"Size": ["21,000 units"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Multilingual corpora", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000D-FE66-3" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Czech Court Decisions Corpus (CzCDC 1.0)", | ||
"URL": "https://hdl.handle.net/11372/LRT-3052", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus consists of around 237,000 court decisions from three top-tier courts (Supreme, Supreme Administrative, and Constitutional) in Czechia, published between 1993 and 2018.\nThe corpus is available for download from LINDAT.", | ||
"Languages": ["ces"], | ||
"License": "CC BY-NC 4.0", | ||
"Size": ["460 million words"], | ||
"Annotation": ["unannotated"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/11372/LRT-3052" | ||
}, | ||
"Publication": "Novotná and Harašta (2019)" | ||
} |
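The "Size" values in these entries are free-text strings such as "460 million words" or "198,035 tokens", which makes corpora hard to compare by size. As a rough sketch only, the hypothetical helper `parse_size` below turns such a string into a count and a unit. It assumes that a number followed by a scale word uses "," or "." as a decimal mark, and that a number without a scale word uses "," as a thousands separator; neither convention is stated in the commit.

```python
import re

# Scale words seen in the "Size" fields of this commit, plus "thousand".
MULTIPLIERS = {"thousand": 10**3, "million": 10**6, "billion": 10**9}

_SIZE_RE = re.compile(
    r"^\s*([\d.,]+)\s*(thousand|million|billion)?\s+(.+?)\s*$"
)


def parse_size(text):
    """Parse a size string into (count, unit), or return None if unparseable."""
    match = _SIZE_RE.match(text)
    if match is None:
        return None
    number, scale, unit = match.groups()
    try:
        if scale:
            # Assumption: with a scale word, "," and "." are decimal marks.
            value = float(number.replace(",", "."))
            return round(value * MULTIPLIERS[scale]), unit
        # Assumption: without a scale word, "," is a thousands separator.
        return int(number.replace(",", "")), unit
    except ValueError:
        return None
```

Strings that do not follow either convention return `None` instead of a guess, so callers can report them for manual review.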
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
{ | ||
"Name": "Czech Legal Text Treebank", | ||
"URL": "http://hdl.handle.net/11234/1-2498", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus consists of two legal documents: Accounting Act (563/1991 Coll., as amended) and Decree on Double-entry Accounting for undertakers (500/2002 Coll., as amended).\nThe corpus is available for download from LINDAT and online browsing through the treebank viewer PML-TQ and the concordancer KonText.", | ||
"Languages": ["ces"], | ||
"License": "CC BY-NC-SA 4.0", | ||
"Size": ["1,128 sentences"], | ||
"Annotation": ["manual syntactic annotation; manual annotation of entities from the accounting domain and relations definition, obligation, right"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/11234/1-2498", | ||
"PML-TQ": "http://lindat.mff.cuni.cz/services/pmltq/#!/treebank/cltt20/query/", | ||
"KonText": "http://lindat.mff.cuni.cz/services/kontext/first_form?corpname=legaltext_cs_a" | ||
}, | ||
"Publication": "Kríž and Hladká (2018)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The German Sub-corpus of MULCOLD, Multilingual Parallel Corpus of Legal Texts", | ||
"URL": "http://urn.fi/urn:nbn:fi:lb-2016042606", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus, which is a subcorpus of <a href=\"http://urn.fi/urn:nbn:fi:lb-201405278\">MULCOLD</a> (see also the <a href=\"https://www.clarin.eu/resource-families/parallel-corpora\">Parallel corpora</a> resource family), contains international conventions and treaties.\nThe corpus is available for online browsing through the concordancer Korp (FIN-CLARIN Distribution).", | ||
"Languages": ["deu"], | ||
"License": "CC BY-ND", | ||
"Size": ["198,035 tokens"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Browse": "http://urn.fi/urn:nbn:fi:lb-2016042606" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The English Sub-corpus of MULCOLD, Multilingual Parallel Corpus of Legal Texts", | ||
"URL": "http://urn.fi/urn:nbn:fi:lb-2016042605", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus, which is a subcorpus of <a href=\"http://urn.fi/urn:nbn:fi:lb-201405278\">MULCOLD</a> (see also the <a href=\"https://www.clarin.eu/resource-families/parallel-corpora\">Parallel corpora</a> resource family), contains international conventions and treaties.\nThe corpus is available for online browsing through the concordancer Korp (FIN-CLARIN Distribution).", | ||
"Languages": ["eng"], | ||
"License": "CC BY-ND", | ||
"Size": ["359,874 tokens"], | ||
"Annotation": ["lemmatised", "MSD-tagged"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Browse": "http://urn.fi/urn:nbn:fi:lb-2016042605" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "English Acquis Communautaire", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000B-D50A-A", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus contains selected texts from the <a href=\"https://en.wikipedia.org/wiki/Acquis_communautaire\">Acquis Communautaire</a> between the 1950s and today, translated into English.\nThe corpus is available for download from PORTULAN.", | ||
"Languages": ["eng"], | ||
"License": "MIT (academic)", | ||
"Size": ["34.6 million tokens"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000B-D50A-A" | ||
}, | ||
"Publication": "Steinberger et al. (2006)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Corpus of Estonian law texts", | ||
"URL": "http://hdl.handle.net/11297/1-00-0000-0000-0000-0002-2", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus contains Estonian laws (1.8 million tokens) as well as European legislation (9.6 million tokens) translated into Estonian.\nThe corpus is available for download from a dedicated webpage hosted by CLARIN Estonia.", | ||
"Languages": ["est"], | ||
"License": "CLARIN PUB", | ||
"Size": ["11 million tokens"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Download": "http://www.cl.ut.ee/korpused/segakorpus/seadused/" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The Finnish Sub-corpus of FiRuLex, Russian-Finnish Comparable Corpus of Legal Texts", | ||
"URL": "http://urn.fi/urn:nbn:fi:lb-2016042604", | ||
"Family": "Legal corpora", | ||
"Description": "This is the Finnish subcorpus of <a href=\"http://urn.fi/urn:nbn:fi:lb-201407161\">FiRuLex</a>, which contains juridical texts in Russian and Finnish.\nThe corpus is available for online browsing through the concordancer Korp (FIN-CLARIN distribution).", | ||
"Languages": ["fin"], | ||
"License": "CC BY-ND", | ||
"Size": ["1.5 million tokens"], | ||
"Annotation": ["lemmatised", "MSD-tagged"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Browse": "http://urn.fi/urn:nbn:fi:lb-201407162" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
{ | ||
"Name": "The Finnish Sub-corpus of the JRC-Acquis Multilingual Parallel Corpus, Downloadable Version", | ||
"URL": "http://urn.fi/urn:nbn:fi:lb-2016042710", | ||
"Family": "Legal corpora", | ||
"Description": "This is the legal subcorpus of the <a href=\"http://urn.fi/urn:nbn:fi:lb-2016042602\">Helsinki Korp Version of the Finnish TreeBank 3</a>.\nThe corpus is available for online browsing through the concordancer Korp (FIN-CLARIN distribution) and for download from the Finnish Language Bank.", | ||
"Languages": ["fin"], | ||
"License": "CC BY", | ||
"Size": ["44.1 million tokens"], | ||
"Annotation": ["syntactically parsed (constituency)", "sentence/phrase/word segmentation"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Browse": "http://urn.fi/urn:nbn:fi:lb-2016042709", | ||
"Download": "http://urn.fi/urn:nbn:fi:lb-2019102401" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "IGC-Laws-21.05 (The Icelandic Gigaword Corpus: Law, bills and proposals)", | ||
"URL": "http://hdl.handle.net/20.500.12537/116", | ||
"Family": "Legal corpora", | ||
"Description": "IGC-Laws is a subcorpus of <a href=\"http://hdl.handle.net/20.500.12537/192\">The Icelandic Gigaword Corpus</a> (see also <a href=\"https://www.clarin.eu/resource-families/reference-corpora\">CLARIN reference corpora</a>). IGC-Laws contains 1) the Icelandic laws, 2) explanatory reports and observations extracted from bills submitted to Althingi, and 3) parliamentary proposals and resolutions. The corpus comes in two formats. One contains the texts untokenized and untagged while the other has been tokenized, PoS-tagged and lemmatized.\nThe corpus is available for download from the CLARIN-IS repository.", | ||
"Languages": ["isl"], | ||
"License": "CC BY 4.0", | ||
"Size": ["2.2 million sentences", "40.6 million words"], | ||
"Annotation": ["lemmatised", "MSD-tagged"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/20.500.12537/116" | ||
}, | ||
"Publication": "Steingrímsson et al. (2018)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "The JRC-Acquis Corpus, version 3.0", | ||
"URL": "http://hdl.handle.net/11500/ATHENA-0000-0000-25C9-4", | ||
"Family": "Legal corpora", | ||
"Description": "This is a parallel corpus of the Acquis Communautaire, which is the total body of European Union law applicable in EU member states.\nMost texts have been manually classified according to the EUROVOC subject domains so that the collection can also be used to train and test multi-label classification algorithms and keyword-assignment software. The corpus is encoded in XML, according to the Text Encoding Initiative Guidelines. Due to the large number of parallel texts in many languages, the JRC-Acquis is particularly suitable to carry out all types of cross-language research, as well as to test and benchmark text analysis software across different languages (for instance for alignment, sentence splitting and term extraction). The sentence-level alignment was done using the <a href=\"https://github.com/danielvarga/hunalign\">hunalign</a> tool.\nThe corpus is available for download from the CLARIN:EL repository.", | ||
"Languages": ["bul", "ces", "dan", "deu", "eng", "spa", "est", "fin", "fra", "hun", "ita", "lit", "lav", "mlt", "nld", "pol", "por", "ron", "slk", "slv", "swe"], | ||
"License": "CC BY 4.0", | ||
"Size": ["1 billion words"], | ||
"Annotation": ["paragraph and sentence alignment"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Multilingual corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11500/ATHENA-0000-0000-25C9-4" | ||
}, | ||
"Publication": "Steinberger et al. (2006)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
{ | ||
"Name": "JRC EU DGT Translation Memory Parsebank DGT-UD", | ||
"URL": "http://hdl.handle.net/11356/1197", | ||
"Family": "Legal corpora", | ||
"Description": "", | ||
"Languages": ["bul", "hrv", "ces", "dan", "nld", "eng", "est", "fin", "fra", "deu", "hun", "gle", "ita", "lav", "lit", "ell", "pol", "por", "ron", "slk", "slv", "spa", "swe"], | ||
"License": "CC BY 4.0", | ||
"Size": ["2.1 billion tokens"], | ||
"Annotation": ["syntactically parsed (Universal Dependencies)"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Multilingual corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11356/1197", | ||
"KonText": "https://www.clarin.si/kontext/first_form?corpname=dgtud_en", | ||
"noSketch Engine": "https://www.clarin.si/noske/run.cgi/corp_info?corpname=dgtud_en" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Corpus of Judicial Rhetoric: cases of rapes and homicides", | ||
"URL": "http://hdl.handle.net/11500/CLARIN-EL-0000-0000-6114-C", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus consists of transcriptions of defendants’ and witnesses’ speeches in criminal cases of rape, attempted rape, murder, and attempted murder.\nThe corpus is available for download from the CLARIN:EL repository.", | ||
"Languages": ["ell"], | ||
"License": "CC BY-NC-ND 4.0", | ||
"Size": [], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11500/CLARIN-EL-0000-0000-6114-C" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Corpus Juridisch Nederlands", | ||
"URL": "http://hdl.handle.net/10032/tm-a2-u2", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus contains legal texts from 1814 to 1989, compiled year by year.\nThe corpus is available for online browsing on a dedicated webpage.", | ||
"Languages": ["nld"], | ||
"License": "CLARIN PUB", | ||
"Size": ["5,856 texts"], | ||
"Annotation": ["lemmatised", "PoS-tagged"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Browse": "https://corpusjuridischnederlands.ivdnt.org/corpus-frontend/juridisch-corpus/search/" | ||
}, | ||
"Publication": "de Does et al. (2017)" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Legal texts from Estonian Ministry of Justice (Processed)", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000D-FAD1-D", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus contains Estonian-English translations of the Acts of Estonian law.\nThe corpus is available for download from PORTULAN.", | ||
"Languages": ["Estonian-English"], | ||
"License": "CC BY", | ||
"Size": ["47,000 units"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Multilingual corpora", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000D-FAD1-D" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Legal Documents from Norwegian Nynorsk Municipalities", | ||
"URL": "https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-60/", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus contains 50,000 legal documents and meeting minutes collected with the web crawler Veidemann. Around 88.5 million words are in Nynorsk, while the rest are in Bokmål.\nThe corpus is available for download from the Norwegian Language Bank.", | ||
"Languages": ["Norwegian (Nynorsk and Bokmål)"], | ||
"License": "CC0 1.0 Universal", | ||
"Size": ["127 million words"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Download": "https://www.nb.no/sbfil/tekst/sakspapir_nno/sakspapir_nno_01.tar.gz" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "LiFR-Law. Corpus of Paraphrased Czech Administrative Texts with Reading Comprehension for Readability Studies", | ||
"URL": "http://hdl.handle.net/11234/1-5020", | ||
"Family": "Legal corpora", | ||
"Description": "This is a corpus of Czech legal and administrative texts with measured reading comprehension and a subjective expert annotation of diverse textual properties based on the Hamburg Comprehensibility Concept.\nThe corpus comprises 18 documents in total; that is, six different texts from the legal/administration domain, each in three versions: the original and two paraphrases. Each such document triple shares one reading-comprehension test administered to at least thirty readers of random gender, educational background, and age. The data set also captures basic demographic information about each reader, their familiarity with the topic, and their subjective assessment of the stylistic properties of the given document, roughly corresponding to the key text properties identified by the Hamburg Comprehensibility Concept.\nThe corpus is available for download from LINDAT.", | ||
"Languages": ["ces"], | ||
"License": "CC BY 4.0", | ||
"Size": ["17,601 tokens"], | ||
"Annotation": ["textual annotation"], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/11234/1-5020" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Corpus of Legal Acts of the Republic of Latvia (Likumi)", | ||
"URL": "http://hdl.handle.net/20.500.12574/65", | ||
"Family": "Legal corpora", | ||
"Description": "The corpus contains all legal acts of the Republic of Latvia published on the website <a href=\"https://likumi.lv/\">likumi.lv</a> (until February 2022).\nThe corpus is available for download from the CLARIN.LV repository.", | ||
"Languages": ["lav"], | ||
"License": "CC BY 4.0", | ||
"Size": ["116 million tokens", "73 million words"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/20.500.12574/65" | ||
}, | ||
"Publication": "" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"Name": "Lithuanian Corpus of the EU Primary and Secondary Law Acts of the Period 2015–2017", | ||
"URL": "http://hdl.handle.net/20.500.11821/18", | ||
"Family": "Legal corpora", | ||
"Description": "This corpus contains primary and secondary European law acts (32 texts) translated into Lithuanian.\nThe corpus is available for download from CLARIN-LT.", | ||
"Languages": ["lit"], | ||
"License": "CLARIN PUB", | ||
"Size": ["274,460 words"], | ||
"Annotation": [], | ||
"Infrastructure": "CLARIN", | ||
"Group": "Monolingual corpora", | ||
"Access": { | ||
"Download": "http://hdl.handle.net/20.500.11821/18" | ||
}, | ||
"Publication": "" | ||
} |