diff --git a/Historical corpora/Historical corpora.html b/Historical corpora/Historical corpora.html
index 91d9b83..1a94344 100644
--- a/Historical corpora/Historical corpora.html
+++ b/Historical corpora/Historical corpora.html
@@ -49,28 +49,26 @@

Monolingual corpora

- The Diorisis Ancient Greek Corpus + Greek Medieval Texts

- Size: 10.2 million words -
- Annotation: PoS-tagged, lemmatised + Size: 3.4 million words
- Licence: CC BY 4.0 + Licence: CC-BY

Ancient Greek -

This corpus consists of 820 texts spanning from the beginnings of the Ancient Greek literary tradition (Homer) to the fifth century AD.

-

The texts are sourced from the Perseus Canonical Greek Lit Repository, "The Little Sailing" digital library, and the Bibliotheca Augustana digital library.

-

The corpus is available for download from Figshare.

+

This corpus contains texts from the 4th to the 16th century.

+

The texts belong to the following categories: religious, poetical-literary, political, and historical texts, as well as hymns and epigrams.

+

The corpus is available for download from the clarin:el repository.

-

For the relevant publication, see Vatri and McGillivray (2018).

+ -

Download

+

Download

@@ -80,26 +78,28 @@

Monolingual corpora

- Greek Medieval Texts + The Diorisis Ancient Greek Corpus

- Size: 3.4 million words + Size: 10.2 million words
- Licence: CC-BY + Annotation: PoS-tagged, lemmatised +
+ Licence: CC BY 4.0

Ancient Greek -

This corpus contains texts from the 4th to the 16th century.

-

The texts belong to the following categories: religious, poetical-literary, political, and historical texts, as well as hymns and epigrams.

-

The corpus is available for download from the clarin:el repository.

+

This corpus consists of 820 texts spanning from the beginnings of the Ancient Greek literary tradition (Homer) to the fifth century AD.

+

The texts are sourced from the Perseus Canonical Greek Lit Repository, "The Little Sailing" digital library, and the Bibliotheca Augustana digital library.

+

The corpus is available for download from Figshare.

- +

For the relevant publication, see Vatri and McGillivray (2018).

-

Download

+

Download

@@ -1067,12 +1067,12 @@

Monolingual corpora

- The Nottingham Corpus of Early Modern German Midwifery and Women's Medicine (ca. 1500-1700) + GerManC. A Historical Corpus of German Newspapers 1650-1800

- Size: 120,000 tokens + Size: 700,000 words
- Annotation: TEI Lite markup, no linguistic annotation + Annotation: no annotation
Licence: CC-BY-NC-SA 3.0

@@ -1081,14 +1081,13 @@

Monolingual corpora

German -

This corpus contains medical writing from 1500 to 1700.

-

The texts are taken primarily from digital facsimile copies available online via the University of Würzburg’s library interface, particularly from the subcategory pertaining to gynaecology.

+

This corpus contains personal letters, sermons and fictional, scholarly (i.e., humanities), scientific and legal texts from 1650 to 1800.

The corpus is available for download from the Oxford Text Archive.

-

Download

+

Download

@@ -1098,27 +1097,54 @@

Monolingual corpora

- GerManC. A Historical Corpus of German Newspapers 1650-1800 + Mannheimer Korpus Historischer Zeitungen und Zeitschriften

- Size: 700,000 words + Size: 3532 pages +

+ + + German + + +

This corpus contains texts from the 18th and 19th centuries.

+

The corpus is available for download directly through the VLO.

+ + + + +

Download

+ + + + + + + + +

+ Referenzkorpus Mittelhochdeutsch (Middle High German Reference Corpus) +

+

+ Size: 2.5 million tokens
- Annotation: no annotation + Annotation: tokenised, PoS-tagged, lemmatised, normalised, morphosyntactic description
- Licence: CC-BY-NC-SA 3.0 + Licence: CC-BY-SA 4.0

German -

This corpus contains personal letters, sermons and fictional, scholarly (i.e., humanities), scientific and legal texts from 1650 to 1800.

-

The corpus is available for download from the Oxford Text Archive.

+

This corpus contains texts from 1050 to 1350.

+

The corpus is available for download from the Deutsches Text Archiv and through a concordancer.

- +

For the relevant publication, see Klein and Dipper (2016).

-

Download

+

Concordancer

+

Download

@@ -1128,23 +1154,27 @@

Monolingual corpora

- Mannheimer Korpus Historischer Zeitungen und Zeitschriften + SaCoCo—Saarbrücken Cookbook Corpus

- Size: 3532 pages + Size: 436,000 tokens +
+ Annotation: PoS-tagged using the STTS tagset, lemmatised, normalised +
+ Licence: CC-BY-NC-SA-3.0

German -

This corpus contains texts from the 18th and 19th centuries.

-

The corpus is available for download directly through the VLO.

+

This corpus contains historical cookbook recipes from 1569 to 1800, as well as contemporary ones from 2012.

+

The corpus is available through the CQPweb concordancer provided by CLARIN-D.

-

Download

+

Concordancer

@@ -1154,28 +1184,28 @@

Monolingual corpora

- Referenzkorpus Mittelhochdeutsch (Middle High German Reference Corpus) + The Nottingham Corpus of Early Modern German Midwifery and Women's Medicine (ca. 1500-1700)

- Size: 2.5 million tokens + Size: 120,000 tokens
- Annotation: tokenised, PoS-tagged, lemmatised, normalised, morphosyntactic description + Annotation: TEI Lite markup, no linguistic annotation
- Licence: CC-BY-SA 4.0 + Licence: CC-BY-NC-SA 3.0

German -

This corpus contains texts from 1050 to 1350.

-

The corpus is available for download from the Deutsches Text Archiv and through a concordancer.

+

This corpus contains medical writing from 1500 to 1700.

+

The texts are taken primarily from digital facsimile copies available online via the University of Würzburg’s library interface, particularly from the subcategory pertaining to gynaecology.

+

The corpus is available for download from the Oxford Text Archive.

-

For the relevant publication, see Klein and Dipper (2016).

+ -

Concordancer

-

Download

+

Download

@@ -1271,36 +1301,6 @@

Monolingual corpora

Download

- - - - - - -

- SaCoCo—Saarbrücken Cookbook Corpus -

-

- Size: 436,000 tokens -
- Annotation: PoS-tagged using the STTS tagset, lemmatised, normalised -
- Licence: CC-BY-NC-SA-3.0 -

- - - German - - -

This corpus contains historical cookbook recipes from 1569 to 1800, as well as contemporary ones from 2012.

-

The corpus is available through the CQPweb concordancer provided by CLARIN-D.

- - - - -

Concordancer

- -
@@ -1538,6 +1538,61 @@

Monolingual corpora

Download

+ + + + + + +

+ CIPM +

+

+ Size: 3.5 million words +
+ Licence: CC-BY-NC-ND +

+ + + Portuguese + + +

This is a corpus of historical, religious, notarial, and literary texts in prose and verse.

+

The corpus is available from PORTULAN.

+ + + + +

Browse

+

Download

+ + + + + + + + +

+ Portuguese Parish Memories (1758) +

+

+ Licence: CC BY +

+ + + Portuguese + + +

This is a corpus of historical surveys from the 18th century.

+

The corpus is available from PORTULAN.

+ + + + +

Download

+ +