Deploying to gh-pages from @ 7c2a1d6 🚀
jakoble committed Oct 30, 2024
1 parent 377d729 commit 9fbbd42
Showing 1 changed file with 131 additions and 76 deletions.
207 changes: 131 additions & 76 deletions Historical corpora/Historical corpora.html
@@ -49,28 +49,26 @@ <h3 id"table-title">Monolingual corpora</h3>
<tr>
<td valign="top">
<p>
<a href="http://hdl.handle.net/11372/LRT-4769">The Diorisis Ancient Greek Corpus</a>
<a href="http://hdl.grnet.gr/11500/AEGEAN-0000-0000-251D-7">Greek Medieval Texts</a>
</p>
<p>
<strong>Size: </strong>10.2 million words
<br>
<strong>Annotation: </strong>PoS-tagged, lemmatised
<strong>Size: </strong>3.4 million words
<br>
<strong>Licence: </strong>CC BY 4.0
<strong>Licence: </strong>CC-BY
</p>
</td>
<td valign="top">
Ancient Greek
</td>
<td valign="top">
<p>This corpus consists of 820 texts spanning from the beginnings of the Ancient Greek literary tradition (Homer) to the fifth century AD.</p>
<p>The texts are sourced from the <a href="https://github.com/PerseusDL/canonical-greekLit">Perseus Canonical Greek Lit Repository</a>, <a href="http://www.mikrosapoplous.gr/en/texts1en.html">"The Little Sailing" digital library</a>, and the <a href="http://www.hs-augsburg.de/~harsch/augustana.html#gr">Bibliotheca Augustana digital library</a>.</p>
<p>The corpus is available for download from Figshare.</p>
<p>This corpus contains texts from the 4th to the 16th century.</p>
<p>The texts belong to the following categories: religious, poetical-literary, political, and historical texts, as well as hymns and epigrams.</p>
<p>The corpus is available for download from the clarin:el repository. </p>

<p>For the relevant publication, see <a href="https://www.clarin.eu/resource-families/historical-corpora#vatrimcgillivray2018">Vatri and McGillivray (2018)</a></p>

</td>
<td valign="top">
<p><a class="btn btn-primary text-nowrap" href="https://hdl.handle.net/10.6084/m9.figshare.6187256.v1"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>
<p><a class="btn btn-primary text-nowrap" href="http://hdl.grnet.gr/11500/AEGEAN-0000-0000-251D-7"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>


</td>
@@ -80,26 +78,28 @@ <h3 id"table-title">Monolingual corpora</h3>
<tr>
<td valign="top">
<p>
<a href="http://hdl.grnet.gr/11500/AEGEAN-0000-0000-251D-7">Greek Medieval Texts</a>
<a href="http://hdl.handle.net/11372/LRT-4769">The Diorisis Ancient Greek Corpus</a>
</p>
<p>
<strong>Size: </strong>3.4 million words
<strong>Size: </strong>10.2 million words
<br>
<strong>Licence: </strong>CC-BY
<strong>Annotation: </strong>PoS-tagged, lemmatised
<br>
<strong>Licence: </strong>CC BY 4.0
</p>
</td>
<td valign="top">
Ancient Greek
</td>
<td valign="top">
<p>This corpus contains texts from the 4th to the 16th century.</p>
<p>The texts belong to the following categories: religious, poetical-literary, political, and historical texts, as well as hymns and epigrams.</p>
<p>The corpus is available for download from the clarin:el repository. </p>
<p>This corpus consists of 820 texts spanning from the beginnings of the Ancient Greek literary tradition (Homer) to the fifth century AD.</p>
<p>The texts are sourced from the <a href="https://github.com/PerseusDL/canonical-greekLit">Perseus Canonical Greek Lit Repository</a>, <a href="http://www.mikrosapoplous.gr/en/texts1en.html">"The Little Sailing" digital library</a>, and the <a href="http://www.hs-augsburg.de/~harsch/augustana.html#gr">Bibliotheca Augustana digital library</a>.</p>
<p>The corpus is available for download from Figshare.</p>


<p>For the relevant publication, see <a href="https://www.clarin.eu/resource-families/historical-corpora#vatrimcgillivray2018">Vatri and McGillivray (2018)</a></p>
</td>
<td valign="top">
<p><a class="btn btn-primary text-nowrap" href="http://hdl.grnet.gr/11500/AEGEAN-0000-0000-251D-7"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>
<p><a class="btn btn-primary text-nowrap" href="https://hdl.handle.net/10.6084/m9.figshare.6187256.v1"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>


</td>
@@ -1067,12 +1067,12 @@ <h3 id"table-title">Monolingual corpora</h3>
<tr>
<td valign="top">
<p>
<a href="http://hdl.handle.net/20.500.14106/2562">The Nottingham Corpus of Early Modern German Midwifery and Women's Medicine (ca. 1500-1700)</a>
<a href="http://hdl.handle.net/20.500.14106/2544">GerManC. A Historical Corpus of German Newspapers 1650-1800</a>
</p>
<p>
<strong>Size: </strong>120,000 tokens
<strong>Size: </strong>700,000 words
<br>
<strong>Annotation: </strong>TEI Lite markup, no linguistic annotation
<strong>Annotation: </strong>no annotation
<br>
<strong>Licence: </strong>CC-BY-NC-SA 3.0
</p>
@@ -1081,14 +1081,13 @@ <h3 id"table-title">Monolingual corpora</h3>
German
</td>
<td valign="top">
<p>This corpus contains medical writing from 1500 to 1700.</p>
<p>The texts are taken primarily from digital facsimile copies available online via the University of Würzburg’s <a href="http://kallimachos.de/fachtexte/index.php/Hauptseite">library interface</a>, particularly from the subcategory pertaining to gynaecology.</p>
<p>This corpus contains personal letters, sermons and fictional, scholarly (i.e., humanities), scientific and legal texts from 1650 to 1800.</p>
<p>The corpus is available for download from the Oxford Text Archive.</p>


</td>
<td valign="top">
<p><a class="btn btn-primary text-nowrap" href="http://hdl.handle.net/20.500.14106/2562"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>
<p><a class="btn btn-primary text-nowrap" href="http://hdl.handle.net/20.500.14106/2544"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>


</td>
@@ -1098,27 +1097,54 @@ <h3 id"table-title">Monolingual corpora</h3>
<tr>
<td valign="top">
<p>
<a href="http://hdl.handle.net/20.500.14106/2544">GerManC. A Historical Corpus of German Newspapers 1650-1800</a>
<a href="http://hdl.handle.net/10932/00-01B8-AE41-41A4-DC01-5">Mannheimer Korpus Historischer Zeitungen und Zeitschriften</a>
</p>
<p>
<strong>Size: </strong>700,000 words
<strong>Size: </strong>3532 pages
</p>
</td>
<td valign="top">
German
</td>
<td valign="top">
<p>This corpus contains texts from the 18th and 19th centuries.</p>
<p>The corpus is available for download directly through the <a class="lexicon-term" href="https://www.clarin.eu/glossary#VLO" title="Virtual Language Observatory See: http://www.clarin.eu/vlo">VLO</a>.</p>


</td>
<td valign="top">
<p><a class="btn btn-primary text-nowrap" href="http://hdl.handle.net/10932/00-017B-E47E-5630-9F01-3"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>


</td>
</tr>
</tbody>
<tbody>
<tr>
<td valign="top">
<p>
<a href="http://deutschestextarchiv.de/rem/">Referenzkorpus Mittelhochdeutsch (Middle High German Reference Corpus)</a>
</p>
<p>
<strong>Size: </strong>2.5 million tokens
<br>
<strong>Annotation: </strong>no annotation
<strong>Annotation: </strong>tokenised, PoS-tagged, lemmatised, normalised, morphosyntactic description
<br>
<strong>Licence: </strong>CC-BY-NC-SA 3.0
<strong>Licence: </strong>CC-BY-SA 4.0
</p>
</td>
<td valign="top">
German
</td>
<td valign="top">
<p>This corpus contains personal letters, sermons and fictional, scholarly (i.e., humanities), scientific and legal texts from 1650 to 1800.</p>
<p>The corpus is available for download from the Oxford Text Archive.</p>
<p>This corpus contains texts from 1050 to 1350.</p>
<p>The corpus is available for download from the Deutsches Textarchiv and through a concordancer.</p>


<p>For the relevant publication, see <a href="https://www.clarin.eu/resource-families/historical-corpora#Klein%20and%20Dipper%202016">Klein and Dipper (2016).</a></p>
</td>
<td valign="top">
<p><a class="btn btn-primary text-nowrap" href="http://hdl.handle.net/20.500.14106/2544"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>
<p><a class="btn btn-primary text-nowrap" href="http://www.deutschestextarchiv.de/"><span class="fa fa-search"></span>Concordancer</a></p>
<p><a class="btn btn-primary text-nowrap" href="http://deutschestextarchiv.de/rem/"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>


</td>
@@ -1128,23 +1154,27 @@ <h3 id"table-title">Monolingual corpora</h3>
<tr>
<td valign="top">
<p>
<a href="http://hdl.handle.net/10932/00-01B8-AE41-41A4-DC01-5">Mannheimer Korpus Historischer Zeitungen und Zeitschriften</a>
<a href="http://hdl.handle.net/11858/00-246C-0000-001F-7C43-1">SaCoCo—Saarbrücken Cookbook Corpus</a>
</p>
<p>
<strong>Size: </strong>3532 pages
<strong>Size: </strong>436,000 tokens
<br>
<strong>Annotation: </strong>PoS-tagged using the STTS tagset, lemmatised, normalised
<br>
<strong>Licence: </strong>CC-BY-NC-SA-3.0
</p>
</td>
<td valign="top">
German
</td>
<td valign="top">
<p>This corpus contains texts from the 18th and 19th centuries.</p>
<p>The corpus is available for download directly through the <a class="lexicon-term" href="https://www.clarin.eu/glossary#VLO" title="Virtual Language Observatory See: http://www.clarin.eu/vlo">VLO</a>.</p>
<p>This corpus contains historical cookbook recipes from 1569 to 1800, as well as contemporary ones from 2012.</p>
<p>The corpus is available through the CQPweb concordancer provided by CLARIN-D.</p>


</td>
<td valign="top">
<p><a class="btn btn-primary text-nowrap" href="http://hdl.handle.net/10932/00-017B-E47E-5630-9F01-3"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>
<p><a class="btn btn-primary text-nowrap" href="https://corpora.clarin-d.uni-saarland.de/cqpweb/sacoco"><span class="fa fa-search"></span>Concordancer</a></p>


</td>
@@ -1154,28 +1184,28 @@ <h3 id"table-title">Monolingual corpora</h3>
<tr>
<td valign="top">
<p>
<a href="http://deutschestextarchiv.de/rem/">Referenzkorpus Mittelhochdeutsch (Middle High German Reference Corpus)</a>
<a href="http://hdl.handle.net/20.500.14106/2562">The Nottingham Corpus of Early Modern German Midwifery and Women's Medicine (ca. 1500-1700)</a>
</p>
<p>
<strong>Size: </strong>2.5 million tokens
<strong>Size: </strong>120,000 tokens
<br>
<strong>Annotation: </strong>tokenised, PoS-tagged, lemmatised, normalised, morphosyntactic description
<strong>Annotation: </strong>TEI Lite markup, no linguistic annotation
<br>
<strong>Licence: </strong>CC-BY-SA 4.0
<strong>Licence: </strong>CC-BY-NC-SA 3.0
</p>
</td>
<td valign="top">
German
</td>
<td valign="top">
<p>This corpus contains texts from 1050 to 1350.</p>
<p>The corpus is available for download from the Deutsches Textarchiv and through a concordancer.</p>
<p>This corpus contains medical writing from 1500 to 1700.</p>
<p>The texts are taken primarily from digital facsimile copies available online via the University of Würzburg’s <a href="http://kallimachos.de/fachtexte/index.php/Hauptseite">library interface</a>, particularly from the subcategory pertaining to gynaecology.</p>
<p>The corpus is available for download from the Oxford Text Archive.</p>

<p>For the relevant publication, see <a href="https://www.clarin.eu/resource-families/historical-corpora#Klein%20and%20Dipper%202016">Klein and Dipper (2016).</a></p>

</td>
<td valign="top">
<p><a class="btn btn-primary text-nowrap" href="http://www.deutschestextarchiv.de/"><span class="fa fa-search"></span>Concordancer</a></p>
<p><a class="btn btn-primary text-nowrap" href="http://deutschestextarchiv.de/rem/"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>
<p><a class="btn btn-primary text-nowrap" href="http://hdl.handle.net/20.500.14106/2562"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>


</td>
@@ -1271,36 +1301,6 @@ <h3 id"table-title">Monolingual corpora</h3>
<p><a class="btn btn-primary text-nowrap" href="http://hdl.handle.net/11022/0000-0007-C64C-5"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>


</td>
</tr>
</tbody>
<tbody>
<tr>
<td valign="top">
<p>
<a href="http://hdl.handle.net/11858/00-246C-0000-001F-7C43-1">SaCoCo—Saarbrücken Cookbook Corpus</a>
</p>
<p>
<strong>Size: </strong>436,000 tokens
<br>
<strong>Annotation: </strong>PoS-tagged using the STTS tagset, lemmatised, normalised
<br>
<strong>Licence: </strong>CC-BY-NC-SA-3.0
</p>
</td>
<td valign="top">
German
</td>
<td valign="top">
<p>This corpus contains historical cookbook recipes from 1569 to 1800, as well as contemporary ones from 2012.</p>
<p>The corpus is available through the CQPweb concordancer provided by CLARIN-D.</p>


</td>
<td valign="top">
<p><a class="btn btn-primary text-nowrap" href="https://corpora.clarin-d.uni-saarland.de/cqpweb/sacoco"><span class="fa fa-search"></span>Concordancer</a></p>


</td>
</tr>
</tbody>
@@ -1538,6 +1538,61 @@ <h3 id"table-title">Monolingual corpora</h3>
<p><a class="btn btn-primary text-nowrap" href="http://hdl.handle.net/20.500.14106/2482"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>


</td>
</tr>
</tbody>
<tbody>
<tr>
<td valign="top">
<p>
<a href="https://hdl.handle.net/21.11129/0000-000D-F934-0">CIPM</a>
</p>
<p>
<strong>Size: </strong>3.5 million words
<br>
<strong>Licence: </strong>CC-BY-NC-ND
</p>
</td>
<td valign="top">
Portuguese
</td>
<td valign="top">
<p>This is a corpus of historical, religious, notarial and literary texts in prose and verse.</p>
<p>The corpus is available from PORTULAN.</p>


</td>
<td valign="top">
<p><a class="btn btn-primary text-nowrap" href="https://hdl.handle.net/21.11129/0000-000D-F934-0"><span class="fa fa-search"></span>Browse</a></p>
<p><a class="btn btn-primary text-nowrap" href="https://hdl.handle.net/21.11129/0000-000D-F934-0"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>


</td>
</tr>
</tbody>
<tbody>
<tr>
<td valign="top">
<p>
<a href="https://hdl.handle.net/21.11129/0000-000D-F8CE-4">Portuguese Parish Memories (1758)</a>
</p>
<p>
<strong>Licence: </strong>CC BY
</p>
</td>
<td valign="top">
Portuguese
</td>
<td valign="top">
<p>This is a corpus of historical surveys from the 18th century.</p>
<p>The corpus is available from PORTULAN.</p>


</td>
<td valign="top">
<p><a class="btn btn-primary text-nowrap" href="https://hdl.handle.net/21.11129/0000-000D-F8CE-4"><span class="fa fa-arrow-circle-o-down"></span>Download</a></p>


</td>
</tr>
</tbody>