Merge branch 'develop' into feature/corpus-model-validation
Showing 50 changed files with 528 additions and 338 deletions.
backend/addcorpus/migrations/0004_alter_corpusconfiguration_category.py
18 changes: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# Generated by Django 4.1.9 on 2023-09-21 14:16

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('addcorpus', '0003_add_corpusconfiguration'),
    ]

    operations = [
        migrations.AlterField(
            model_name='corpusconfiguration',
            name='category',
            field=models.CharField(choices=[('parliament', 'Parliamentary debates'), ('periodical', 'Newspapers and other periodicals'), ('finance', 'Financial reports'), ('ruling', 'Court rulings'), ('review', 'Online reviews'), ('inscription', 'Funerary inscriptions'), ('oration', 'Orations'), ('book', 'Books')], help_text='category/medium of documents in this dataset', max_length=64),
        ),
    ]
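The migration above only records which values the `category` field accepts. As a rough illustration of what those choices mean at runtime, here is a hypothetical plain-Python sketch (not part of this commit) that maps a stored value to its label and rejects anything outside the choices. The name `validate_category` is invented for illustration; in a real Django project, `Model.full_clean()` performs the choices and `max_length` checks, and the auto-generated `get_category_display()` returns the label.

```python
# Hypothetical sketch, not code from this commit.
# (stored value, human-readable label) pairs copied from the AlterField above.
CATEGORY_CHOICES = [
    ('parliament', 'Parliamentary debates'),
    ('periodical', 'Newspapers and other periodicals'),
    ('finance', 'Financial reports'),
    ('ruling', 'Court rulings'),
    ('review', 'Online reviews'),
    ('inscription', 'Funerary inscriptions'),
    ('oration', 'Orations'),
    ('book', 'Books'),
]

MAX_LENGTH = 64  # matches max_length=64 on the field

# Lookup table from stored value to display label.
CATEGORY_LABELS = dict(CATEGORY_CHOICES)


def validate_category(value: str) -> str:
    """Return the display label for a category value.

    Raises ValueError if the value exceeds the column length or is not one
    of the allowed choices (the same two checks Django's model validation
    applies to this field).
    """
    if len(value) > MAX_LENGTH:
        raise ValueError(f'category too long: {len(value)} > {MAX_LENGTH}')
    try:
        return CATEGORY_LABELS[value]
    except KeyError:
        raise ValueError(f'unknown category: {value!r}') from None
```

With this sketch, `validate_category('book')` returns `'Books'`, and `validate_category('blog')` raises `ValueError`.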
@@ -0,0 +1,38 @@
*Eighteenth Century Collections Online (ECCO)* is a fully text-searchable corpus of books, pamphlets and broadsides on all subjects printed between 1701 and 1800. It currently contains over 135,000 titles amounting to over 26 million fully searchable pages. *ECCO* is a digitization of the eighteenth-century section of the works catalogued in the *English Short-title Catalogue (ESTC)*.

Most of these works were printed in England, Scotland, Ireland and the United States, but the corpus also contains works printed in territories under British colonial rule and in countries across Europe and Asia.

The corpus includes everything from six-penny broadsheets and pamphlets to books and government documents, written by or about people of all professions and classes.

### Subjects

- Multidisciplinary
- Eighteenth-century knowledge, thought, beliefs, events
- Age of Enlightenment
- Histories
- Poetry
- Novels
- Plays
- Law books
- Biographies
- Science
- Philosophy
- Dictionaries
- Theology/Religion
- Diaries
- Almanacs
- … and many more

### Read more

Additional information can be found in the links below.

- [Access through publisher website (requires Utrecht University login)](https://go-gale-com.proxy.library.uu.nl/ps/start.do?p=ECCO&u=utrecht)
- [About this archive (publisher website; requires Utrecht University login)](https://go-gale-com.proxy.library.uu.nl/ps/helpCenter?userGroupName=utrecht&inPS=true&nspage=true&prodId=ECCO&docId=EFZIPA587871271)
- [Sample topics and searches (publisher website; requires Utrecht University login)](https://go-gale-com.proxy.library.uu.nl/ps/helpCenter?userGroupName=utrecht&inPS=true&nspage=true&prodId=ECCO&docId=OAWADC058207024&title=Sample%20Topics%20and%20Searches)

### Availability

*ECCO* is published by [Gale](https://en.wikipedia.org/wiki/Gale_(publisher)) and is only available to members of Utrecht University.

*Note:* Only *ECCO Part I* is available on I-analyzer.
backend/corpora/guardianobserver/description/guardianobserver.md
30 changes: 30 additions & 0 deletions
@@ -0,0 +1,30 @@

This corpus contains articles from *The Guardian* and *The Observer*.

### The Guardian

*The Guardian* is a British daily newspaper, originally founded in 1821 as *The Manchester Guardian*. It is a sister newspaper to both *The Observer* and *The Guardian Weekly*. It is considered a “newspaper of record”, is currently one of the most widely read newspapers in the UK, and is respected worldwide.

Political alignment: Centre-left

### The Observer

*The Observer* is a British newspaper published weekly on Sundays. It is the world's oldest Sunday newspaper and is a sister paper to both *The Guardian* and *The Guardian Weekly*.

Political alignment: Centre-left; British republicanism

### Subjects

- Historical local, regional and national news
- Multidisciplinary

### Read more

- [The Guardian (Wikipedia)](https://en.wikipedia.org/wiki/The_Guardian)
- [Official website of The Guardian](https://www.theguardian.com/international)
- [The Observer (Wikipedia)](https://en.wikipedia.org/wiki/The_Observer)
- [Access through publisher website (requires Utrecht University login)](https://www.proquest.com/hnpguardianobserver/index?parentSessionId=SBW10zSG6gyVTa17wSPUIoNhfaXQZBxx2UvOA9%2FiYto%3D&accountid=14772)

### Availability

The Guardian/Observer corpus is published by [ProQuest](https://en.wikipedia.org/wiki/ProQuest) and is only available to members of Utrecht University.
@@ -1,6 +1,34 @@
### The Times Digital Archive 1785-2012
*The Times* is a British daily national newspaper, originally founded in 1785 as *The Daily Universal Register*. *The Times* is the oldest daily newspaper in continuous publication and remains one of the most widely read and respected newspapers in the world. It is a sister newspaper to *The Sunday Times*.

Political alignment: Conservative; Centre-right

This corpus contains a full-text version of more than 200 years of *The Times*, a critical source for studying a range of subjects.

All issues of this period are present, with the following exceptions:
- Issues of March 1785: they are missing from the publisher's archive.
- Issues from 1 January 1979 to 31 October 1979: during this period, a major labour dispute halted publication, and no newspaper editions were published.

### Subjects

- Historical local, regional and national news
- Multidisciplinary
- Business
- Humanities
- Political Science
- Philosophy
- Major international historical events

### Read more

- [The Times (Wikipedia)](https://en.wikipedia.org/wiki/The_Times)
- [Access through publisher website (requires Utrecht University login)](https://go-gale-com.proxy.library.uu.nl/ps/start.do?p=TTDA&u=utrecht)
- [About this archive (publisher website; requires Utrecht University login)](https://go-gale-com.proxy.library.uu.nl/ps/helpCenter?userGroupName=utrecht&inPS=true&nspage=true&prodId=TTDA&docId=QCOGMG579883681)
- [Sample topics and searches (publisher website; requires Utrecht University login)](https://go-gale-com.proxy.library.uu.nl/ps/helpCenter?userGroupName=utrecht&inPS=true&nspage=true&prodId=TTDA&docId=GCANVE436736839&title=Sample%20Topics%20and%20Searches)

### Availability

This corpus is published by [Gale](https://en.wikipedia.org/wiki/Gale_(publisher)) and is only available to members of Utrecht University.

### Image source

Corpus image from [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Twice_round_the_clock;_or,_The_hours_of_the_day_and_night_in_London_(1859)_(14776691334).jpg)