Corpus of Italian Novels

This corpus contains 21 texts by 15 Italian authors (2 279 182 tokens). The first editions of the texts were published between 1850 and 1915. See the "metadata.csv" file or the teiHeader of each file for further metadata.

Formats

  • tei: following the Text Encoding Initiative and valid against the CLiGS schema (File name: id.xml)
  • txt_id: simple plain text of the body (File name: id.txt)
  • annotated: TEI files further annotated with FreeLing and WordNet (keeping teiHeader)
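
The relationship between the tei and txt_id formats can be illustrated with a minimal Python sketch that pulls the paragraph text out of a TEI body. It assumes the files use the standard TEI P5 namespace (http://www.tei-c.org/ns/1.0) and that the text sits in p elements inside body. The function name tei_body_text and the sample document are hypothetical, and this is not necessarily how the plain text files in this repository were produced.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; assumed to be the one used by the corpus files.
TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_URI}


def tei_body_text(xml_source: str) -> str:
    """Return the text of all <p> elements in the TEI <body>, one per line.

    The teiHeader and chapter headings (<head>) are deliberately skipped.
    """
    root = ET.fromstring(xml_source)
    body = root.find(".//tei:body", TEI_NS)
    if body is None:
        return ""
    paragraphs = []
    for p in body.iter(f"{{{TEI_URI}}}p"):
        # Join nested inline markup (e.g. <hi>) and normalise whitespace.
        text = " ".join("".join(p.itertext()).split())
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


# Hypothetical miniature TEI document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Esempio</title></titleStmt></fileDesc></teiHeader>
  <text><body><div><head>Capitolo I</head>
    <p>Era una notte <hi>buia</hi> e tempestosa.</p>
    <p>Il vento soffiava.</p>
  </div></body></text>
</TEI>"""

if __name__ == "__main__":
    print(tei_body_text(SAMPLE))
```

To apply this to a file from the tei folder, read the file contents and pass them to the function, for example tei_body_text(open("id.xml", encoding="utf-8").read()), where id.xml stands for an actual file name.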

Schema

  • The TEI schema for the basic and the linguistically annotated TEI files corresponds to the general CLiGS schema which is available in the CLiGS reference repository.
  • The metadata keywords used in the text classification section of the TEI header are controlled by an external TEI keywords file and a schematron file which are stored in the keywords folder.

Copyright and Citation

  • The authors' copyright on these texts has already expired.
  • The files of the collection are modifications of the corresponding text files from 'Liber Liber' (https://www.liberliber.it/online/), which are provided with a Creative Commons BY-NC-SA 4.0 license. The collection, the added annotation and metadata are published under a Creative Commons BY-NC-SA 4.0 license.
  • Please provide a reference if you use this data in your teaching or research. The following is a citation suggestion: Corpus of Italian Novels, edited by Katrin Betz and Christof Schöch. Würzburg: CLiGS, 2017. https://github.com/cligs/textbox/tree/master/italian/romanzi.