diff --git a/collationtools/collationtools-tei.xml b/collationtools/collationtools-tei.xml
new file mode 100644
index 0000000..a2e9ecd
--- /dev/null
+++ b/collationtools/collationtools-tei.xml
@@ -0,0 +1,3290 @@
+ Juxta Web Service, LERA, and Variance Viewer. Web based collation tools for TEI
+ author
+ Torsten Roeder
+ Leopoldina, Halle (Saale), Germany
+ torsten.roeder@leopoldina.org
+ Institut für Dokumentologie und Editorik
+ 2020-01-18
+ https://ride.i-d-e.de/issues/issue-11
+ https://ride.i-d-e.de/issues/issue-11/web-based-collation-tools/
+ 10.18716/ride.a.11.5
+ Juxta Web Service (NINES, Performant Software, Gregor Middell, Ronald Dekker), 2009, http://juxtacommons.org/
+ LERA (Marcus Pöckelmann), 2015, http://lera.uzi.uni-halle.de/
+ Variance Viewer (Nico Balbach), 2018, http://variance-viewer.informatik.uni-wuerzburg.de/Variance-Viewer/
+ 2019-12-20

Based on http://www.i-d-e.de/publikationen/weitereschriften/criteria-text-collections-version-1-0

+
+
+ [Factsheet: catalogue-based questionnaire (criteria 0.1.1 to 4.5), completed once for each of the three tools, covering software type, platform, purpose, financial model, development stage, programming languages and technologies, reuse of existing software, input and output formats (including doc, rtf, epub, tex), character encoding, dependencies, documentation and manuals, support channels (e-mail), issue tracking, ease of installation, test suite, deployment platforms, devices and browsers, API, open source status, license, acknowledgement of contributors, software repository, code analyzability, extensibility and reusability, data treatment, sustainability, citation guidelines, expected users and interactions, interface type, visualization, customization, and accessibility.]
The review presents and compares three open source and web-based text collation tools, namely Juxta Web Service, LERA, and Variance Viewer, and investigates their suitability especially for philological work with TEI. While all tools adequately fulfill the general requirements of text collation, each of them supports individual workflow concepts and visualization methods. It is recommended that future developments consider supporting TEI standoff markup and tool modularization.
Introduction

Have you ever needed to collate two copies or editions of a historical document in order to quickly and reliably identify the differences between them? In digital scholarly editions, variant readings are often encoded manually by human editors. Tools for automatic text collation do exist, but most of them are either difficult to use or address only specific use cases. There is thus a need for easy-to-use, accessible, and adaptable tools which can handle textual variance and text alignment, including support for TEI-XML markup. In other words: some tools exist, but do they really fit the philological and technical requirements of researchers?

This article starts with a quick overview of text collation in philology and computer science. It goes on to develop a suitable review method and analyzes three TEI-interoperable, web-based collation tools: Juxta Web Service, LERA, and Variance Viewer. The review closes with a comparison of these tools and a discussion of workflow integration and future requirements.

This review focuses on non-proprietary online tools which require knowledge of TEI-XML but no programming skills. For this reason, some important instruments for scholarly text collation are not included in this review, such as TUSTEP, Classical Text Editor, Versioning Machine, Oxygen XML Editor and CollateX. Nury (2018) discusses some of these tools (and others) in her comprehensive study (Nury 2018: 205–239) and provides a detailed list of scholarly collation tools (ibid. 305–316). For a comparative overview of tools that focus on version control instead of text collation, Wikipedia hosts an exhaustive list of file comparison software.

Perspectives on text collation
Preliminaries: text comparison and collation

Text comparison means identifying differences between two or more texts by comparing them directly to each other. This technique is used in a broad range of scientific disciplines in which text is involved: in computer science for tracking changes in code; in jurisprudence for identifying relevant modifications of laws or contracts; even in bioinformatics for aligning DNA sequences (though DNA is not literally text) – to give just a few examples.

Text comparison often serves functional purposes in authoring environments: collaborative writing teams use it to revert unwanted or ill-intended modifications, teachers and academic writers use it to check for plagiarism, and code developers use it to avoid conflicting changes. However, another use case of text comparison exists which aims at a better understanding of cultural processes. Scholars from all historical disciplines have an interest in retracing the creation and transmission of texts in order to establish connections or divergences between them. In textual criticism, the epistemic process of text comparison is called collation (to collate: from Latin collatus, “to bring together and compare, examine critically as to agreement”; cf. Plachta 2013: 173).

The following two sections engage with divergent understandings of text: In computer science, text is understood as a linear sequence of characters stored in an electronic document (file), while in philology, text is interpreted on multiple dimensions, for example by distinguishing text as a single physical document from text as an abstract idea (for a comprehensive theory of text models, see Sahle 2013: 45–60). These two concepts operate on different levels, and a philological text collation tool needs to build a bridge between technical and scholarly understanding.

Philological aspects
a) Textual criticism
Fig. 1: Scheme of descent of the manuscripts of Pseudo-Apuleius Herbarius, in: Ernst Howald and Henry E. Sigerist: Antonii Musae De herba vettonica liber, Leipzig 1927. Source: Wikimedia Commons, .

Text collation is a technique used in textual criticism, which aims at retracing the genesis and history of texts. It plays a major role in uncovering the development and dissemination of thoughts, ideas and concepts represented within texts. As sub-disciplines, the study of textual genesis focuses on the process of creation of a textual work, while the study of textual history focuses on modifications of the text in the course of its subsequent transmission. It is not unusual, for instance, for a philologist to consult an immense number of copies and related sources while studying the history of a single text. In philology, the manuscript transmission is often represented as a stemma (referring to Karl Lachmann; Plachta 2013: 29–31), a tree-like figure which points out relationships between sources (see Fig. 1).

b) Traditional methods
Fig. 2: Example of a critical apparatus, as in Die Metaphysik des Aristoteles, ed. by Albert Schwegler, Tübingen 1847, p. 1. Source: Austrian National Library, . Ellipsis and highlighting added by the reviewer.

Traditionally, philological text collation is performed manually. One text is placed alongside the other, and an editor compares them word by word and letter by letter. Changes are catalogued in a critical apparatus (see Fig. 2): a typical entry starts with the variant’s position (usually line or paragraph number), followed by the lemma of the base text, often delimited by a closing square bracket ‘]’. Then follows the variant text itself and finally the corresponding sigla of the sources (see Fig. 2, highlighted areas). Due to possible human errors, each version should ideally be collated at least twice by independent editors. While this method guarantees high reliability, it costs considerable time and effort.

Editors need to establish clear guidelines defining which phenomena are of interest for future users of an edition and which will be explicitly disregarded during the collation. This is also an economic decision: in many cases, tracking graphemic variance (e.g. æ/ä) or regular orthography (e.g. color/colour) requires immense additional resources. Even if such variants are potentially interesting for later studies, the required human effort is often not justifiable. Furthermore, the more details that have to be tracked, the more likely human errors become.

There are various conventional methods of presenting the results of a text collation in printed editions, while digitally based presentations are still in development. Independently of the medium, classical scholarly editions prefer a critical apparatus: one witness or the established critical text is chosen as base text, while all variants are presented in footnotes or endnotes (Plachta 2013: 100–102). A more visual presentation mode is the synopsis: all collated texts are presented in parallel, either in vertical or horizontal order (Plachta 2013: 106–111).

c) Advantages of digital tools

Digital tools can help. First, automatic collation can be significantly faster, provided all witnesses are already fully transcribed. Philologists could focus their attention on semantics and leave the tedious letter-by-letter collation to the machine. Put positively, the time and energy saved by automatic collation can be devoted entirely to interpreting and refining the results.

Second, an automatic collation process always produces the same result and is thus more reliable than a human-produced collation. Furthermore, during the process of collating, humans naturally tend to produce individual interpretations and need to make assumptions about a writer’s or scribe’s possible intentions (Plachta 2013: 115–121). Computers, by contrast, do not interpret unless instructed to, but focus exclusively on comparing the texts. These two very different competences – precision on the one hand and interpretation on the other – could be utilized to divide the process into an automated collation layer and a human interpretation layer.

Fig. 3: English Bible Variants of Genesis 1,1, visualized by BibleViz. Source: Holy Bible, .

Third, digital tools can also help to create presentations dynamically, for example in graph-based visualizations (see Fig. 3). Some digital environments even leave the choice of visualization method and base text to the user, and offer analytic tools to refine philological findings significantly beyond mere collation results (Rieder et al. 2012: 73–75). The creation of user-configurable visualizations is supported by a basic digital paradigm: separating the visual presentation (e.g. HTML) from the encoded representation (e.g. XML).

Technical aspects
a) Types of textual variation

Analyzing differences between texts requires awareness of which types of variation can actually occur. Given two sequences A and B, the two basic cases of textual difference can be described as:
- addition (B contains a sequence C which does not exist in A), e.g. read => reading
- deletion (A contains a sequence C which does not exist in B), e.g. reading => read

Two more complex categories of variation are:
- substitution (A and B contain different sequences at the same place), e.g. whether/weather
- transposition (A and B contain the same sequence at different places), e.g. plums/lumps

Identifying substitutions and transpositions is less trivial than identifying additions and deletions (for an analysis of the transposition problem, see Schöch 2016). Both substitution and transposition can also be interpreted as successive additions and deletions, and this ambiguity often makes it difficult to decide what the original intention actually was. It is a philologist’s task to produce a well-founded and weighted interpretation of a scribe’s possible intentions or the causes of an error. Here lies a watershed between the different procedures for identifying substitutions or transpositions in philology and computer science: a textual scholar decides by personal knowledge and experience, an algorithm by formal models.
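To illustrate the ‘formal model’ side, the following Python sketch uses the standard library’s difflib module (an illustration only, not the algorithm of any tool reviewed here) to label the differences between two token sequences with the categories defined above. Note that a transposition surfaces only as a deletion plus an addition elsewhere:

    import difflib

    def classify(a_tokens, b_tokens):
        # label the differences between two token sequences
        matcher = difflib.SequenceMatcher(None, a_tokens, b_tokens)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "insert":
                print("addition:", b_tokens[j1:j2])
            elif tag == "delete":
                print("deletion:", a_tokens[i1:i2])
            elif tag == "replace":
                print("substitution:", a_tokens[i1:i2], "->", b_tokens[j1:j2])

    classify("the plums fell".split(), "the lumps fell".split())
    # substitution: ['plums'] -> ['lumps']
    classify("she read the text".split(), "she read the full text".split())
    # addition: ['full']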

b) Technical approaches

The first algorithms for text comparison were invented in the 1960s. The most popular example is the Levenshtein distance (Levenshtein 1966), which expresses textual difference as ‘edit distance’: the minimal number of characters that need to be added, deleted or substituted to transform sequence A into sequence B. In the 1970s, the Hunt–McIlroy algorithm solved the longest common subsequence problem and revolutionized file comparison when it was implemented for the Unix command ‘diff’ (Hunt and McIlroy 1976), capable of identifying and describing differences line by line (not only character by character) in human- and machine-readable format. Larry Wall wrote ‘patch’ in the 1980s, a Unix program capable of applying or reverting changes recorded as a difference in a separate file (“Patch” 2019). These algorithms, or optimized variants of them, remain in use today.
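A minimal Python sketch of the edit-distance calculation (for illustration; real diff and collation tools use more optimized variants):

    def levenshtein(a: str, b: str) -> int:
        # minimal number of additions, deletions, or substitutions
        # needed to transform sequence a into sequence b
        previous = list(range(len(b) + 1))  # distances for the empty prefix of a
        for i, char_a in enumerate(a, start=1):
            current = [i]
            for j, char_b in enumerate(b, start=1):
                current.append(min(
                    previous[j] + 1,                       # deletion
                    current[j - 1] + 1,                    # addition
                    previous[j - 1] + (char_a != char_b),  # substitution
                ))
            previous = current
        return previous[-1]

    print(levenshtein("whether", "weather"))  # 2
    print(levenshtein("read", "reading"))     # 3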

Linguists and philologists have been programming collation tools since the 1960s. Collating long texts turned out to be a complex scenario, with some major developments around 1990 (Spadini 2016: 112–116). In 2009, a working group presented an abstract framework for complex text collation procedures, which was later called the ‘Gothenburg model’. It distinguishes five computing stages (tokenization, normalization, alignment, analysis, and visualization). Nassourou (2013) proposed extending this model by a sixth, interpretative layer. It is important to be at least aware of the existence and functions of these layers in order to better understand the differences between individual tools, their workflows, and possibly their configuration options – and to better interpret the results.

c) General preliminaries

The following questions should be considered before starting the collation procedure (the sketch after this list makes some of these choices concrete):
- What is the smallest unit (token) of a compared sequence? Is it a letter, a word, a phrase or even a paragraph?
- Should normalization rules apply? Collation can be strictly character-based, but often it is expected that specific differences are ignored, e.g. whitespace (sequences of more than one whitespace character are usually treated as one), abbreviations (e.g. ‘&’ as an equivalent of ‘and’), spelling, or graphemics.
- Should presentational or rendering aspects be included? In electronic texts, these are usually recorded in markup (as in XML, HTML, Markdown, LaTeX and others). In TEI, a typical use case would be the attribute @rend.
- Should collation also take the formal structure of the documents into account? E.g. in XML, should the algorithm compare every node, or focus on specified elements?
- Should subsequent additions and deletions automatically be interpreted as substitutions?
- Is it required to identify transpositions as well?
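A short Python sketch that makes some of these decisions explicit; the token unit is configurable, whitespace runs are collapsed, and the abbreviation and grapheme tables are hypothetical examples, not the defaults of any reviewed tool:

    import re

    ABBREVIATIONS = {"&": "and"}  # hypothetical equivalence table
    GRAPHEMES = {"æ": "ae"}       # hypothetical grapheme folding table

    def tokenize(text, unit="word"):
        # the smallest unit of comparison: single letters or whole words
        return list(text) if unit == "letter" else text.split()

    def normalize(token):
        token = ABBREVIATIONS.get(token, token)
        for variant, base in GRAPHEMES.items():
            token = token.replace(variant, base)
        return token.lower()

    def prepare(text, unit="word"):
        # whitespace runs are treated as a single space before tokenizing
        text = re.sub(r"\s+", " ", text).strip()
        return [normalize(t) for t in tokenize(text, unit)]

    print(prepare("Ink  &   æther"))  # ['ink', 'and', 'aether']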

Review procedure
Method

On the one hand, the review aims at pointing out the individual strengths and specific qualities of each of the three presented tools. On the other hand, it should also cover general aspects systematically. To this end, a short list of basic features, both obligatory and non-obligatory, is presented first. All tools were tested with the same examples.

Requirements

The list of requirements contains both obligatory features, which are expected of each of the tested tools, and non-obligatory features, which are understood as additional ones:
1. web interface for uploading two or more documents (preferably with TEI support)
2. initiation of a text collation procedure
   - non-obligatory: flexible collation parameters (e.g. word- or letter-based, normalization, etc.)
   - non-obligatory: good performance with very large portions of text
3. identification of basic types of textual difference (see section on technical approaches)
   - non-obligatory: identification of sections of transposed text
   - non-obligatory: automatic classification of variants in systematic groups
4. browser-based visualization of the results (e.g. synopsis or apparatus, see also section on traditional methods)
5. results export in any format (preferably TEI apparatus encoded)
6. open source code available
7. non-obligatory: easily accessible and free of charge (at least for basic services)

Examples

Although the selected tools were all intended to work as generic solutions, they have been developed from individual starting points or specific methodological perspectives on text analysis. Furthermore, there is a great variety of possible use cases – diversified by languages, epochs, genres, individual styles – which are too manifold to be adequately represented within the scope of this review. For this review, examples were chosen which are capable of demonstrating the general functionalities of the tools, while the suitability for specific use cases needs to be tested by the individual users themselves.

Two sets of texts have been used to test the tools (test datasets and result files are available at ):
- a constructed dummy text, based on filler texts, covering the basic scenarios of textual variance (see section on types of textual variation);
- a real example, taken from The Shakespeare Quartos Archive. Shakespeare’s Hamlet, written between 1601 and 1602, is available in many printed versions, which were digitized, transcribed, TEI-encoded and published under a Creative Commons license (CC BY-NC 2.0). For testing on different complexity levels, two further versions of the documents were produced for this review: a smaller file containing only act 3, scene 1 (‘To be, or not to be’) with the original, complex encoding, and another file with a simplified baseline encoding.

Juxta Web Service
a) Introduction

Juxta (, tested version: 1.8.3-BETA) is text collation software created at the University of Virginia in 2009. It was developed into the web application ‘Juxta Web Service’ by the Networked Infrastructure for Nineteenth-Century Electronic Scholarship (NINES) and Performant Software, with support from Gregor Middell and Ronald Dekker. It received the Google Digital Humanities Award in 2010 and was originally intended to become a part of Google Books. The development seems to have been discontinued in 2014 (no updates since then), and the website presents itself as ‘beta’. However, Juxta Web Service is still active, and the code is available on GitHub under an Apache 2.0 license.

b) Workflow

Juxta Web Service requires an account, which can be created online free of charge with a few clicks. The collation procedure follows three steps. First, the user uploads two or more files to collate, which will appear as ‘sources’ in the dashboard. Juxta WS accepts a broad range of input formats, such as TXT, XML, HTML, DOC, RTF, PDF and EPUB. The documents can also be retrieved from a URL or pasted into a text field (and, as a special feature, it is even possible to refer to Wikipedia articles and their revisions). If desired, source files can be reformatted (there is also an XML indent function), edited directly in the browser, and saved as new sources. Second, each source needs to be prepared as a ‘witness’: a distinct name is assigned to each source, while the text is tokenized automatically in the background. The whole process is transparent and can also be modified in the ‘XML View’, which displays the XSLT transformation templates. For example, Juxta WS by default omits all elements which are not on the TEI block level (e.g. highlights), unless this behavior is changed. Finally, the user selects two or more witnesses to form a ‘comparison set’. For the collation process, the user can define how punctuation, character case, hyphenation and line breaks should be handled.

c) Output
Fig. 4: Example of Juxta Web Service’s Heat Map.
Fig. 5: Example of Juxta Web Service’s Side-by-side View.

Juxta Web Service presents the result of the collation in a number of different views. The ‘Heat Map’ displays the base text with highlighted differences (see Fig. 4). The base text can be changed dynamically, and each single variant can be annotated manually. The ‘Side-by-side View’ is a synoptic view of two selected witnesses, with an optional ‘Histogram’ (see Fig. 5). Finally, the integrated Versioning Machine provides a user-configurable synoptic view which is intended to display two or more collated witnesses simultaneously.

Fig. 6: Example of Juxta Web Service’s TEI view and export.

The results can be exported in various formats, e.g. as a TEI-encoded version following the parallel segmentation method, ready for further use (see Fig. 6). HTML and DOCX output are also possible, including a classical apparatus which follows a plain rendering of the base text.
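For readers unfamiliar with the encoding, here is a minimal sketch of a parallel-segmentation apparatus entry, built with Python’s ElementTree (the witness sigla are invented for the example; Juxta’s actual export carries more metadata):

    import xml.etree.ElementTree as ET

    # one variant location, encoded inline with the parallel
    # segmentation method (TEI Guidelines, 12.2.3)
    app = ET.Element("app")
    for siglum, reading in (("#q1", "to be, or not to be"),
                            ("#q2", "to be, or not to bee")):
        rdg = ET.SubElement(app, "rdg", wit=siglum)
        rdg.text = reading

    print(ET.tostring(app, encoding="unicode"))
    # <app><rdg wit="#q1">to be, or not to be</rdg><rdg wit="#q2">to be, or not to bee</rdg></app>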

d) Summary

Juxta Web Service’s in-depth and thoughtfully developed features – although some of them remained in the experimental stage – make it a powerful tool. It offers well-written documentation with a thorough explanation of all features. During testing for this review, Juxta WS sporadically appeared to suffer from performance issues. The nature of the problem remained unclear; however, it seemed to be related either to markup complexity or to text length. Users should consider that the Juxta Web Service is currently only minimally maintained.

LERA
a) Introduction

LERA (, tested version: 1.1.0.21) is an acronym for ‘Locate, Explore, Retrace and Apprehend complex text variants’. It is text collation software which was developed into a web-based tool by Marcus Pöckelmann. It is part of the project SADA (‘Semi-automatische Differenzanalyse von komplexen Textvarianten’, i.e. semi-automatic difference analysis of complex text variants), which focuses on analyzing text variants at a larger scale. SADA is located at the University of Halle-Wittenberg (Germany) and was funded by the Federal Ministry of Education and Research from 2012 to 2015. LERA, as one of its parts, was designed in close collaboration with researchers from the humanities, and it received a poster award in 2016 at the DHd conference in Leipzig. The code repository is currently not accessible to the public; however, it is planned to publish it in accordance with academic Open Science policies.

b) Workflow

LERA can be tried online with a few sample texts. An individual instance with full functionality is available on personal request. After login, the user can upload two or more ‘documents’ for collation. During this procedure, the user assigns a siglum and a title to each document, and optionally sets language, segmentation method, hyphenation handling, and linguistic annotation. LERA works smoothly with XML, and all settings can be changed at different stages of the process. In a second phase, the user selects a set of documents for collation, which will then be displayed as an ‘edition’.

c) Output
Fig. 7: LERA, synoptic view with CATview (upper section) and highlighted differences.
Fig. 8: LERA, synoptic view with identical variants highlighted.

LERA’s complex working environment offers a broad range of tools and features for text collation. The basic structure is a synoptic view of all documents, which can be customized with a rich selection of parameters and visual features. Additions, deletions, and substitutions can be color-highlighted (see Fig. 7); alternatively, for collations of more than two texts, colors can be used to highlight variants which are identical in two versions (see Fig. 8) or exist only in one version. Detailed filter rules for normalization can be applied and changed on the fly. The most distinctive feature of LERA is probably its section alignment, which refines the results of the automatic collation of longer texts. Additionally, a navigation tool called CATview helps to browse through the text by rendering each section as a square, colored according to the frequency of changes in that section. A word cloud is also available. Search terms are highlighted in the synoptic view, in CATview and in the word cloud simultaneously, which is very helpful for orientation (Barabucci 2016). The results can be downloaded in export formats such as PDF, TEX and HTML.

d) Summary

LERA is an impressively coherent suite of tools for text alignment and collation which allows the user to flexibly combine tools and parameters for individual use cases. Two things are still on the wish list: TEI export (or another structured format like JSON) and a public code repository. The project is likely to be maintained, as it is essential for at least one ongoing large research project.

Variance Viewer
a) Introduction

Variance Viewer (, tested version as of June 2019) is a web application for text collation which was developed by Nico Balbach at the Department for Artificial Intelligence and Applied Computer Science of the University of Würzburg (Germany) in 2018. The idea for its development was initiated by a group of researchers from an academic working group for digital editions. The goal was to build a generic tool which conducts common text collation tasks for different research projects. The code is available on GitHub under a GPL 3.0 license.

b) Workflow

Variance Viewer can be used without an account. The user can directly upload (exactly) two files to collate on a simple one-page dialogue website. Accepted formats are only XML (by default) and TXT (as fallback). The user can upload a configuration file with customized settings (the GitHub documentation page explains exactly how to set up an individual configuration file). This is crucial when adapting Variance Viewer to individual use cases. For example, it is possible to modify the list of XML elements whose content will be compared: by default, these are only <p> and <head>, and all content outside of these elements will be excluded. The configuration file also allows the user to define normalization rules for graphemic variance, punctuation and abbreviations.
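The effect of such an element whitelist can be illustrated with a few lines of Python (an illustration of the principle, not Variance Viewer’s actual code):

    import xml.etree.ElementTree as ET

    TEI = "{http://www.tei-c.org/ns/1.0}"
    COMPARED = {TEI + "p", TEI + "head"}  # mirrors the default described above

    def comparable_text(tei_source):
        # collect text only from elements selected for comparison;
        # everything outside them is ignored by the collation
        root = ET.fromstring(tei_source)
        return [" ".join(el.itertext()).strip()
                for el in root.iter() if el.tag in COMPARED]

    doc = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
      <head>Scene 1</head>
      <note>an editorial note, excluded from collation</note>
      <p>To be, or not to be</p>
    </body></text></TEI>"""
    print(comparable_text(doc))  # ['Scene 1', 'To be, or not to be']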

c) Output
Fig. 9: Variance Viewer, synoptic output.

The web service operates quickly and displays a visualization of the collation result almost instantly. The most distinctive feature of Variance Viewer is its automatic classification of variants. The tool identifies general content (by default), punctuation, graphemics, abbreviation, typography (based on the TEI attribute hi/@rend), and separation (whitespace), and visualizes these with a color code (see Fig. 9). Color shades indicate whether a token differs in only one character (light) or in more than one (dark). Each feature can be switched on and off via dedicated buttons, and the visualization of the results can be adapted through CSS properties. This makes Variance Viewer an excellent tool for visual analysis.
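A toy Python classifier along these lines (the heuristics are invented for illustration; the tool’s actual rules are defined in its configuration file):

    import string

    PUNCT = str.maketrans("", "", string.punctuation)

    def classify_variant(a: str, b: str) -> str:
        # assign a variant pair to one of the categories named above
        if a.replace(" ", "") == b.replace(" ", ""):
            return "separation"   # tokens differ only in whitespace
        if a.translate(PUNCT) == b.translate(PUNCT):
            return "punctuation"  # tokens differ only in punctuation
        if a.lower() == b.lower():
            return "graphemics"   # case folding stands in for grapheme rules
        return "content"

    print(classify_variant("to day", "today"))     # separation
    print(classify_variant("be,", "be"))           # punctuation
    print(classify_variant("Whether", "whether"))  # graphemics
    print(classify_variant("whether", "weather"))  # content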

Fig. 10: Variance Viewer, collation of typographic features (others deactivated).

A unique feature of Variance Viewer is its ability to identify presentational differences, e.g. those typically described in @rend attributes. While other collation tools work on the plain-text level only, Variance Viewer analyzes typographic aspects even if the underlying text is identical (see Fig. 10). Currently, this is implemented for the TEI attribute @rend only, but it could potentially be extended to other attributes.

The result is downloadable in TEI/XML format, with variants encoded in parallel segmentation, using the elements <lem> and <rdg> for variants in the first and second text respectively, while the variant classification is recorded in the attribute @type of the <app> element. The TEI apparatus needs a manual check for some minor whitespace issues. Other available formats are JSON (following a data scheme for another tool called Athen: ‘Annotation and Text Highlighting ENvironment’) and PDF.

d) Summary

Variance Viewer does an excellent job of handling TEI input. The configuration options are powerful and make Variance Viewer a strong generic tool. On the output side, it must be mentioned that the downloadable TEI document is not perfectly schema-valid when a variant occurs within an element that does not allow <app> as a child, e.g. within <locus>. This is probably not problematic for most use cases, but it still needs to be taken into account. The tool is well suited for inclusion in a workflow at a relatively early stage, e.g. after transcription and base encoding are completed, but before deeper semantic tagging takes place.

Discussion
Conclusions

According to the requirements, all tools provide a web interface for document upload (feature 1) and for starting a collation procedure (feature 2), and all of them offer options for individual configuration. The tools’ approaches differ considerably in this respect: while LERA and Juxta Web Service both offer extremely granular interfaces, Variance Viewer achieves high flexibility through an uploadable configuration file. Performance with large portions of text is adequate, but the tools cause heavy load on the client side, as they all load the complete collation result into the browser (instead of smaller portions).

All tools were able to identify additions, deletions and substitutions correctly (feature 3), while transposition is obviously an interpretative issue and needs further analysis or manual editing, as supported by LERA.

Furthermore, all tools offer a parallel synoptic view with variants highlighted (feature 4). Juxta Web Service and LERA each offer a helpful exploration tool for easy navigation through longer texts, the Histogram and CATview respectively. Concerning analysis, Variance Viewer has not only developed an interesting approach to classifying variants automatically, but can also detect presentational variants, which is most useful for collating texts with complex typography.

Concerning output formats (feature 5), there is still much to be achieved. Although schema-valid TEI output is available in Juxta Web Service and Variance Viewer, the methods used to structure collation results in XML are very diverse. In each case, it will be necessary to revise the TEI code and adapt it to one’s own practice. The same applies to other output formats, especially presentational ones; none of the tools offers options to configure PDF or HTML output, so the usefulness of these routines is questionable.

It is positive that the source code of all tools is available (or planned to be made available) in public repositories (feature 6), so projects have a chance to review or reuse the code and to customize it for their own purposes. Usage of the tools is relatively easy and free of charge, as long as no special implementations are required (feature 7). Concerning accessibility, Variance Viewer follows an interesting lightweight concept, as it requires neither user management nor authentication, while LERA and Juxta Web Service require individual accounts and bind users to their web service.

Outlook

The most debatable aspect of the TEI output routines is that all tools offer only parallel segmentation (TEI Guidelines, 12.2.3) as an encoding method. To merge all collated documents into one without breaking schema rules and element hierarchy (not to mention whitespace issues) is an extremely difficult task, and the tools solve this by deleting much of the markup, resulting in a simplified apparatus file with all extra features subtracted. An alternative would be to implement output as a standoff apparatus, either with the double-endpoint-attachment method (TEI Guidelines, 12.2.2) or with the location-referenced method (TEI Guidelines, 12.2.1), while the source documents could remain unmodified (as in graph-oriented text models, see Bleeker et al. 2018). The only precondition would be granular encoding with @xml:id on each token, or corresponding milestones. A TEI apparatus based on one of these methods would yield a much more reliable encoding and make further presentation routines easier to process (as discussed by Cayless et al. 2019). An advantage of parallel segmentation, on the other hand, is that it is easier to encode manually in standard XML editors.
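A minimal sketch (with invented IDs and sigla, given here as Python string literals for consistency with the earlier examples) of what such a standoff encoding could look like, contrasting an untouched witness file with a separate apparatus that points into it via the double-endpoint-attachment method:

    # the witness file keeps its full markup; the only requirement is a
    # granular @xml:id on each token (or corresponding milestones)
    witness = """<p xmlns="http://www.tei-c.org/ns/1.0">
      I heard the <w xml:id="w42">plums</w> fall.
    </p>"""

    # the apparatus lives in a separate document and merely points at
    # the token(s) it concerns, leaving the source unmodified
    apparatus = """<listApp xmlns="http://www.tei-c.org/ns/1.0">
      <app from="#w42" to="#w42">
        <rdg wit="#B">lumps</rdg>
      </app>
    </listApp>"""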

Concerning visualization and analysis, it should be mentioned that there are other tools which could cover this function independently; TEICat by M. Burghart, to give one example, offers a great alternative. Of course, Juxta Web Service, LERA, and Variance Viewer are developed as all-in-one solutions and are optimized for their individual approaches to text collation. However, in view of the “separation of concerns” reflected in the Gothenburg model, it seems desirable to move towards a modular workflow with more than one tool involved, in order to separate the collation process from analysis, visualization, and export. This would make it possible to develop software independently for each task, which would support more generic and more detailed solutions. It would also allow combined workflows, joining, for example, Juxta Web Service’s TEI output routine, LERA’s alignment tool and Variance Viewer’s classification algorithm. This would require a solid exchange format, which is not covered by the TEI apparatus and should probably be addressed on a more abstract level.

For the workflow of an edition, it is not only important to decide which tool suits the individual requirements, but also to decide the role of the tool in the workflow: which task(s) should it fulfill, and which tasks are better done manually and/or by using other tools? In the current state of the tools, it is a good idea to use collation tools at a relatively early stage, after all versions of a text have been transcribed and checked for reliability (because any mistake will affect the collation result), but before further semantic or presentational tagging is applied (because of potential markup obliteration). Because a generic solution is unlikely to exist in the near future, it is advisable to rely on a wise combination of independent but interoperable tools and open services.

“Patch (Unix).” 2019. In Wikipedia. .
Barabucci, G. 2016. “CATview.” Digital Medievalist 10 (September 25, 2016). .
Bleeker, E., Buitendijk, B., Haentjens, R., et al. 2018. “Including XML Markup in the Automated Collation of Literary Text.” XML Prague 2018 Conference Proceedings, pp. 77–95. .
Bremer, T., Molitor, P., Pöckelmann, M., et al. 2015. “Zum Einsatz digitaler Methoden bei der Erstellung und Nutzung genetischer Editionen gedruckter Texte mit verschiedenen Fassungen. Das Fallbeispiel der Histoire philosophique des deux Indes von Guillaume-Thomas Raynal.” Editio 29,1 (December 15, 2015), pp. 29–51. .
Cayless, H., Beshero-Bondar, E., Viglianti, R., et al. 2019. “Document Modeling with the TEI Critical Apparatus.” (Panel) TEI 2019: What is text, really? TEI and beyond, September 16–20, University of Graz, Austria. Book of Abstracts, pp. 168–170. .
Hunt, J. W., and M. D. McIlroy. 1976. “An Algorithm for Differential File Comparison.” Computing Science Technical Report 41, Bell Laboratories.
Levenshtein, V. I. 1966. “Binary Codes Capable of Correcting Deletions, Insertions and Reversals.” Soviet Physics Doklady 10 (8): 707–710.
Nassourou, M. 2013. Computer-Supported Textual Criticism: Theory, Automatic Reconstruction of an Archetype. Norderstedt: Books on Demand.
Nury, E. L. 2018. Automated Collation and Digital Editions: From Theory to Practice. London: King’s College (PhD thesis). .
Plachta, B. 2013. Editionswissenschaft: Eine Einführung in Methode und Praxis der Edition neuerer Texte. Stuttgart: Reclam (1st edition: 1997).
Rieder, B., and Röhle, T. 2012. “Digital Methods: Five Challenges.” In Berry, D. (ed.), Understanding Digital Humanities. Basingstoke: Palgrave Macmillan, pp. 67–84.
Sahle, P. 2013. Digitale Editionsformen: Zum Umgang mit der Überlieferung unter den Bedingungen des Medienwandels. Teil 3: Textbegriffe und Recodierung. Norderstedt: Books on Demand.
Schöch, C. 2016. “Detecting Transpositions when Comparing Text Versions using CollateX.” The Dragonfly’s Gaze: Computational Analysis of Literary Texts (August 29, 2016). .
Spadini, E. 2016. Studi sul “Lancelot” en prose. Roma: Sapienza Università (PhD thesis). .
\ No newline at end of file
diff --git a/collationtools/pictures/picture-1.png b/collationtools/pictures/picture-1.png
new file mode 100644
index 0000000..3ad1706
Binary files /dev/null and b/collationtools/pictures/picture-1.png differ
diff --git a/collationtools/pictures/picture-10.png b/collationtools/pictures/picture-10.png
new file mode 100644
index 0000000..c50d36b
Binary files /dev/null and b/collationtools/pictures/picture-10.png differ
diff --git a/collationtools/pictures/picture-2.png b/collationtools/pictures/picture-2.png
new file mode 100644
index 0000000..4f9d09f
Binary files /dev/null and b/collationtools/pictures/picture-2.png differ
diff --git a/collationtools/pictures/picture-3.png b/collationtools/pictures/picture-3.png
new file mode 100644
index 0000000..5a8b6de
Binary files /dev/null and b/collationtools/pictures/picture-3.png differ
diff --git a/collationtools/pictures/picture-4.png b/collationtools/pictures/picture-4.png
new file mode 100644
index 0000000..78936f6
Binary files /dev/null and b/collationtools/pictures/picture-4.png differ
diff --git a/collationtools/pictures/picture-5.png b/collationtools/pictures/picture-5.png
new file mode 100644
index 0000000..acd08e1
Binary files /dev/null and b/collationtools/pictures/picture-5.png differ
diff --git a/collationtools/pictures/picture-6.png b/collationtools/pictures/picture-6.png
new file mode 100644
index 0000000..562bdd2
Binary files /dev/null and b/collationtools/pictures/picture-6.png differ
diff --git a/collationtools/pictures/picture-7.png b/collationtools/pictures/picture-7.png
new file mode 100644
index 0000000..7d8df2c
Binary files /dev/null and b/collationtools/pictures/picture-7.png differ
diff --git a/collationtools/pictures/picture-8.png b/collationtools/pictures/picture-8.png
new file mode 100644
index 0000000..1fde1bb
Binary files /dev/null and b/collationtools/pictures/picture-8.png differ
diff --git a/collationtools/pictures/picture-9.png b/collationtools/pictures/picture-9.png
new file mode 100644
index 0000000..33a57e6
Binary files /dev/null and b/collationtools/pictures/picture-9.png differ
diff --git a/ediarum/ediarum-tei.xml b/ediarum/ediarum-tei.xml
new file mode 100644
index 0000000..bf53101
--- /dev/null
+++ b/ediarum/ediarum-tei.xml
@@ -0,0 +1,1662 @@
+ <respStmt>
+   <resp>author</resp>
+   <name>
+     <persName>
+       <forename>Andreas</forename>
+       <surname>Mertgens</surname>
+     </persName>
+     <affiliation>
+       <orgName>University of Cologne</orgName>
+       <placeName>Cologne</placeName>
+     </affiliation>
+     <email>a.mertgens@uni-koeln.de</email>
+   </name>
+ </respStmt>
+ </titleStmt>
+ <publicationStmt>
+   <publisher>Institut für Dokumentologie und Editorik</publisher>
+   <date>2020-01-18</date>
+   <idno type="issue">11</idno>
+   <idno type="URI">https://ride.i-d-e.de/issue-11/ediarum/</idno>
+   <idno type="DOI">10.18716/ride.a.11.4</idno>
+   <availability>
+     <licence>
+       <ref target="http://creativecommons.org/licenses/by/4.0/"/>
+     </licence>
+   </availability>
+ </publicationStmt>
+ <notesStmt>
+   <relatedItem>
+     <biblStruct>
+       <monogr>
+         <title xml:id="resource_title">ediarum
+ Stefan Dumont, Martin Fechner, Sascha Grabsch
+ 2018
+ http://www.bbaw.de/telota/software/ediarum
+ 2019-10-15

Based on http://www.i-d-e.de/publikationen/weitereschriften/criteria-text-collections-version-1-0

+
+
+ [Factsheet: the same catalogue-based questionnaire (criteria 0.1.1 to 4.5), completed for the ediarum modules, covering software type, platform, purpose, financial model, development stage, technologies, input and output formats, character encoding, dependencies, documentation, support channels, issue tracking, ease of installation, test suite, deployment platforms, devices and browsers, API, open source status, license, software repository, code quality and reusability, data treatment, sustainability, citation guidelines, expected users and interactions, interface type, visualization, customization, and accessibility.]
+ + +
ediarum.DB, ediarum.BASE.edit and ediarum.REGISTER.edit are the three currently released modules of the ediarum editing environment developed by the TELOTA initiative at the BBAW in Berlin. The set of two frameworks for the Oxygen XML Editor and one eXist-db application aims to support digital scholarly editors in generating and annotating TEI XML data. The frameworks offer a graphical interface within Oxygen XML Editor for adding markup and metadata for a subset of TEI elements without having to edit XML files manually. The modules are open source and intended to be used as a starting point or toolbox for other developers to create project-specific customized frameworks. Although parts of the modules are currently available only in German and the documentation needs improvement, the ediarum modules already offer researchers a flexible and open way to edit their TEI data, and other developers a resource to build upon in their projects.
+
+
+ Introduction +

This review examines three tools: ediarum.DB, ediarum.BASE.edit and ediarum.REGISTER.edit, which are part of the continuously developed "work and publication environment for scholarly editing" ediarum (https://web.archive.org/web/20190531085527/http://www.bbaw.de/telota/software/ediarum), developed by the Berlin-Brandenburg Academy of Sciences and Humanities (Berlin-Brandenburgische Akademie der Wissenschaften, BBAW). The first release from the ediarum software environment was ediarum.JAR in December 2013, a module which is now integrated into the three modules reviewed here. The modules discussed here were released throughout 2018. For this review I am using ediarum.DB 3.2.5, ediarum.BASE.edit 1.1.1, and ediarum.REGISTER.edit 1.0.1; integrated into these modules is ediarum.JAR 3.1.0. As indicated by the leading version numbers, they can be considered 'ready for production', although, as is the nature of continuously developed software, they are likely to be further developed and improved in the future. Further modules of ediarum (e.g. for print and web publication) have been announced and are already being used internally (Dumont and Fechner 2019), but have not yet been released publicly; they will therefore not be the subject of this review, which limits itself to the three modules available at this time. These three tools are also what I will refer to as ediarum in this article. For the purpose of this review, I used a macOS 10 system with Java Runtime Environment 8, Oxygen XML Editor version 21, and eXist-db version 4.4.

+

Ediarum, at its core, is a set of extensions for two other applications, namely the Oxygen XML Editor (http://web.archive.org/web/20190606184223/https://www.oxygenxml.com/) and eXist-db (http://exist-db.org/exist/apps/homepage/news.xql, last accessed: May 30, 2019). These two applications are widely used by projects in the context of digital scholarly editions, and ediarum aims to provide an environment optimized for working with transcriptions and their markup within those tools. It allows the user to work with XML data, specifically TEI, through a graphical user interface, without having to write XML manually in a text editor. Ediarum.BASE.edit and ediarum.REGISTER.edit are so-called 'frameworks' for the Author mode of Oxygen. The Author mode is an alternative way to display XML files that allows users to edit documents in a way much closer to a WYSIWYG editor or word processor. The two 'edit' modules of ediarum are customized to offer the most common functionality an editor of TEI XML data needs, such as the easy generation of metadata fields and the markup of structural text elements, textual phenomena, and entities. These are supplemented by the ediarum.DB module, which uses an eXist-db database to provide a central repository for the XML data and for central indexes of entities that can then be looked up and referenced directly from Oxygen. In addition, it allows this data to be shared via a REST API or via WebDAV. It should be noted that ediarum is not intended or marketed as a plug-and-play solution for any particular use case, but rather as a toolbox (https://doi.org/10.5281/zenodo.2621061, last accessed: May 30, 2019) that can be adapted for the specific uses of any one project. (My academic background is in scholarly editing and documentology, but I have since been working as a developer on various DH and digital edition projects. My perspective on these tools is therefore both that of someone who has experience in transcribing and annotating documents as a humanities scholar and that of someone who builds environments for other researchers to perform these tasks. In my development work, I have been using parts of ediarum for some years.)

+

Ediarum is developed by the TELOTA (The Electronic Life Of The Academy) initiative at the BBAW in Berlin, with Stefan Dumont, Martin Fechner, and Sascha Grabsch as lead developers. After being used internally for projects at the BBAW for many years, the modules are successively being released to the public via a GitHub repository (https://github.com/ediarum, last accessed: May 30, 2019), which includes all relevant contact information, licenses, and acknowledgments.

+
+
+ Methodology and Implementation +

At a high level of methodological abstraction, one can ask in which part of the scientific (in this instance, editorial) process a tool attempts to support the researcher. In the case of ediarum, it is the idea of generating knowledge by explicating information present in source documents. A source document would, for example, be a letter. A transcription of such a letter in its most rudimentary form carries only linguistic information, as it is merely a string of text; if it were transcribed line by line, it would already carry rudimentary structural information. While this might be sufficient for projects with a purely archival or collecting goal, the next step in the editorial workflow is markup: the explicit manifestation of structural information and of known entities within the text, and the addition of metadata. All this serves the aim of making the implicit structures and relations within a text or a corpus of texts explicit, and of allowing further interpretation and research on the basis of this information. XML, specifically the TEI standard, has established itself as the most widely used technical solution for this problem, but using it to its full potential requires a certain degree of technical familiarity and willingness to work with XML. Ediarum attempts to bridge the gap between humanities researchers and the possibilities of XML markup, to empower them to perform the above-mentioned task of explicating and adding information to documents without having to familiarize themselves with the technicalities of XML and TEI.
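To make the idea of 'explicating information' concrete, consider a minimal, hypothetical TEI fragment (constructed for illustration, not taken from ediarum's templates): a plain transcription line becomes machine-readable once the entity it mentions is marked up and linked to an index entry.

    <!-- before: implicit information only -->
    <p>Yesterday I spoke with Humboldt about the expedition.</p>

    <!-- after: the person is explicit and resolvable via an index -->
    <p>Yesterday I spoke with <persName ref="#p0001">Humboldt</persName>
       about the expedition.</p>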

+

Comparing ediarum with other tools that tackle this problem, it becomes evident that its solution is a unique one. While there are many text annotation tools, the majority focus on linguistic annotation for Natural Language Processing (NLP) and machine learning. Others, like Annotation Studio, offer only very basic annotation of literary or archival texts in a web interface. The Classical Text Editor (https://web.archive.org/web/20190612191653/http://cte.oeaw.ac.at/) offers a vast array of possibilities for editors in a related field, but it is a closed commercial application, with the option to export TEI/XML. A tool that, on the surface, has some similarities is FuD, the research environment developed by the University of Trier (https://web.archive.org/web/20190613055623/https://fud.uni-trier.de/). FuD is a stand-alone piece of software whose ambitions far exceed those of ediarum, but it also includes a central repository for data and indexes, metadata input on file creation, and text markup and annotation. As opposed to ediarum, FuD, developed by a large team since 2004, is marketed as a fully featured software solution for small to large projects; individualized frameworks and environments are developed on demand for larger customers. On a technical level, the key difference is the integration into the XML text editing workflow. Ediarum is not an application that creates XML data; it is instead an additional layer on top of the actual data, with manual text editing possible at any point. It is based not on the paradigm of an abstract input form that eventually outputs the final data format, but on the idea of being an additional aid for an editor to use while working directly in the data. The aim of ediarum is to make the editing workflow easier and more comfortable, especially for those researchers who do not have a background in, or an affinity for, XML code. It offers a cleaner way of working with the texts while at the same time keeping the entire depth and complexity of the XML data within close reach.

+

Looking at the work of developers in DH projects, instead of editors and researchers, one can see another process on which ediarum can potentially have an impact. Staying within the 'toolbox' metaphor, ediarum can be seen not only as a toolbox for editors but also as a toolbox for other developers to use in their projects. Developers can use ediarum in its default configuration or with only minor adjustments, or they can use only selected functionality and integrate it into their own custom frameworks. Due to the open-source nature of the project and the separate availability of ediarum.JAR, which includes the core methods used in the other three modules, both types of reuse are actively encouraged. The paradigm at play here is therefore not that of software as a fully packaged service or solution shipped to consumers, but that of tools and smaller applications to be integrated into individual workflows.

+

As mentioned before, the tools are extensions of two pre-existing applications and thus obviously depend on both. While eXist-db is fully open source, the Oxygen XML Editor is a closed-source commercial application. At first it seems problematic to make an open-source tool dependent on commercial software, especially since ediarum itself explicitly supports the idea of being adapted and modified by other developers. In its defense, the case could be made that, since Oxygen is such a predominant and ubiquitous tool, its availability can be taken as a given for any project or researcher within the field. The benefit of depending and building on Oxygen is the familiarity that many developers and researchers will have with it. This close relation to both eXist-db and Oxygen also determines the input and output formats for ediarum. XML is, of course, the main format, as both its base programs are designed for it and it is the de facto standard for the relevant use case. Any conversion capability is the domain of the underlying Oxygen software. An XML file created with ediarum is by definition no different from any XML file written by hand, and thus all the possibilities of conversion, character encoding, export, and import into other workflows are those of Oxygen and eXist-db respectively.

+

In addition, the ediarum.DB application for eXist-db offers the possibility of interfacing with the indexes by means of a REST API, so any other web application can make use of the indexes created with ediarum as they are being created. Another interoperability option is the interface with Zotero bibliographies, which can be synchronized with the indexes (Fig. 1).
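As a rough, hypothetical illustration of what such a central index could contain (the element structure and values here are assumptions made for this review, not ediarum's actual schema), a person index stored in eXist-db might be a TEI list that external applications retrieve through the database's REST interface:

    <listPerson xmlns="http://www.tei-c.org/ns/1.0">
      <person xml:id="p0001">
        <persName>Humboldt, Alexander von</persName>
        <!-- hypothetical link to an external authority record -->
        <idno type="GND">123456789</idno>
      </person>
    </listPerson>

An entry like this is what the Oxygen frameworks present as a selectable list, and what the xml:id values inserted into documents ultimately resolve to.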

+
+ Fig. 1: The three ediarum modules in relation to their base software + and to each other. + +
+

In terms of versioning, logging, and performance, ediarum likewise depends on the features of Oxygen and eXist-db. The Author mode frameworks do not seem to impact the performance of the Oxygen XML Editor at all compared to the text editor, and switching between the text and the author view is instantaneous. The only functionality with a noticeable performance impact seems to be the opening of very large indexes: in a test with 25,000 person names and IDs, there was a noticeable delay of about two seconds before the list was displayed and one could interact with it.

+

Due to its history as an internal tool of the BBAW, ediarum has been employed in many of its academy projects, such as 'Schleiermacher in Berlin 1808-1834. Briefwechsel, Tageskalender, Vorlesungen' (https://schleiermacher-digital.de/index.xql, last accessed: June 10, 2019), 'Alexander von Humboldt auf Reisen. Wissenschaft aus der Bewegung', and the 'Marx-Engels-Gesamtausgabe' (https://web.archive.org/web/20190612201339/http://megadigital.bbaw.de/index.xql), to name a few. In those cases, other modules, e.g. for web publication, have also been used that have not yet been released to the public. Beyond those internal projects, it is unclear whether the three modules released in late 2018 have already seen use in projects from other institutions, but with developer workshops taking place in 2019 they will likely see further application. I have personally used some of the functions of the ediarum.JAR module in my development of a custom framework even before the three more fully developed modules were released.

+
+
+ An example use case

Let us assume the following use case: a project that already includes some letters and indexes of places and persons. An editor now wants to add a new transcription to the project. They will use the Data Source Explorer within Oxygen to create a new file and select the 'letter' template that comes with the default ediarum framework (alongside templates for introductory texts and manuscripts). Before the file is created, the editor can enter the name of the archive and collection and the shelfmark of the letter. This creates an XML file with the basic elements, such as the teiHeader and body elements like 'opener' and 'salute', already in place (Fig. 2).
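A freshly generated file might look roughly like the following skeleton. This is a simplified sketch based on the description above; the actual template shipped with ediarum.BASE.edit is more detailed.

    <TEI xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader>
        <fileDesc>
          <titleStmt><title/></titleStmt>
          <publicationStmt><p/></publicationStmt>
          <sourceDesc>
            <msDesc>
              <!-- archive, collection, and shelfmark entered at file creation -->
              <msIdentifier>
                <repository/>
                <collection/>
                <idno/>
              </msIdentifier>
            </msDesc>
          </sourceDesc>
        </fileDesc>
      </teiHeader>
      <text>
        <body>
          <div type="letter">
            <opener><salute/></opener>
            <p/>
            <closer><salute/></closer>
          </div>
        </body>
      </text>
    </TEI>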

+
+ Fig. 2: The empty template of a letter. + +
+

Further metadata can be added through the menus 'Metadaten' and 'Briefmetadaten' in the toolbar directly above the editor window. These correspond to the 'fileDesc' and 'profileDesc' sections of the teiHeader. For simple metadata fields, these functions open pop-up windows prompting the editor to enter the relevant data, which is then written into the file as the appropriate TEI element in the correct position. For some fields, e.g. the name of the author of the letter, this triggers a call to the person index in the ediarum.DB application, which is displayed as a list of names. Selecting one inserts both the name and the corresponding xml:id into the document.
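The result of such an insertion might resemble the following fragment (the element structure and values are hypothetical; the exact markup ediarum writes depends on the BBAW's TEI subset):

    <correspDesc xmlns="http://www.tei-c.org/ns/1.0">
      <correspAction type="sent">
        <!-- name and id reference taken from the central person index -->
        <persName key="p0001">Schleiermacher, Friedrich</persName>
        <date when="1810-03-02"/>
      </correspAction>
    </correspDesc>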

+

The text of the letter can be inserted into the pre-existing structural elements. Once the raw text has been copied into the file, the markup functions can be used. These include different types of deletions and additions, types of emphasis, and comments. The indexes can also be used here to mark up entities in the text, by selecting them in the editor and then choosing the corresponding entry from the index (Fig. 3).
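In TEI terms, such markup might produce fragments along the following lines (a constructed sketch, not ediarum's verbatim output):

    <p xmlns="http://www.tei-c.org/ns/1.0">
      I shall <del rend="strikethrough">not</del>
      <add place="above">gladly</add> attend, as
      <persName key="p0002">Goethe</persName> suggested.
    </p>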

+
+ Fig. 3: Different kinds of markup can be seen in the letter with the + selection window for the person index open. + +
+

These indexes can also be edited from within Oxygen through the ediarum.REGISTER.edit framework, which operates analogously to this example. In this way, a new person can be added to the index of persons and immediately referenced in a letter, without having to leave the graphical interface or the application.

+
+
+ Deployment and learning curve +

The modules are available in individual public GitHub repositories. The ediarum.DB repository includes documentation that explains the installation process and the integration into eXist-db, as well as the GUI and how to implement advanced functionality. Assuming prior experience with the basic functionality of eXist-db, the setup of ediarum.DB is very straightforward. The documentation sections of ediarum.BASE.edit and ediarum.REGISTER.edit only contain a 'tba' placeholder as of the time of this review. Therefore, while the setup process of ediarum.DB is sufficiently documented, the specifics of how the frameworks have to be integrated into the local Oxygen installation and how to connect to and interface with the database are not documented. (A module called ediarum.GUIDE has been announced that will hopefully fill this gap.) A developer attempting to set up or customize ediarum.BASE.edit and REGISTER.edit will need solid knowledge of the Oxygen Author mode and CSS to be able to adapt the frameworks for a specific use case. Technical support can easily be obtained on GitHub through the issue system; the developers seem to respond quickly on these channels and also offer an additional mailing list.

+

Looking at the learning curve of the tools, let us first consider the developer who wants to install and/or customize ediarum for their project. The initial setup of the database portion is straightforward thanks to the documentation. The eXist-db module, in the form of a '.xar' file, is ready for use after an import into the local eXist-db installation. The user interface of the eXist-db application is clear, and the initialization of indexes, the setup of user accounts, and other administration tasks can be learned quickly. On the side of Oxygen, some knowledge of the framework system and specifically the Author mode view is necessary when customizing the ediarum.BASE and REGISTER frameworks for a specific project. The demands on the developer scale with the depth of the desired customizations. One example 'letter' that exhibits various types of markup and metadata demonstrates the implementation of these functions. It can serve as a starting point for testing and customization and, combined with a study of the source code, will give developers insight into how they can extend and modify the framework for their purposes.

+

From the point of view of the 'end user', the non-DH researcher and editor, navigating and using the framework, once it is installed, is largely self-explanatory and intuitive. All specific additional functionality is available through the Oxygen GUI, primarily via a custom toolbar. Even a user with no experience in using Oxygen will quickly find themselves at home, as similar toolbars and icons are found in any office suite.

+

Since both Oxygen and eXist-db are Java-based, cross-platform applications, the user experience is identical across Windows, macOS, and Linux. Any system supported by the Oxygen XML Editor will be able to run ediarum (it is optimized for Oxygen XML Editor version 20.1). Any instance of eXist-db, independent of the system on which it is installed, can install the ediarum.DB module (eXist-db versions 3.5 and 4.4 are explicitly supported). Interoperability with external applications is provided via the REST API of eXist-db, through which the indexes can be accessed. There is also a specific API connection to Zotero, which keeps indexes synchronized with Zotero bibliographies.

+
+
+ Open Source and extensibility +

The source code is openly available on GitHub (https://github.com/ediarum, last accessed: May 30, 2019) and is distributed under the terms of the GNU General Public License 3 (or later), although it makes use of some third-party software that is separately licensed under different terms, such as Bootstrap (MIT License) or the 'tei.jar' module, which is licensed under the 3-clause BSD license. This is clearly and properly stated in the readme files as well as in the license files included in the repositories. In addition to GitHub, ediarum.BASE.edit and ediarum.DB are also available in the research repository Zenodo, using its system of software citation. This ensures that specific versions of the tools have unique DOIs and can thus be referred to and cited as precisely as written research contributions.

+

Ediarum is explicitly intended to be extensible and reusable. Even through the eXist-db GUI and the Oxygen Author mode menus alone, significant customization can be achieved (e.g. setting up new indexes of entities and allowing reference to these in Oxygen, or enabling the insertion of arbitrary XML fragments into documents via buttons or shortcuts). In case deeper customization is desired, the folder structure and files in the source code are clearly organized and named; a random sampling of files also revealed clear function names and comments. This extends to projects not using eXist-db, or even projects not using TEI. The ediarum.JAR module, which is integrated into the two framework modules, is also available on its own. It extends the default Oxygen Author mode actions with functions related to inserting data from indexes; these actions can be used in any framework built for the Author mode.
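To illustrate the 'arbitrary XML fragments' mechanism mentioned above: a project-specific toolbar button might be configured to insert a prepared TEI fragment such as the following at the cursor position (the fragment itself is a hypothetical example; what a button inserts is freely definable per project):

    <note xmlns="http://www.tei-c.org/ns/1.0" type="editorial" resp="#editor">
      <!-- editorial comment to be filled in -->
    </note>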

+

As the code is complex, it will still require some study and analysis, but a developer fluent in the relevant languages (X-technologies, CSS, potentially also Java) will know where to inject custom functionality and how to integrate it with existing functions. Again, it has to be noted that the developers are continually working on the modules themselves, and GitHub provides a platform for anyone wishing to extend and adapt the code. Judging by recent presentations (Dumont and Fechner 2019), the three modules that have so far been the focus of this review will in the future be accompanied by further modules providing functionality related to publication on the web and in print.

+
+
+ User interaction & GUI +

Both Author mode frameworks employ the GUI of the Oxygen XML Editor; specifically, they provide their functionality through menus and toolbars. The main toolbar (see Fig. 4) contains both icons and text, most of which intuitively represent the functionality they provide. Using one of these buttons often opens a pop-up menu with more detailed options (e.g. the function to mark up a 'deletion' in the text allows the user to specify whether the deletion was made by eraser, by strikethrough, or by overwriting). These then create the corresponding TEI element and/or attributes.

+
+ Fig. 4: The two toolbars of the ediarum.BASE.edit module integrate + into the regular Oxygen toolbars. The functions are also accessible + through the ‘ediarum’ menu in the top menu bar. + +
+

In the current version, all text elements of the GUI are exclusively in German. Given the origin of ediarum as an internal tool at the BBAW that is only now successively being made public, this is understandable. Nevertheless, it limits the possible use and reach of these tools, as within the Oxygen framework only part of the functionality is represented by icons; many deeper markup and editing functions are described only in German text.

+
+ Fig. 5: The purely text-based administrative ‘dashboard’ of + ediarum.DB. + +
+

The module ediarum.DB uses a custom GUI based on a simple and clear Bootstrap framework. This GUI is exclusively text-based and allows easy and structured access to the administrative functions of the database module (Fig. 5). This text, too, is in German. An international project deciding to use ediarum would therefore have to dedicate a certain effort to translating these text elements of the GUI, both in the eXist-db and in the Oxygen portions of the tools.

+
+
+ Conclusion +

Recalling the stated intention of ediarum to be a 'toolbox' for scholarly editors and their project teams, one must acknowledge that, despite its only partial release, these tools already go very far in providing just that. Projects dealing with the transcription of corpora of texts or letters that want to make use of the full depth of XML markup will find ediarum empowering. Humanities scholars and researchers in those projects will be able to add rich markup to their data without having to get acquainted with XML on a technical level. Developers working with those scholars will find a starting point for the development of an appropriate custom framework and, if eXist-db is employed for the backend, even a rudimentary solution for collaborative work, data storage, backups, etc. Depending on how closely a project falls in line with the default configuration of ediarum, i.e. the specific TEI sub-schema used at the BBAW and the predefined types of indexes, this adaptation process will take varying amounts of effort and time, but it will in any case be preferable to starting from scratch. Even projects far removed from the specific TEI use case can exploit some functionality of ediarum, as the functions offered by the core ediarum.JAR module will work with any XML data standard.

+

While forms for data input and even basic transcription and markup tools are nothing new, the key advantage and contribution here is that this interface is integrated as a layer on top of the Oxygen XML Editor, which is, of course, an extremely powerful tool. This allows for, and encourages, a layered or hybrid workflow. One can use the Author mode frameworks for easier work with the text itself, for access to indexes, and for shortcut actions that can inject arbitrarily large and complex XML fragments into the document with one mouse click, while at the same time having the actual XML code and all the power of XPath, XSLT, XQuery, etc. at hand, without having to switch to another tool or export the data into another format. This is an approach to software that seems tailor-made for digital humanities projects, as they often include researchers with varying degrees of technical knowledge and varying needs for complexity, all working on the same dataset.

+

What is holding ediarum back in its current state is the lack of an English-language interface for the default configuration and the lack of documentation for the two main modules. Non-German-language projects either have to restrict themselves to using only the core Java functions provided by the ediarum.JAR module or, if they choose to adopt the more developed frameworks, will have to provide their own translations. Since the developers are transparent about the fact that these modules form only part of a larger system, and given their integration into the long-term TELOTA initiative, one can be confident that ediarum will be developed further and that these drawbacks will be addressed in future releases, some of which have already been hinted at, such as ediarum.GUIDE or ediarum.WEB. Such further development would make this 'toolbox' even more versatile and accessible.

+
+
+
+
+ Dumont, Stefan, and Martin Fechner. 2015. "Bridging the Gap: Greater Usability for TEI encoding." Journal of the Text Encoding Initiative 8 (December 2014 - December 2015). Online since June 9, 2015, accessed May 31, 2019. DOI: 10.4000/jtei.1242.
+ Dumont, Stefan, and Martin Fechner. 2019. "ediarum – Arbeits- und Publikationsumgebung für digitale Editionsvorhaben." Last modified April 2, 2019. Zenodo.
+ Dumont, Stefan, Sascha Grabsch, and Martin Fechner. 2019. ediarum/ediarum.BASE.edit: ediarum.BASE.edit 1.1.1 (Version v1.1.1). March 19, 2019. Zenodo.
+ Fechner, Martin. 2019. ediarum/ediarum.DB: ediarum.DB 3.2.5 (Version v3.2.5). March 15, 2019. Zenodo.
+ +
+
diff --git a/ediarum/pictures/picture-1.png b/ediarum/pictures/picture-1.png new file mode 100644 index 0000000..876bd02 Binary files /dev/null and b/ediarum/pictures/picture-1.png differ diff --git a/ediarum/pictures/picture-2.png b/ediarum/pictures/picture-2.png new file mode 100644 index 0000000..4e259b1 Binary files /dev/null and b/ediarum/pictures/picture-2.png differ diff --git a/ediarum/pictures/picture-3.png b/ediarum/pictures/picture-3.png new file mode 100644 index 0000000..8291241 Binary files /dev/null and b/ediarum/pictures/picture-3.png differ diff --git a/ediarum/pictures/picture-4.png b/ediarum/pictures/picture-4.png new file mode 100644 index 0000000..da40cd5 Binary files /dev/null and b/ediarum/pictures/picture-4.png differ diff --git a/ediarum/pictures/picture-5.png b/ediarum/pictures/picture-5.png new file mode 100644 index 0000000..0c72c3b Binary files /dev/null and b/ediarum/pictures/picture-5.png differ diff --git a/omeka/omeka-tei.xml b/omeka/omeka-tei.xml new file mode 100644 index 0000000..a6f47b7 --- /dev/null +++ b/omeka/omeka-tei.xml @@ -0,0 +1,1998 @@ + + + + + + Omeka Classic. Un environnement de recherche pour les éditions scientifiques numériques + + author + + + Elina + Leblanc + + + Université Grenoble Alpes + Grenoble + + elina.leblanc@univ-grenoble-alpes.fr + + + + + Institut für Dokumentologie und Editorik + 2019-01-18 + https://ride.i-d-e.de/issue-11/ + https://ride.i-d-e.de/issue-11/omeka/ + 10.18716/ride.a.11.3 + + + + + + + + + + + Omeka Classic + Elina Leblanc + + + + + + + + 2019 + + + https://omeka.org/classic/ + + + + 2019-12-18 + + + +

Based on + http://www.i-d-e.de/publikationen/weitereschriften/criteria-text-collections-version-1-0 +

+
+
+ + + + + + + cf. Catalogue 0.1.1 + + What type of software is it? + + + + + + + + + + + + + + cf. Catalogue 1.4 + + On which platform runs the tool? + + + + + + + + + + + + + + + + + + + cf. Catalogue 1.5 + + For what purpose was the tool developed? + + + + + + + + + + + + + + cf. Catalogue 1.6 + + Which is the financial model of the tool? + + + + + + + + + + + + + + + + + + + cf. Catalogue 1.5 + + What is the development stage of the tool? + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 2.3 + + Which programming languages and technologies are + used? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 2.3 + + Does the tool reuse portions of other existing + software? + + + + + + + cf. Catalogue 2.4 + + Which programming languages and technologies are + used? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 2.4 + + Which programming languages and technologies are + used? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 2.4 + + Which character encoding formats are supported? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Is a pre-processing conversion included? + + + + + + + cf. Catalogue 3.2 + + Does the documentation list dependencies on other software, + libraries or hardware? + + + + + + If yes, is the software handling the installation of + dependencies during the general installation process (you don't have + to install them manually before the installation)? + + + + + + + + + cf. Catalogue 3.4 + + Is documentation and/or a manual available? (tool website, + wiki, blog, documentation, or tutorial) + + + + + + + cf. Catalogue 3.3 + + Which format has the documentation? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.3 + + Which of the following sections does the documentation + contain? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.3 + + In what languages is the documentation available? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.4 + + Is there a method to get active support from the developer(s) + or from the community? + + + + + + + cf. Catalogue 3.4 + + Which form of support is offered? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.4 + + Is it possible to post bugs or issue using issue tracker + mechanisms? + + + + + + + cf. Catalogue 3.6 + + Grade how straightforward it is to build or install the tool on + a supported platform: + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.7 + + Is there a test suite, covering the core functionality in order + to check that the tool has been correctly built or + installed? + + + + + + + cf. Catalogue 3.8 + + On which platforms can the tool/software be deployed? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.8 + + On which devices can the tool/software be deployed? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.8 + + If the tool is web-based: On which browsers can the + tool/software be deployed? 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.8 + + If the tool is web-based: Does the tool rely on browser + plugins? + + + + + + + cf. Catalogue 3.8 + + Is there an API for the tool? + + + + + + + cf. Catalogue 3.9 + + Is the source code open? + + + + + + + cf. Catalogue 3.9 + + Under what license is the tool released? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.9 + + Does the software make adequate acknowledgement and credit to + the project contributors? + + + + + + + cf. Catalogue 3.9 + + Is the tool/software registered in a software + repository? + + + + + + If yes, can you contribute to the software development via the + repository/development platform? + + + + + + + cf. Catalogue 3.10 + + Can the code be analyzed easily (is it structured, commented, + following standards)? + + + + + + + cf. Catalogue 3.10 + + Can the code be extended easily (because there are contribution + mechanisms, attribution for changes and backward + compatibility)? + + + + + + + cf. Catalogue 3.10 + + Can the code be reused easily in other contexts (because there + are appropriate interfaces and/or a modular architecture)? + + + + + + + cf. Catalogue 3.11 + + Does the software provide sufficient information about the + treatment of the data entered by the users? + + + + + + + cf. Catalogue 3.12 + + Is there information available whether the tool will be + supported currently and in the future? + + + + + + + cf. Catalogue 3.13 + + Does the tool supply citation guidelines (e.g. using the + Citation File Format)? + + + + + + + + + cf. Catalogue 4.1 + + What kind of users are expected? + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 4.1 + + What kind of user interactions are expected? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 4.2 and 0.1.1 + + What kind of interface does the tool provide? + + + + + + + + + + + + + + + + + + + cf. Catalogue 4.3 + + Does the tool provide a particular visualizations (in terms of + analysis) of the input and/or the output data? + + + + + + + cf. Catalogue 4.4 + + Is the user allowed to customize the functioning of the tool + and the output configuration? + + + + + + + cf. Catalogue 4.5 + + Does the tool provide particular features for improving + accessibility, allowing „people with the widest range of + characteristics and capabilities" to use it? + + + + + + + + +
+ + +
This review focuses on Omeka, an open-source Content Management System (CMS) specifically designed for the management and display of digitized historical content. Originally, this CMS was not intended for the creation and display of digital scholarly editions. However, the active community of Omeka users has developed several plugins that can manage and display digital scholarly editions following the XML-TEI standard. This review presents several of these plugins and the different ways they can be used to integrate Omeka into a digital editing process.
+
+
+ Introduction +

The open-source CMS (Content Management System) Omeka (Omeka Classic, v.2.7, GPL3 license; last revised May 29, 2019) was created in 2008 at the initiative of the Roy Rosenzweig Center for History and New Media (RRCHNM) of George Mason University in Virginia. (The RRCHNM is also behind the development of Zotero, a bibliographic management tool, and Tropy, a platform for managing and annotating photographs. Omeka's development team consists of Alyssa Toby Fahringer, Jeremy Boggs, Jim Safley, John Flatness, Ken Albers, Kim Nguyen, Megan Brett, Patrick Murray-John, Sharon Leon, Sheila Brennan, and Tom Scheinfeldt.) Entirely geared towards heritage institutions (libraries, archives, museums, etc.), which are its primary audience, it was designed to meet their needs in terms of managing, promoting, and disseminating large digitized collections. In response to the multiplication of Omeka sites within a single institution and to the growing needs of the community, a new piece of software named Omeka S (v.2.0.2; last revised August 16, 2019) appeared in 2017. This new CMS does not replace the older Omeka, now called Omeka Classic, but complements it, notably by making it possible to manage several sites from a single instance of the software and to incorporate vocabularies from the Semantic Web. The two tools thus share the same name but follow different logics. This review focuses solely on Omeka Classic, not Omeka S, since the latter, unlike Omeka Classic, has not yet seen any development aimed at publishing digital scholarly editions, as we will see below. (For convenience, in the remainder of this article we refer to Omeka Classic by its abbreviated form, Omeka.)

+

To understand how Omeka works and how it relates to digital editing (understood here in the sense of the English word editing), it is useful to distinguish between digitized editions and digital editions, following the terminology of Patrick Sahle (2008; 2016). A digitized edition is the digitization of a printed edition and as such constitutes a remediation of analogue content. A digital edition, by contrast, is a critical representation of a physical source (manuscripts, printed material, etc.), resulting from an editorial process that goes beyond mere digitization (Sahle 2008; Pierazzo 2015, 22; Sahle 2016, 26).

+

If we apply this distinction to Omeka, we see that this tool was not originally designed for producing digital editions, but for disseminating and showcasing digitized printed editions, with a view to creating digital libraries or archives. Upon installation, Omeka thus offers a bare core of functionality, limited mainly to adding digitized content and metadata, searching that content, and creating virtual exhibitions.

+

However, in response to the growing needs of the community and the explosion in the number and variety of Omeka projects over the last decade, this CMS has been rethought so that it can be integrated into a digital editing process, notably by focusing on certain stages of that process, such as transcription, encoding, or publication. These stages take the form of extensions (or plugins), maintained either by the Omeka team (official plugins) or by its community (unofficial plugins), which enrich the core of the CMS. (Omeka offers more than 200 plugins. The list of official Omeka plugins can be found on the software's website; the plugins developed by the Omeka community are available on GitHub. Among the most important contributors are the Scholar's Lab team of the University of Virginia, the BibLibre team, and Daniel Berthereau.)

+

In this review, we therefore present how these different activities can be carried out with Omeka, in order to highlight several ways of integrating this CMS into a digital editing process.

+
+
+ One CMS, several plugins, several perceptions of the digital editing process
+ Omeka, or the tradition of heritage CMSs

+ Omeka also belongs to the tradition of heritage CMSs, insofar as it was designed for the dissemination of digital heritage resources of very different kinds (text, images, audiovisual documents), which it organizes into hierarchical collections modelled on archival fonds. Omeka strikes a balance between the archival management of collections and their promotion and dissemination to a wide audience, using a dual interface (private interface/public interface) that allows content to be published in real time. This organization, together with the ease of its installation and use, has earned Omeka great popularity among heritage institutions wishing to put their digitized collections online.

+

The diversity of the institutions using Omeka has given rise to new needs for which Omeka was not originally designed, such as the publication of digital editions rather than only digitized ones. The community of Omeka users, particularly in research institutes and laboratories, has recently taken up this question and proposed several solutions, in the form of plugins, for integrating Omeka into certain stages of the digital editing process.

+

Among these stages, the most popular and widespread within the Omeka community is transcription, represented by the Scripto plugin. This tool, which is also available for other CMSs (Scripto also exists for the WordPress and Drupal CMSs), allows users to transcribe digitized printed editions collaboratively. Scripto follows in the footsteps of participatory transcription projects such as Transcribe Bentham, What's on the Menu, and Europeana Transcribe, which invite their users to transcribe their collections on a voluntary basis.

+

Over the last decade, several initiatives have been launched to turn Omeka into an environment suited to encoding and disseminating digital editions in XML-TEI (Text Encoding Initiative). Currently, there is no official plugin and no single workflow for encoding and displaying such editions in Omeka. The available solutions come from specific projects: each project uses a different approach and develops its own plugins for this purpose, to the point that one could say there are as many TEI plugins as there are projects. None of these plugins has been reworked by the Omeka developers for inclusion in the official plugin list.

+

We have identified five plugins for handling TEI in Omeka: MLA TEI, TEI Annotations, TEI Display, TEI Editions, and Transcript. Among these plugins, we distinguish two groups: those that only allow TEI files to be published, and those that allow editions to be transcribed, encoded, and published. In this review, we will not present all of these plugins, but only those for which we have concrete examples of application in projects, namely TEI Display, TEI Editions, and Transcript.

+

Although Omeka is a very popular tool, there is a shortage of scholarly publications about it, all the more so concerning its use for producing digital editions. Consequently, in the remainder of this review, we have relied on the technical documentation produced by the projects themselves to understand how they use Omeka within a scholarly editing process.

+
+
+ How does it work? Omeka and the digital editing process

The plugins developed for Omeka must fit into a pre-existing data workflow based on the Collections/Items pair. In Omeka, each piece of content corresponds to an item and can be classified into one or more collections. These items are described with a set of metadata in the Dublin Core format (created at the initiative of the OCLC, Online Computer Library Center, in 1995, Dublin Core is an international standard for describing digital or physical objects of various kinds; it consists of fifteen repeatable elements broad enough to cover a wide range of uses), imported via a CSV file or harvested via the OAI-PMH protocol (OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting, ensures data interoperability between repositories and allows them to harvest each other's metadata). These items can be enriched with various media, such as digitized images or audiovisual files.
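As a small illustration of the kind of record involved (the values are invented for this sketch), an item's Dublin Core description could be expressed in XML as follows, e.g. in the form exposed by an OAI-PMH endpoint:

    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <!-- hypothetical values for a single Omeka item -->
      <dc:title>Letter of 2 March 1863</dc:title>
      <dc:creator>Smith, John</dc:creator>
      <dc:date>1863-03-02</dc:date>
      <dc:identifier>item-0042</dc:identifier>
    </oai_dc:dc>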

+

The plugins we present below have each adopted a different solution for inserting digital editions into this workflow designed for digitized editions. They treat digital editions either as extensions of a pre-existing item (Scripto, Transcript, TEI Display) or as items in their own right (TEI Editions). We have categorized these plugins according to the stages of the digital editing process they address, showing the several ways in which Omeka can be integrated into that process.

+
+ Omeka as a transcription platform

The Scripto plugin turns Omeka into a participatory transcription platform. Developed by the RRCHNM (by a team consisting of Sharon M. Leon, Jim Safley, Ken Albers, Kim Nguyen, James Halabuk, and Lee Ann Ghajar) and based on MediaWiki, this open-source tool allows users, once logged in, to transcribe a project's collections, open discussions with other users, consult the history of their transcriptions, and follow the progress of transcriptions of the content that interests them. For their part, the scientific editors of the project can consult and edit the transcriptions, in order to ensure the quality of the users' work, and also validate them.

+

+ Scripto has been used in numerous projects, for example Transcrire, a project dedicated to the transcription of ethnographers' notebooks; DIY History, which offers many collections from the University of Iowa's digital library for transcription; and The Civil War in Letters, which focuses on letters and manuscripts produced during the American Civil War. The transcriptions produced by these projects all aim to facilitate access to the content, whether in terms of consultation, search, reuse, or editing of the data.

+
+ Fig. 1: Transcription interface of Transcrire ().
+

These projects all work in the same way: the user creates an account, selects a piece of content to transcribe, and accesses a transcription interface that places the digitization of a printed or handwritten document next to a free-text field, in which the user reproduces the text as they see it in the digitization (Figures 1 and 2). Unlike other projects, such as Transcribe Bentham or Europeana Transcribe, which ask the user to format the text and to represent some of its features (underlining, position on the page, etc.), transcription projects based on Omeka do not offer this possibility. The transcription activity is here reduced to its minimum.

+
+ Fig. 2: Transcription interface of DIY History ().
+

Although Scripto provides a validation system for a project's scientific editors via the tool's private interface, the participatory transcription projects currently running on Omeka have opted for validation by other users after proofreading.

+
+
+ Omeka as a publication platform for digital editions

Another way of using Omeka in the digital editing process is to treat it as a publication platform. This involves importing XML-TEI files and transforming them for online display via XSLT stylesheets. This is what the TEI Display plugin, developed by the Scholar's Lab team, offers: it makes it possible to attach an XML-TEI file to a pre-existing item in Omeka and to present its content online.
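As a rough sketch of the kind of transformation involved (this is not taken from TEI Display's actual stylesheets), an XSLT stylesheet might render TEI paragraphs and person names as HTML:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:tei="http://www.tei-c.org/ns/1.0">
      <!-- render TEI paragraphs as HTML paragraphs -->
      <xsl:template match="tei:p">
        <p><xsl:apply-templates/></p>
      </xsl:template>
      <!-- highlight person names and link them to the referenced index entry -->
      <xsl:template match="tei:persName">
        <a class="persName" href="{@ref}"><xsl:apply-templates/></a>
      </xsl:template>
    </xsl:stylesheet>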

+
+ Fig. 3: A digitized letter and its transcription ().
+

This plugin was notably used by the Roman des Morand project (a project of the ENS de Lyon, in partnership with the Archives municipales de Lyon and the Laboratoire Triangle – UMR 5206), dedicated to the extensive correspondence of the Morand de Jouffrey family, a family of Lyon notables. All 338 letters making up this corpus were digitized and then given a partial, interpretative transcription, with the spelling modernized and certain passages of little historical or scholarly interest omitted. The digital editions are displayed online with the TEI Display plugin.

+

The interpretative transcriptions of the letters appear below the digitizations, as a support for the digitized edition (Fig. 3). Each transcription can be exported in XML-TEI. The project also makes the entire corpus encoded in XML-TEI available for download, together with the schema used for the project in its ODD (One Document Does it all) form. Since the project pays particular attention to onomastics, the proper names cited in the letters are highlighted and link to an index of personal names. To enrich the understanding and analysis of the letters, the project draws on the many features offered by Omeka, such as virtual exhibitions and timelines.

+

The TEI Display plugin was also used by the digital edition project Civil War Governors of Kentucky, whose corpus offers a panorama of the networks and relationships that existed around the governor's office during the American Civil War. Here, the developers adapted TEI Display, combining it with Omeka's Dropbox plugin in order to perform bulk imports.

+
+ Fig. 4: Transcription of a placard from 1863 ().
+

This project displays the digitized edition and the interpretative digital edition side by side (Fig. 4). Place names, personal names, and dates are highlighted and link to a more precise definition, which in turn offers a list of all the documents in which the defined notion is cited. Users can download the .pdf file of the digitized edition and the .xml file of the TEI digital edition. As with the Roman des Morand project, the digital edition benefits from Omeka's features to enrich its content, for example virtual exhibitions, indexes, and documentary dossiers intended for teachers who wish to reuse the collections in their courses.

+

The TEI Editions plugin, developed by Mike Bryant in the context of the EHRI (European Holocaust Research Infrastructure) project, makes it possible, unlike TEI Display, to create new content records, i.e. new items, from XML-TEI files. The plugin uses XPath to map elements of the <teiHeader> onto the Dublin Core elements on which Omeka is based (Table 1). It can also extract a transcription contained in the TEI <body> element and display it online, by mapping it to Omeka's 'Text' field. (Omeka Classic classifies content by item type; each type is associated with a set of metadata drawn from Dublin Core but also created by Omeka, such as the Text element, which displays all textual data contained in a document.) Other file types (.jpg, .png, .pdf, etc.), such as digitized editions, can then be attached to these new items in order to enrich them (Bryant 2019).

Table 1: Examples of mappings between TEI and Dublin Core offered by the TEI Editions plugin.

    TEI element      Dublin Core field
    tei:idno         dc:identifier
    tei:title        dc:title
    tei:item         dc:subject
    tei:abstract     dc:description
    tei:publisher    dc:publisher
    tei:licence      dc:rights
    tei:physDesc     dc:format
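For instance (a constructed example with invented values), a header such as

    <teiHeader xmlns="http://www.tei-c.org/ns/1.0">
      <fileDesc>
        <titleStmt><title>Telegram, 12 July 1938</title></titleStmt>
        <publicationStmt>
          <publisher>EHRI</publisher>
          <idno>edition-0042</idno>
        </publicationStmt>
      </fileDesc>
    </teiHeader>

would, following the mappings in Table 1, produce an item whose dc:title is 'Telegram, 12 July 1938', whose dc:publisher is 'EHRI', and whose dc:identifier is 'edition-0042'.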

+
+ Fig. 5: Digital edition of a telegram ().
+

To date, we know of only one project using the TEI Editions plugin, namely the digital edition BeGrenzte Flucht, which presents editions of documents relating to Austrian Jewish refugees in Czechoslovakia during the year 1938. This project is part of the larger EHRI project, from which the TEI Editions plugin originated. The edition offered is an interpretative one, available for download, which puts the emphasis on named entities (Fig. 5). These are clickable and can be searched across the edition's entire corpus, in GeoNames, or in Wikipedia (box on the right in Figure 5). Each item is, moreover, enriched with a geographical map and with complementary references available on the EHRI project portal.

+

The various digital edition projects presented in this section are all characterized by the presence of numerous complementary tools, such as indexes, bibliographies and virtual exhibitions. These tools highlight one of the main strengths of Omeka for publishing digital editions, namely the enrichment and valorization of the data the editions contain. Roberto Rosselli Del Turco regards such complementary tools as one of the main prerequisites of digital editions considered as user interfaces (2011, para. 29). They accompany users as they navigate, letting them approach the contents of the edition from different angles and thereby enriching their discovery of the edition and their knowledge.

+

The solution adopted by the TEI Display and TEI Editions plugins for publishing digital editions in Omeka follows the recommendations of the CMS's own developers. Faced with the growing number of requests and initiatives around digital editions from the community, the designers of the CMS have encouraged users to turn to the TEI Boilerplate tool. This recommendation positions Omeka as a platform for publishing digital editions rather than for creating them.

+
+
From transcription to encoding: creating digital editions from within Omeka

Although Omeka's developers recommend treating it primarily as a publishing platform, a plugin developed by the user community makes it possible to transcribe, encode and publish digital editions from the private interface of the CMS. This plugin, called Transcript (v. 0.1), was developed by Vincent Buard of the Numerizen team for the project Éditions de Manuscrits et d'Archives Numériques (EMAN) of the Institut des Textes et des Manuscrits Modernes (ITEM), and more specifically for the project Notes de cours de l'ENS (a project of the Ulm-Jourdan library, in partnership with ITEM and the École nationale des chartes). It integrates into Omeka an XML editor for transcribing and annotating a text with TEI elements, presented as a TinyMCE toolbar (v. 4.0; TinyMCE is a WYSIWYG HTML editor whose toolbar allows a text to be encoded in HTML without seeing any code). It thus allows non-expert users to enrich a transcription without seeing the source code (Dessaint n.d.).

+

The use of an XML-TEI editor with a toolbar has already been tried in non-Omeka projects, for example Transcribe Bentham and the TACT platform (Plateforme de transcription et d'annotation de corpus textuels), two participatory projects for transcribing and annotating manuscripts. For these projects, the aim is to put XML-TEI encoding within everyone's reach, without requiring prior training in the standard, in order to invite and encourage users to participate.

+

The Transcript plugin does not integrate the full set of TEI elements but relies on a schema adapted to the needs of a project. This schema allows the plugin to check the validity of the encoding and to respect the TEI Consortium's recommendations on the use of elements, thereby generating standardized TEI files. Once encoded, the transcription is displayed on Omeka's public interface via the TEI Boilerplate tool, which transforms TEI files and publishes them online (Dessaint n.d.).

+
Fig. 6: Diplomatic edition of Henri Weil's lecture notes on Euripides ().
+

As with the TEI Editions plugin presented above, to date only the project Notes de cours de l'ENS, in whose context Transcript was developed, uses this plugin. The project offers digital editions of a substantial archival collection: that of the lectures given at the ENS (École Normale Supérieure). This experimental project aims to explore the possibilities Omeka offers for creating and publishing scholarly digital editions (Sordet and Dessaint 2019). Its interface presents the digitized image and its diplomatic transcription side by side. Users can display the transcription exactly as it appears in the digitized image, or an enriched version of it, notably with abbreviations expanded and with icons representing the TEI elements used (Fig. 6). In this way, the project gives a certain transparency to its data and to the way they have been structured.

+
+
+
+
Omeka and the TEI plugins: usability, sustainability and maintenance

Omeka can be downloaded from the tool's website. The 'official' plugins, that is, those developed by Omeka's developers or validated by them, are also available for download on this platform. In parallel, a great many plugins developed by the Omeka user community are available on GitHub.

+

Compatible with Linux, Mac OS X and Windows, Omeka requires an Apache, MySQL (v. 5) and PHP (v. 5.3.2) environment to run. For image handling, it is also necessary to install the image-manipulation tool ImageMagick (ImageMagick resizes the images imported into Omeka and, in particular, creates thumbnails). The Scripto plugin requires the installation of MediaWiki, software that manages users and the transcriptions they produce. The TEI plugins, for their part, require no specific environments or tools beyond those required for Omeka itself. The websites created with this CMS are responsive and can be consulted on a variety of devices (computer screens, tablets, smartphones).

+

Installing Omeka is intuitive and involves only a few steps: deposit a directory containing the CMS files on a server, create a MySQL database, and link it to the Omeka instance. These steps are detailed in the tool's documentation (User Manual), which notably contains Getting Started and Installation sections. In case of difficulties installing or using the CMS, users can turn to the Omeka forum, which allows them to exchange with the tool's developers, who are very active on the platform, and to access past discussions, organized into seven categories: installation and updates, troubleshooting, plugins, themes, import/export, items and collections, and development. In France, users can also exchange via a mailing list and a website dedicated to French-speaking projects and initiatives around Omeka.

+

The installation of plugins is also detailed in the user manual. The official plugins have their own documentation on the official website and on GitHub, specifying how they are administered and how they work. As for the 'non-official' plugins, among which are the TEI plugins, the presence of documentation in the form of a README.md file varies: while the TEI Display plugin offers none, the Scripto, TEI Editions and Transcript plugins provide detailed documentation.

+

Although Omeka is well documented and easy to deploy, its main limitation is the maintenance of its interface, and more particularly the maintenance of complex interfaces that go beyond the uses foreseen by the CMS's developers, such as integrating Omeka into a digital edition workflow. Building a cultural website with this CMS relies on a juxtaposition of plugins, resulting in a composite architecture whose various components do not evolve at the same pace.

+

For content-publishing projects that use only the official plugins and themes, without heavy modifications to the CMS's source code, maintenance is not a thorny problem. When the CMS is updated, the plugins on the official list are progressively updated by the Omeka development team itself.

+

However, for more complex projects that want an elaborate design and features beyond what Omeka was created for, maintaining the infrastructure can be a significant obstacle. Updates to non-official plugins are less frequent, or never happen at all, which can make some plugins, and therefore some services, unavailable, or even force a project to postpone updating Omeka itself.

+

Yet the vast majority of Omeka plugins are developed by the user community itself. These plugins were conceived for specific projects, each of which built a unique Omeka environment with themes and plugins customized to its needs. Indeed, the default interface offered by Omeka is neutral and simplified in order to meet the needs of a broad and diverse public. It therefore requires substantial customization of the organization of the public interface, the pages, the themes and the plugins in order to adapt the CMS to a particular project. All of these components are interdependent: plugins influence the themes and Omeka's source code, and vice versa. Consequently, a plugin developed within one project, such as Transcript or TEI Editions, may not be immediately compatible with the Omeka instance of another project, which has itself selected its own theme and set of plugins and made its own modifications to the CMS's source code. (Transcript is being actively improved and will soon gain new features, better ergonomics and greater stability. In the case of TEI Editions, to avoid compatibility problems, the developers recommend using a specific theme, the EHRI Omeka Editions Theme.)

+

Faced with this problem, one of the community's responses has been to make available to other users the pre-existing plugins it has modified in order to build a new plugin. Consequently, the list of non-official plugins may contain official plugins that a project has modified while developing another plugin. For a third-party project to use that new plugin, it will in some cases also have to install the modified plugins on which it was built.

+

This problem of plugin compatibility and maintenance is one of Omeka's main limitations, and it presupposes substantial development work and the backing of a team of developers and engineers. In terms of risk management, therefore, when using Omeka as a platform for digital editions, it is necessary to set aside time for reworking plugins and adapting them to one's Omeka environment in case of potential bugs.

+
+
Omeka, TEI plugins and users
Fig. 7: Omeka dashboard.
+

Omeka is aimed at humanists and at documentation and heritage professionals without advanced computing skills. While at download time Omeka takes the form of a directory to be deposited on a server, once that step is completed it offers a ready-to-use interface. On logging in to Omeka, users are taken straight to a dashboard (Fig. 7), which lets them take in at a glance their latest activities as well as the full range of features the tool offers. They can create and manage items, activate plugins, or choose a theme for the CMS's public interface. All of these actions are performed in one click and can just as easily be undone.

+
Fig. 8: Activating and deactivating plugins in Omeka.
+
Fig. 9: Extract from the TeiEditions plugin, which integrates into Omeka's graphical interface ().
+

Take the example of plugins. As we saw above, installing a plugin is similar to installing Omeka: the user downloads a directory from the CMS's website or from GitHub, then deposits it in the appropriate Omeka directory on the server. The plugin then appears in a list accessible from the 'Extensions' tab (Fig. 7). Once any errors due to compatibility problems requiring intervention in Omeka's source code have been resolved, each plugin can be activated, deactivated and removed by clicking the corresponding buttons (Fig. 8). Once activated, a plugin immediately becomes configurable and usable through a user interface that blends into Omeka's, without a single line of code being visible, whether the plugin was developed by Omeka's developers or by members of its community, as with the TEI plugins (Fig. 9).

+

Access to Omeka's private interface is controlled by user roles. These roles differ in their degree of interaction with the interface. A user with the researcher role consults contents without interacting with them. The contributor can modify or delete only the contents they have created. The admin user, by contrast, interacts with all contents but has no access to the administration of the site, plugins or themes, which is reserved for the super user.

+

In terms of accessibility, Omeka and its default plugins (Exhibit Builder, Coins and Simple Pages; Exhibit Builder creates virtual exhibitions from the items and collections added to Omeka, Simple Pages adds static pages to the public interface, and Coins exports items as citations to the bibliographic-management application Zotero) follow the W3C recommendations (WAI-ARIA) and the American Section 508 program. Omeka can thus be used without a mouse, via keyboard shortcuts, or with a screen reader, which presents the content of a page to the user through a braille display or speech synthesis.

+
+
+ Conclusions +

Choosing Omeka to produce a digital edition is not an obvious decision. It is above all a publishing tool, not an editor. Omeka cannot handle every stage of an editing workflow; it intervenes only at specific moments in the process, in particular at the transcription and publication stages, which are currently the ones projects have tried out most thoroughly.

+

Although it was not designed for this, using Omeka in a digital edition context can nevertheless be justified for several reasons: the management of large volumes of data, thanks to its archival organization based on the notions of items and collections; the fine-grained indexing of contents, which can be searched by keyword, date or author; and the many features for valorizing contents, such as virtual exhibitions, maps and timelines. The various Omeka-based digital edition projects presented in this article, while they differ in how they integrate Omeka into their editing workflow, are all characterized by the large volume of their collections (correspondence, archival materials), but also by the presence of these tools (indexes, virtual exhibitions, maps), which accompany users in their consultation and enrich their experience.

+

Although creating or publishing digital editions in XML-TEI still represents a challenge for projects, and although no consensus has formed around any one plugin, these various constitutive features of Omeka make it appear a favourable environment for publishing digital editions. Omeka indeed meets the main needs of digital editions as user interfaces, as defined by Roberto Rosselli Del Turco (2011). It offers 'image manipulation tools' (among the viewers available for Omeka are DocViewer, Universal Viewer and Mirador), which make it easy to navigate the digitization of the print object being edited, to zoom in at large scale, and to modify the colour, contrast or brightness of the image. It also provides advanced search features, particularly when combined with the Solr search engine, which allows fine-grained indexing of the data and thus brings out the full richness of an XML-TEI encoding. Finally, Omeka makes it possible to surround digital editions with a whole arsenal of complementary tools that guide users through the volume of data presented to them (Rosselli Del Turco 2011, paras. 27–29).

+

At present, Omeka can only represent diplomatic digital editions. The tool offers no solution for displaying critical or genetic digital editions. In a certain way, Omeka tends to reduce scholarly digital editions to transcriptions enriched with digitized images. Much research and experimentation therefore remains to be done on using Omeka for other models of digital editions (critical, genetic or synoptic). For now, the only solution is to turn to other tools specialized in creating and displaying such editions and to connect them to Omeka.

+
+
+
Bleier, Roman, Martina Bürgermeister, Helmut W. Klug, Frederike Neuber, and Gerlinde Schneider, eds. 2018. Digital Scholarly Editions as Interfaces. Norderstedt: BoD – Books on Demand.
Bryant, Mike. 2019. TeiEditions (README.md). EHRI.
Dessaint, Charlotte. n.d. 'Le module utilisé pour la transcription: Transcript'. Notes de cours de l'ENS (blog). Accessed 15 October 2019.
Donadille, Julien, Pascale Lefebvre, Marie-Amélie Louveau, and Romain Gaillard. 2006. 'CMS et bibliothèques'. Villeurbanne: Enssib.
Pierazzo, Elena. 2015. Digital Scholarly Editing: Theories, Models and Methods. Farnham: Ashgate.
Rosselli Del Turco, Roberto. 2011. 'After the Editing is Done: Designing a Graphic User Interface for Digital Editions'. Digital Medievalist 7.
Sahle, Patrick. 2008. 'Virtual Library Digital Scholarly Editing'. 2008.
———. 2016. 'What is a Scholarly Digital Edition?' In Digital Scholarly Editing: Theories and Practices, edited by Matthew James Driscoll and Elena Pierazzo, 19–39. Open Book Publishers.
Sordet, Emmanuelle, and Charlotte Dessaint. 2019. 'Notes de cours de l'ENS'. Plate-forme EMAN (blog). 22 September 2019.
+ +
+
diff --git a/omeka/pictures/picture-1.png b/omeka/pictures/picture-1.png new file mode 100644 index 0000000..810f54b Binary files /dev/null and b/omeka/pictures/picture-1.png differ diff --git a/omeka/pictures/picture-2.png b/omeka/pictures/picture-2.png new file mode 100644 index 0000000..b1b75c8 Binary files /dev/null and b/omeka/pictures/picture-2.png differ diff --git a/omeka/pictures/picture-3.png b/omeka/pictures/picture-3.png new file mode 100644 index 0000000..11a0682 Binary files /dev/null and b/omeka/pictures/picture-3.png differ diff --git a/omeka/pictures/picture-4.png b/omeka/pictures/picture-4.png new file mode 100644 index 0000000..7ffc042 Binary files /dev/null and b/omeka/pictures/picture-4.png differ diff --git a/omeka/pictures/picture-5.png b/omeka/pictures/picture-5.png new file mode 100644 index 0000000..7f4576d Binary files /dev/null and b/omeka/pictures/picture-5.png differ diff --git a/omeka/pictures/picture-6.png b/omeka/pictures/picture-6.png new file mode 100644 index 0000000..d8ad14d Binary files /dev/null and b/omeka/pictures/picture-6.png differ diff --git a/omeka/pictures/picture-7.png b/omeka/pictures/picture-7.png new file mode 100644 index 0000000..0dbf8e3 Binary files /dev/null and b/omeka/pictures/picture-7.png differ diff --git a/omeka/pictures/picture-8.png b/omeka/pictures/picture-8.png new file mode 100644 index 0000000..6051712 Binary files /dev/null and b/omeka/pictures/picture-8.png differ diff --git a/omeka/pictures/picture-9.png b/omeka/pictures/picture-9.png new file mode 100644 index 0000000..27ce669 Binary files /dev/null and b/omeka/pictures/picture-9.png differ diff --git a/reledmac/pictures/picture-1.png b/reledmac/pictures/picture-1.png new file mode 100644 index 0000000..8094498 Binary files /dev/null and b/reledmac/pictures/picture-1.png differ diff --git a/reledmac/pictures/picture-2.png b/reledmac/pictures/picture-2.png new file mode 100644 index 0000000..58543d4 Binary files /dev/null and b/reledmac/pictures/picture-2.png differ diff --git a/reledmac/pictures/picture-3.png b/reledmac/pictures/picture-3.png new file mode 100644 index 0000000..ff7dd97 Binary files /dev/null and b/reledmac/pictures/picture-3.png differ diff --git a/reledmac/pictures/picture-4.png b/reledmac/pictures/picture-4.png new file mode 100644 index 0000000..33d97cb Binary files /dev/null and b/reledmac/pictures/picture-4.png differ diff --git a/reledmac/pictures/picture-5.png b/reledmac/pictures/picture-5.png new file mode 100644 index 0000000..85f8a02 Binary files /dev/null and b/reledmac/pictures/picture-5.png differ diff --git a/reledmac/reledmac-tei.xml b/reledmac/reledmac-tei.xml new file mode 100644 index 0000000..db4be44 --- /dev/null +++ b/reledmac/reledmac-tei.xml @@ -0,0 +1,1130 @@ + + + + Reledmac. Typesetting technology-independent critical editions with LaTeX + + author + + + Andrew N. J. + Dunning + + + Bodleian Library, University of Oxford + Oxford, UK + + andrew.dunning@bodleian.ox.ac.uk + + + + + Institut für Dokumentologie und Editorik + 2019-01-18 + https://ride.i-d-e.de/issue-11 + https://ride.i-d-e.de/issue-11/reledmac/ + 10.18716/ride.a.11.1 + + + + + + + + + + + Reledmac + Maïeul Rouquette + + + + + + + + 1987-2019 + + + https://ctan.org/pkg/reledmac + + + + 2019-07-21 + + + +

Based on
http://www.i-d-e.de/publikationen/weitereschriften/criteria-text-collections-version-1-0

+
+
+ + + + + + + cf. Catalogue 0.1.1 + + What type of software is it? + + + + + + + + + + + + + + cf. Catalogue 1.4 + + On which platform runs the tool? + + + + + + + + + + + + + + + + + + + cf. Catalogue 1.5 + + For what purpose was the tool developed? + + + + + + + + + + + + + + cf. Catalogue 1.6 + + Which is the financial model of the tool? + + + + + + + + + + + + + + + + + + + cf. Catalogue 1.5 + + What is the development stage of the tool? + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 2.3 + + Which programming languages and technologies are used? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + TeX + + + + + + + + + + + cf. Catalogue 2.3 + + Does the tool reuse portions of other existing software? + + + + + + + cf. Catalogue 2.4 + + Which programming languages and technologies are used? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 2.4 + + Which programming languages and technologies are used? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 2.4 + + Which character encoding formats are supported? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Is a pre-processing conversion included? + + + + + + + cf. Catalogue 3.2 + + Does the documentation list dependencies on other software, libraries or hardware? + + + + + + If yes, is the software handling the installation of dependencies during the general installation process (you don't have to install them manually before the installation)? + + + + + + + + + cf. Catalogue 3.4 + + Is documentation and/or a manual available? (tool website, wiki, blog, documentation, or tutorial) + + + + + + + cf. Catalogue 3.3 + + Which format has the documentation? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.3 + + Which of the following sections does the documentation contain? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.3 + + In what languages is the documentation available? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.4 + + Is there a method to get active support from the developer(s) or from the community? + + + + + + + cf. Catalogue 3.4 + + Which form of support is offered? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.4 + + Is it possible to post bugs or issue using issue tracker mechanisms? + + + + + + + cf. Catalogue 3.6 + + Grade how straightforward it is to build or install the tool on a supported platform: + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.7 + + Is there a test suite, covering the core functionality in order to check that the tool has been correctly built or installed? + + + + + + + cf. Catalogue 3.8 + + On which platforms can the tool/software be deployed? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.8 + + On which devices can the tool/software be deployed? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.8 + + If the tool is web-based: On which browsers can the tool/software be deployed? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. 
Catalogue 3.8 + + If the tool is web-based: Does the tool rely on browser plugins? + + + + + + + cf. Catalogue 3.8 + + Is there an API for the tool? + + + + + + + cf. Catalogue 3.9 + + Is the source code open? + + + + + + + cf. Catalogue 3.9 + + Under what license is the tool released? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + LaTeX Project Public Licence + + + + + + + + + + + cf. Catalogue 3.9 + + Does the software make adequate acknowledgement and credit to the project contributors? + + + + + + + cf. Catalogue 3.9 + + Is the tool/software registered in a software repository? + + + + + + If yes, can you contribute to the software development via the repository/development platform? + + + + + + + cf. Catalogue 3.10 + + Can the code be analyzed easily (is it structured, commented, following standards)? + + + + + + + cf. Catalogue 3.10 + + Can the code be extended easily (because there are contribution mechanisms, attribution for changes and backward compatibility)? + + + + + + + cf. Catalogue 3.10 + + Can the code be reused easily in other contexts (because there are appropriate interfaces and/or a modular architecture)? + + + + + + + cf. Catalogue 3.11 + + Does the software provide sufficient information about the treatment of the data entered by the users? + + + + + + + cf. Catalogue 3.12 + + Is there information available whether the tool will be supported currently and in the future? + + + + + + + cf. Catalogue 3.13 + + Does the tool supply citation guidelines (e.g. using the Citation File Format)? + + + + + + + + + + cf. Catalogue 4.1 + + What kind of users are expected? + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 4.1 + + What kind of user interactions are expected? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 4.2 and 0.1.1 + + What kind of interface does the tool provide? + + + + + + + + + + + + + + + + + + + cf. Catalogue 4.3 + + Does the tool provide a particular visualizations (in terms of analysis) of the input and/or the output data? + + + + + + + cf. Catalogue 4.4 + + Is the user allowed to customize the functioning of the tool and the output configuration? + + + + + + + cf. Catalogue 4.5 + + Does the tool provide particular features for improving accessibility, allowing „people with the widest range of characteristics and capabilities" to use it? + + + + + + + + +
+
Reledmac, an open-source package for the LaTeX typesetting system, offers a reliable method to arrange text on a page with multiple levels of scholarly apparatus and commentary. Its straightforward interface and wide availability have allowed its use in several projects aiming to visualize an edition encoded in TEI XML in a printed format.
Introduction

It is questionable whether anyone is happy with the traditional format of the critical printed edition. The critical apparatus was designed around the constraints of typesetting in the eighteenth century, and leaves much to be desired as a method of visualizing textual variation. Researchers rely daily on being able to search primary sources, but most public corpora are based on editions from the nineteenth century, since few series of critical editions make their texts openly available in digital form. Nonetheless, many academics view printed books as the most reasonable method of publishing critical editions in light of concerns over digital publications’ stability, readability, and authority. If we wish to encourage more scholars to start editing texts in ways that exploit the computing resources available to us, we need to provide ways to produce editions that give the best possible presentations of a text in both digital and print forms – editions that are designed for humans to read as well as our machines.

There is not yet a reusable solution to editing texts in such a technology-independent form, but one of the key pieces in this puzzle has existed for decades: software for automatically typesetting texts with a scholarly apparatus. Maïeul Rouquette's Reledmac, a package for the venerable LaTeX typesetting system, has emerged as a readily available program that, used correctly, can produce professional results. A non-comprehensive bibliography lists almost seventy publications that have used it with a multitude of languages, including contributions from leading scholars (Wujastyk and Rouquette 2013). Although the package's gestation over more than three decades has resulted in some quirks, the wide support for LaTeX and its portability to nearly any system have enabled Reledmac's adoption by several digital editing projects, showing its potential as a key piece of scholarly infrastructure.

Fig. 1: Toronto Medieval Latin Texts edition and commentary, typeset using Reledmac.

This review addresses Reledmac 2.32.1, released 21 July 2019. It is based in part on my experience using it to typeset my own work encoded using the Text Encoding Initiative (TEI) guidelines. Reledmac's greatest potential is as a mechanism for typesetting structured editions, but the TEI community has left this route underdeveloped and poorly documented. It remains less work to interact with the package directly if one's primary goal is a short printed edition. I used Reledmac in this way when I typeset a student edition and commentary for the Toronto Medieval Latin Texts series (Robins 2019), which gave ample opportunity to explore its features. (Philippa M. W. Matheson had previously set my own edition in the series using plain TeX: Dunning 2016.) Robins wrote the edition in Microsoft Word, which I initially converted into LaTeX using Pandoc before making manual adjustments (MacFarlane 2006–2019; cf. Krewinkel and Winkler 2017). I encountered several minor bugs in the package over the course of this project, which Rouquette was kind enough to address. I also submitted minor improvements to the package via its GitHub repository. The options available in Reledmac facilitated improvements to several aspects of the series design (Fig. 1 shows the final result), accommodating practically any layout for a commentary or apparatus.

Reviving typography with TeX

Reledmac is a mature tool with a long gestation; to understand its advantages, success, and idiosyncrasies, one needs to consider it within the development of the TeX typesetting system, which is sometimes called ‘plain TeX’ to distinguish it from the typesetting systems based on it such as LaTeX. Donald Knuth, a computer scientist and mathematician, first released this program in 1978 in reaction to the poor quality of early digitally typeset books (Knuth 1986 is his comprehensive guide). This program quickly became popular in academic circles, and is particularly respected for the Knuth-Plass line breaking algorithm (Knuth and Plass 1981), which composes text as a paragraph to minimize hyphenation and other typesetting problems. By contrast, standard word processors and Web browsers still compose text on a line-by-line basis, which produces a distinctly mechanical feel and reduces reading comprehension.

Both medieval scribes and early printers aimed for evenness in what designers now call ‘type colour’, producing text blocks that look almost grey when one squints at them, without distracting gaps or abrupt combinations of heavy and light text. The industrialization of typesetting beginning in the nineteenth century gradually eroded this principle. The developers of phototypeset books, beginning in the 1950s, almost completely ignored it. This underlies much of the poor quality of a digitally produced book from the 1970s in comparison to its equivalent from earlier centuries.

The celebrated typographer Hermann Zapf further advanced the state of automated typesetting with his Hz-program, introducing what is sometimes called 'microtypography'. This software implemented the techniques of scribes and the first type compositors to produce consistent type colour by using narrower or wider versions of characters (Zapf 1993; Bringhurst 2013). This was far more laborious to produce in print than in manuscript, and quickly fell by the wayside, but this does not make the underlying concept less useful. It has been used in many recent books since Adobe Systems acquired Zapf's software for integration into its page-layout application, InDesign. The principles have also been implemented for TeX, and can be easily included in any document using the Microtype package (Schlicht 2004–2019). It is this approach to paragraph composition that allows the TeX family of software to produce such excellent results.
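Enabling these features in one's own document is a one-line addition to the preamble; the option values shown here are simply the package's documented defaults on pdfTeX and LuaTeX, spelled out for clarity:

\usepackage[protrusion=true,expansion=true]{microtype}
% protrusion lets punctuation hang slightly into the margin;
% expansion imperceptibly widens or narrows glyphs to even out spacing.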

The LaTeX ecosystem

Few people now use TeX directly, instead using a derivative such as LaTeX that facilitates a structured approach to typesetting. Leslie Lamport released LaTeX in 1983, abstracting much of TeX behind macros centred on the organization of a document. For example, it provides a \chapter{Chapter Title} command to begin a new book chapter. A team of volunteers continues to maintain this program. LaTeX is built around the principle of a document class, which allows one to specify a module that provides a starting design for a typical type of publication, such as an article, book, or letter. Developers have produced a wide range of classes that cover various scenarios, as well as packages that add extra capabilities to other classes – Reledmac is one of these.

These additions are both the greatest strength and weakness of LaTeX. Individual volunteers write most packages, which provide additional TeX programming that exists on the same level as LaTeX itself. They can produce unexpected results when combined. The creators of packages often cease to maintain them, and old packages are rarely pruned from CTAN, the standard repository for TeX-related software, leaving traps for users who might accidentally use one of these packages after finding an old reference to it. This situation has been remedied in part through the creation of KOMA-Script (Kohm 1994–2019) and Memoir (Wilson and Madsen 2001–2018), which provide versatile and carefully conceived classes that eliminate the need for many packages and supply their own reference manuals covering most aspects of LaTeX. It is such packages, combined with the generous community of users that maintains online help forums, that have sustained LaTeX over so many years in spite of many shortcomings in its design.
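For example (a minimal sketch; the option values are arbitrary), KOMA-Script exposes design decisions as key=value class options rather than through additional packages:

\documentclass[paper=a5,       % page size
               fontsize=11pt,  % arbitrary font sizes are accepted
               twoside=semi]{scrbook} % two-sided printing with unmirrored margins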

The final complication of using TeX is the series of different engines that can turn its files into a PDF: there are three options in common use, each with limitations. TeX predates both PDF files and the need to display formatted documents on a screen: it originally produced its own format to be fed into a printer. An engine emerged by the late 1990s for producing a PDF from a TeX file, called pdfTeX; but it cannot handle many Unicode characters or normal system fonts. XeTeX is the second engine typically used today, implementing Unicode and modern font technologies, but in a way that broke compatibility with earlier LaTeX packages (including Microtype). LuaTeX is gradually emerging as a replacement for both, maintaining compatibility with pdfTeX alongside the innovations of XeTeX. It has not yet caught on universally, however, because it is much slower than the other two engines. Some linguistic support available for XeTeX (especially for right-to-left languages) is not yet complete for LuaTeX. As a result, the seemingly simple operation of turning a LaTeX file into a PDF can be fraught with complications.
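One common defence is to branch the preamble on the engine, so that a single source file compiles everywhere; a minimal sketch using the iftex package (the font-related choices are my own assumptions):

\usepackage{iftex}
\ifPDFTeX
  \usepackage[utf8]{inputenc} % pdfTeX: declare the input encoding
  \usepackage[T1]{fontenc}    % ... and an 8-bit output font encoding
\else
  \usepackage{fontspec}       % XeTeX/LuaTeX: Unicode input and system fonts
\fi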

An attempt to streamline this complex situation exists in ConTeXt, a more recent abstraction of TeX independent of LaTeX. In spite of its many improvements, it has yet to gain comparable traction because it lacks the ready-made classes and packages that allow one to quickly produce good results with LaTeX, as long as one is working within its paradigm. LaTeX is truly a reflection of humanity, showing the beauty that collective generosity can produce, but also the confusion that results from a lack of coordination.

Typesetting an edition with Reledmac

It is within this web of packages and different interfaces for TeX that Reledmac exists, and its history defines both its strengths and limitations. Reledmac originates in Edmac (short for ‘editing macros’), which John Lavagnino (a Shakespearean and systems manager then at Brandeis University, now at King’s College London) and Dominik Wujastyk (a Sanskrit scholar, then at the Wellcome Institute for the History of Medicine, now at the University of Alberta) designed in 1987–89, developing it in their spare time to support their own editing work (Lavagnino and Wujastyk 1990; Wujastyk 1993). LaTeX had not yet become widespread, and they designed the package to interact directly with plain TeX without taking LaTeX functionality into account. Beginning in 1994, Peter Wilson (a specialist in information modelling) ported the package to LaTeX as a pure labour of love, renaming it Ledmac (Walden 2006). Wilson was responsible for a large number of LaTeX packages, leaving a maintenance gap on his retirement that took several people to fill (Robertson 2009).

Maïeul Rouquette, a scholar of early Christianity at the University of Lausanne, took over Ledmac in 2011 when he was using it to write his doctoral thesis (Rouquette 2017), renaming it first Eledmac and then Reledmac, which allowed him to revise the interface and functionality without affecting projects that used older versions. He has since continued to improve the package's functionality beyond the scope of his own research. Rouquette has also put significant energy into writing thorough documentation, together with a general introduction to LaTeX for humanists that discusses Reledmac alongside Reledpar, its sister package for setting parallel texts (Rouquette 2012). The project's GitHub repository lists fourteen other minor contributors. The culture of open-source software created out of goodwill for a practical end without explicit funding is typical for LaTeX packages. This haphazard model often produces useful results, but it is not clear that it is sustainable, especially as the employment of early-career researchers becomes increasingly unstable.

The legacy of the original Edmac and the process of its transition to LaTeX remains evident in the package as it now exists. Wilson had only the goal of making the package functional, and did not rewrite it to use the logic of LaTeX. As a result, the package must emulate the functionality of many basic LaTeX macros such as headings and block quotations rather than use them directly, and they often do not behave in the way one expects. For instance, although KOMA-Script and Wilson’s own Memoir class include environments for setting verse, they give unexpected results in Reledmac, and one instead needs to use its internal mechanism. One needs to treat Reledmac almost as a separate system from LaTeX, and the package would need to be rewritten to resolve this situation. The Ednotes package began this effort (Lück 2003), but it never reached equal functionality and development ceased in 2006. This situation is not the fault of the package’s authors, but it increases the challenge of converting text for typesetting in LaTeX with Reledmac, as well as the learning curve.
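For instance, instead of a standard LaTeX verse environment, poetry is set with Reledmac's own \stanza mechanism, with & separating lines and \& closing the stanza. A minimal sketch following the package manual (the Virgil text is arbitrary):

\setstanzaindents{0,0,0} % hanging indent and per-line indents (none here)
\beginnumbering
\stanza
Arma uirumque cano, Troiae qui primus ab oris &
Italiam fato profugus Lauiniaque uenit \&
\endnumbering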

Fig. 2: A simple Reledmac document.

Once one understands Reledmac's limitations, and its methodological focus on visualizing textual variants using traditional mechanisms developed for print, its interface is nearly as simple as one can achieve. A critical edition involves a complex dataset, and the LaTeX format imposes further constraints similar to those of the XML format underlying TEI. The software works from encoding for critical notes that focuses on typography rather than semantics, running its own TeX code to arrange notes and line numbers according to LaTeX's positioning of the text. This is a basic document with critical notes (see also Fig. 2):

\documentclass{scrbook} % KOMA-Script book class
\usepackage{microtype} % improves justification
\usepackage[pdfusetitle,hidelinks]{hyperref} % adds links from apparatus to text
\usepackage[series={A,B}]{reledmac} % enables two levels of apparatus

\title{Sample Edition}
\author{Andrew Dunning}

\begin{document} % begin LaTeX document
\maketitle

\chapter{Introduction}

Introductory text.

\chapter{Edition}

% text outside \beginnumbering … \endnumbering works as normal LaTeX
\beginnumbering % begins Reledmac numbered section
\pstart % begin a paragraph in Reledmac; or use the \autopar command
This is a \edtext{test}{\Afootnote{experimental \emph{L}}}
\edtext{sentence}{\Bfootnote{Introduced to English via Old French from
Latin \emph{sententia} 'opinion'.}}.
\pend % end a paragraph in Reledmac
\endnumbering % end Reledmac numbered section
\end{document} % end LaTeX document

Files using Reledmac can be rendered with any LaTeX engine. Compilation takes slightly longer than normal, because the package needs to generate extra temporary files. The example above enables two series of critical notes with Reledmac. (One can instead use standard numbered footnotes or endnotes.) The \edtext{text}{commands} command marks a word or phrase for comment; one can add as many commands as necessary in the second set of braces for a critical apparatus, source apparatus, or whatever else the edition requires. One can attach multiple notes to the same word using \Afootnote, \Bfootnote, and so forth.
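Alongside these critical series, Reledmac also provides parallel series of 'familiar' numbered footnotes; a minimal sketch, assuming the default setup described in the package manual:

\pstart
This is a test sentence.\footnoteA{An ordinary discursive note,
numbered rather than keyed to line numbers.}
\pend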

Fig. 3: Usage of the \lemma command.

Reledmac demonstrates a few minor shortcomings in facilitating features of high-quality editions, though there are usually ways to achieve the desired results by hand. When making a note on a long passage, most editors will refer only to its first and last words. In Reledmac, this requires the \lemma command to truncate the text (see also Fig. 3):

\edtext{Arma uirumque cano}{\lemma{arma … cano}
  \Bfootnote{The opening line of Virgil, \emph{Aeneid}.}}

This must be done by hand for every note that does not quote the full lemma. In some cases, this is advantageous. For commentaries in particular, the ability to write one’s own lemma to focus on the precise passage in question is a great help. On the other hand, it would be a great service if Reledmac could borrow Classical Text Editor’s options for setting document-wide styles to automatically process lemmata by truncating a phrase to the first and last words; removing punctuation and other specified characters; making the text lowercase; and transliterating text as appropriate, for example from V to u in Latin. Similarly, it would be useful to have an option to abbreviate number ranges automatically (e.g. changing ‘107–108’ to ‘107–8’). These, however, are among the few obvious examples of missing functionality in the package.
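Until then, the repetition can at least be contained with a small macro of one's own. This is a hypothetical helper (\longnote is my name for it, not part of Reledmac) that takes the first and last words of the lemma as arguments:

% #1 full passage, #2 first word of lemma, #3 last word, #4 note text
\newcommand{\longnote}[4]{%
  \edtext{#1}{\lemma{#2 \dots\ #3}\Bfootnote{#4}}}

% Usage:
\longnote{Arma uirumque cano}{arma}{cano}{The opening of the \emph{Aeneid}.}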

Fig. 4: Example of automatic cross references.

Reledmac also includes a powerful cross referencing system, allowing one to add references to page and line numbers and have them remain accurate through any changes to the document (see also Fig. 4):

\documentclass{scrbook}
\usepackage{microtype}
\usepackage[pdfusetitle,hidelinks]{hyperref}
\usepackage[series={A,B}]{reledmac}

% Add labels to cross references
\setapprefprefixsingle{line }
\setapprefprefixmore{lines }
\setSErefprefixsingle{line }
\setSErefprefixmore{lines }
\setSErefonlypageprefixsingle{p.~}
\setSErefonlypageprefixmore{pp.~}

\title{Sample Edition}
\author{Andrew Dunning}

\begin{document}
\maketitle

\chapter{Introduction}

Introductory text: see \SEref{sentence} and note to \appref{test}.

\chapter{Edition}

\beginnumbering
\pstart
\edlabelS{sentence}This is a \edtext{test}{\applabel{test}
\Afootnote{experimental \emph{L}}}
\edtext{sentence}{\Bfootnote{Introduced to English via Old French from
Latin \emph{sententia} `opinion'.}}.\edlabelE{sentence}
\pend
\endnumbering
\end{document}

Reledmac has several commands for creating cross references, but most users will only need two. The \SEref{label} command allows one to refer to a range of text between \edlabelS{label} and \edlabelE{label} (or \edlabelSE{label} for a single point in the text). The \appref{label} command allows one to refer to the lines to which a critical note labelled with \applabel{label} refers.

Using Reledmac with TEI

LaTeX syntax is less verbose than XML, and I have known several colleagues who have found it initially much easier to understand than TEI. Over the long term, however, writing an edition in TEI rather than directly in LaTeX is more sustainable, even if it is intended purely for print publication. From a practical perspective, XML validation allows one to find errors more quickly: a missing bracket can cause LaTeX to fall over itself in reporting obtuse error messages through its logs, which themselves are more difficult to read than necessary. Reledmac is focused purely on typesetting, making it difficult to develop mechanical checks for one’s editorial work. TEI’s focus on semantic markup is highly useful in this respect, and a number of researchers have taken advantage of this on a project-level basis. It is crucial that the TEI community seize this opportunity if it wishes to be viewed as a serious publishing option.

There are a number of scripts available for typesetting TEI editions with LaTeX and Reledmac, most of them developed to fit the needs of specific projects. The earliest of these is part of the TEI Consortium’s official stylesheets (Rahtz et al. 2011–2019). These stylesheets do not render text following any scholarly convention for a printed critical edition, and are complex to modify. As a result, implementations for individual projects are usually written from scratch (e.g. Witt 2018; Camps 2017; McLean 2015–2016). None yet offer a general-purpose tool that renders TEI elements into the form one would normally expect for printed editions of premodern texts.
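The core of any such conversion is a small set of mappings from TEI apparatus markup to Reledmac commands. As an illustration (my own sketch, not the behaviour of any particular stylesheet), a parallel-segmentation entry might be rendered as follows:

% TEI source: <app><lem wit="#L">test</lem><rdg wit="#P">text</rdg></app>
% The lemma supplies the running text; the rejected reading and its
% witness siglum go into the first apparatus series:
\edtext{test}{\lemma{test}\Afootnote{text \emph{P}}}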

Fig. 5: TEI Critical Apparatus Toolbox.

Marjorie Burghart’s TEI Critical Apparatus Toolbox (see Fig. 5) is an especially promising use of an automated typesetting tool as one of several elements for supporting the creation of editions (Burghart 2016). This web application provides tools for finding common errors in a critical apparatus (such as a witness not accounted for in an entry), extracting the text of a particular witness, finding statistics on a document, and turning a standard TEI critical apparatus into a PDF using Reledmac. It provides an interface for most of Reledmac’s options, making them much easier to find than by sorting through its manual. Although it is not yet finished, this is an excellent demonstration of the role both TEI and Reledmac could play in developing a solution for creating any type of edition, and not merely one geared to a particular format.

Such attempts are achievable because of the wide support for integrating LaTeX into other environments and its portability. Any full LaTeX distribution includes Reledmac, ranging from versions for every common platform to editions that work inside a browser such as Overleaf. Rouquette has made a concentrated effort to document the package’s options, alongside his introductory book on LaTeX. There is also an active community of users who provide support for one another in the Reledmac section on TeX Stack Exchange. The plethora of online tutorials for LaTeX and the wide availability of the software, including on mobile platforms, make it much easier to gain usable results from it than from TEI if one is working independently.

At the same time, LaTeX has a number of oddities that can make transformation from XML somewhat complex. For example, there is no standard mechanism for changing the language, as there are two mutually incompatible packages for achieving this (Babel and Polyglossia). Reledmac also poses its own difficulties. For the historical reasons noted above, it is necessary to encode text (including headings and paragraphs) slightly differently from normal LaTeX. It also cannot automatically index identical words in a single line. In a critical apparatus, if one has two instances of ‘et’ in a single line, one would refer to them as ‘et1’ and ‘et2’. An extra script can mostly remedy this, but there remain some situations in which it must be checked by hand (Christensen 2018). In short, LaTeX is not the solution we would create today if we were developing it again from scratch – but it is the one we have, and it can produce excellent results when used carefully.
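To illustrate the language problem mentioned above: a conversion script must commit to one of the two packages, as in this sketch of the alternatives (the language choices are arbitrary):

% Babel (works with all engines):
\usepackage[english,latin]{babel}
% ... \foreignlanguage{latin}{arma uirumque cano} ...

% Polyglossia (XeTeX and LuaTeX only):
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{latin}
% ... \textlatin{arma uirumque cano} ...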

Future directions

Given this history of software cobbled together by a series of programmers, humanists, and non-specialists in spare time over three decades, it is a small miracle that LaTeX with Reledmac is not merely functional but has become the most reliable method of automatic typesetting for critical editions. It is to be hoped that one day the editing community will band together to give the project more support and ensure its sustainability, for it is clear that Rouquette could create a much more functional package if he had the time, resources, and desire to redesign it from the ground up. Both using the package directly and typesetting critical editions from TEI XML would be much more straightforward with a package designed from the outset to work with LaTeX. Alternatively, there might be more promise in creating a critical editing module for ConTeXt, a rationalized competitor to LaTeX that has a focus on typesetting XML directly without the need to first transform it into a different markup language. There have been some forays down this path (Hamid 2007), but nothing has yet seen the light of day.

In the small field of software for critical editing, Reledmac fills a helpful niche alongside the more complex TUSTEP (Ott 1979; Schälkle and Ott 2018) and the commercial Classical Text Editor (Hagel 2007), focusing on providing a key element of a publishing workflow rather than an all-encompassing editing environment. Its interface is as user-friendly as one can achieve in LaTeX code; its clear documentation and examples mean that one can reasonably expect to learn it oneself; and it can produce documents of the highest quality. One can hardly ask for more, and our community is indebted to Rouquette and his predecessors for putting so much of their energy into the basic digital infrastructure for the humanities that often goes unacknowledged.

Bringhurst, Robert. 2013. The Elements of Typographic Style. 4th ed. Vancouver, BC: Hartley & Marks.
Burghart, Marjorie. 2016. ‘The TEI Critical Apparatus Toolbox: Empowering Textual Scholars Through Display, Control, and Comparison Features’. Journal of the Text Encoding Initiative 10 (December).
Camps, Jean-Baptiste. 2017. TEItoLaTeX.
Christensen, Michael Stenskjær. 2018. Samewords: Word Disambiguation in Critical Text Editions.
Dekker, Dirk-Jan. 2012. ‘Typesetting Critical Editions with LaTeX: Ledmac, Ledpar and Ledarab’. 11 November 2012.
Dunning, Andrew N. J. 2016. Samuel Presbiter: Notes from the School of William de Montibus. Toronto Medieval Latin Texts 33. Toronto: Pontifical Institute of Mediaeval Studies.
Hagel, Stefan. 2007. ‘The Classical Text Editor. An Attempt to Provide for Both Printed and Digital Editions’. In Digital Philology and Medieval Texts, edited by Arianna Ciula and Francesco Stella, 77–84. Ospedaletto: Pacini.
Hamid, Idris. 2007. ‘A Short Introduction to Critical Editions’. In ConTeXt User Meeting. Epen, Netherlands.
Knuth, Donald E. 1986. The TeXbook. Computers & Typesetting, A. Reading, MA: Addison-Wesley.
Knuth, Donald E., and Michael F. Plass. 1981. ‘Breaking Paragraphs into Lines’. Software: Practice and Experience 11 (11): 1119–84.
Kohm, Markus. 1994–2019. KOMA-Script: A Versatile LaTeX2ε Bundle.
Krewinkel, Albert, and Robert Winkler. 2017. ‘Formatting Open Science: Agilely Creating Multiple Document Formats for Academic Manuscripts with Pandoc Scholar’. PeerJ Computer Science 3 (May): e112.
Lavagnino, John, and Dominik Wujastyk. 1990. ‘An Overview of EDMAC: A Plain TeX Format for Critical Editions’. TUGboat 11 (4): 623–43.
Lück, Uwe. 2003. ‘Ednotes – Critical Edition Typesetting with LaTeX’. TUGboat 24 (2): 224–36.
MacFarlane, John. 2006–2019. Pandoc.
McLean, Tom. 2015–2016. Tei_transformer.
Ott, Wilhelm. 1979. ‘A Text Processing System for the Preparation of Critical Editions’. Computers and the Humanities 13 (1): 29–35.
Rahtz, Sebastian, Martin D. Holmes, Hugh Cayless, and Syd Bauman. 2011–2019. TEI XSL Stylesheets. Text Encoding Initiative Consortium.
Robertson, Will. 2009. ‘Peter Wilson’s Herries Press Packages’. TUGboat 30 (2): 290–92.
Robins, William. 2019. Historia Apollonii regis Tyri: A Fourteenth-Century Version of a Late Antique Romance. Toronto Medieval Latin Texts 36. Toronto: Pontifical Institute of Mediaeval Studies.
Rouquette, Maïeul. 2012. (Xe)LaTeX appliqué aux sciences humaines. Tampere: Atramenta.
———. 2017. ‘Étude comparée sur la construction des origines apostoliques des Églises de Crète et de Chypre à travers les figures de Tite et de Barnabé’. PhD thesis, Université de Lausanne.
Schälkle, Kuno, and Wilhelm Ott. 2018. TUSTEP: Tübinger System von Textverarbeitungs-Programmen. Tübingen: Pagina.
Schlicht, Robert. 2004–2019. The Microtype Package.
Walden, David. 2006. ‘Peter Wilson: Interview’. TeX Users Group. 8 November 2006.
Wilson, Peter R., and Lars Madsen. 2001–2018. The Memoir Class.
Witt, Jeffrey C. 2018. ‘Digital Scholarly Editions and API-Consuming Applications’. In Digital Scholarly Editions as Interfaces, edited by Roman Bleier, Martina Bürgermeister, Helmut W. Klug, Frederike Neuber, and Gerlinde Schneider, 219–47. Schriften des Instituts für Dokumentologie und Editorik 12. Norderstedt: Books on Demand.
Wujastyk, Dominik. 1993. Metarules of Pāṇinian Grammar: The Vyāḍīyaparibhāṣāvṛtti Critically Edited with Translation and Commentary. 2 vols. Groningen Oriental Studies 5. Groningen: Forsten.
Wujastyk, Dominik, and Maïeul Rouquette. 2013. ‘Critical Editions Typeset with EDMAC, LEDMAC, eLEDMAC and reLEDMAC’. Zotero. 2013.
Zapf, Hermann. 1993. ‘About Micro-Typography and the Hz-Program’. Electronic Publishing 6: 283–88.
\ No newline at end of file diff --git a/tustep/pictures/picture-1.jpg b/tustep/pictures/picture-1.jpg new file mode 100644 index 0000000..2b8503e Binary files /dev/null and b/tustep/pictures/picture-1.jpg differ diff --git a/tustep/pictures/picture-2.jpg b/tustep/pictures/picture-2.jpg new file mode 100644 index 0000000..3d9a021 Binary files /dev/null and b/tustep/pictures/picture-2.jpg differ diff --git a/tustep/pictures/picture-3.jpg b/tustep/pictures/picture-3.jpg new file mode 100644 index 0000000..c5d2858 Binary files /dev/null and b/tustep/pictures/picture-3.jpg differ diff --git a/tustep/pictures/picture-4.jpg b/tustep/pictures/picture-4.jpg new file mode 100644 index 0000000..7be45e6 Binary files /dev/null and b/tustep/pictures/picture-4.jpg differ diff --git a/tustep/tustep-tei.xml b/tustep/tustep-tei.xml new file mode 100644 index 0000000..1dbd184 --- /dev/null +++ b/tustep/tustep-tei.xml @@ -0,0 +1,1255 @@ + + + + + Tustep. Review of the Tübinger System von Textverarbeitungs-Programmen + + author + + + Griesinger + Christian + + + University of Wuppertal + Wuppertal + + griesinger@uni-wuppertal.de + + + + + Institut für Dokumentologie und Editorik + 2019-01-18 + https://ride.i-d-e.de/issue-11 + https://ride.i-d-e.de/issue-11/tustep/ + 10.18716/ride.a.11.2 + + + + + + + + + + + Tustep - Tübinger System von Textverarbeitungs-Programmen + ITUG e. V. + + + + + + + + 2018 + + + https://www.tustep.uni-tuebingen.de/ + + + + 2019-10-01 + + + +

Auf der Basis von + http://www.i-d-e.de/publikationen/weitereschriften/criteria-text-collections-version-1-0 +

+
+
+ + + + + + + cf. Catalogue 0.1.1 + + What type of software is it? + + + + + + + + + + + + + + cf. Catalogue 1.4 + + On which platform runs the tool? + + + + + + + + + + + + + + + + + + + cf. Catalogue 1.5 + + For what purpose was the tool developed? + + + + + + + + + + + + + + cf. Catalogue 1.6 + + Which is the financial model of the tool? + + + + + + + + + + + + + + + + + + + cf. Catalogue 1.5 + + What is the development stage of the tool? + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 2.3 + + Which programming languages and technologies are used? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + TUSTEP, TUSCRIPT + + + + + + + + + + + cf. Catalogue 2.3 + + Does the tool reuse portions of other existing software? + + + + + + + cf. Catalogue 2.4 + + Which programming languages and technologies are used? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + any plain text file format + + + + + + cf. Catalogue 2.4 + + Which programming languages and technologies are used? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 2.4 + + Which character encoding formats are supported? + + + + + + + + + + + + + + + + + + + + + some windows codepages + + + + + + + + + + Is a pre-processing conversion included? + + + + + + + cf. Catalogue 3.2 + + Does the documentation list dependencies on other software, libraries or hardware? + + + + + + If yes, is the software handling the installation of dependencies during the general installation process (you don't have to install them manually before the installation)? + + + + + + + + + cf. Catalogue 3.4 + + Is documentation and/or a manual available? (tool website, wiki, blog, documentation, or tutorial) + + + + + + + cf. Catalogue 3.3 + + Which format has the documentation? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.3 + + Which of the following sections does the documentation contain? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.3 + + In what languages is the documentation available? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.4 + + Is there a method to get active support from the developer(s) or from the community? + + + + + + + cf. Catalogue 3.4 + + Which form of support is offered? + + + + + + + + + + + + + + + + + + + + + a wiki platform + + + + + + + + + + + cf. Catalogue 3.4 + + Is it possible to post bugs or issue using issue tracker mechanisms? + + + + + + + cf. Catalogue 3.6 + + Grade how straightforward it is to build or install the tool on a supported platform: + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.7 + + Is there a test suite, covering the core functionality in order to check that the tool has been correctly built or installed? + + + + + + + cf. Catalogue 3.8 + + On which platforms can the tool/software be deployed? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.8 + + On which devices can the tool/software be deployed? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Rasperry Pi + + + + + + + + + + + + + + + + cf. Catalogue 3.8 + + If the tool is web-based: On which browsers can the tool/software be deployed? 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.8 + + If the tool is web-based: Does the tool rely on browser plugins? + + + + + + + cf. Catalogue 3.8 + + Is there an API for the tool? + + + + + + + cf. Catalogue 3.9 + + Is the source code open? + + + + + + + cf. Catalogue 3.9 + + Under what license is the tool released? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 3.9 + + Does the software make adequate acknowledgement and credit to the project contributors? + + + + + + + cf. Catalogue 3.9 + + Is the tool/software registered in a software repository? + + + + + + If yes, can you contribute to the software development via the repository/development platform? + + + + + + + cf. Catalogue 3.10 + + Can the code be analyzed easily (is it structured, commented, following standards)? + + + + + + + cf. Catalogue 3.10 + + Can the code be extended easily (because there are contribution mechanisms, attribution for changes and backward compatibility)? + + + + + + + cf. Catalogue 3.10 + + Can the code be reused easily in other contexts (because there are appropriate interfaces and/or a modular architecture)? + + + + + + + cf. Catalogue 3.11 + + Does the software provide sufficient information about the treatment of the data entered by the users? + + + + + + + cf. Catalogue 3.12 + + Is there information available whether the tool will be supported currently and in the future? + + + + + + + cf. Catalogue 3.13 + + Does the tool supply citation guidelines (e.g. using the Citation File Format)? + + + + + + + + + + cf. Catalogue 4.1 + + What kind of users are expected? + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 4.1 + + What kind of user interactions are expected? + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + cf. Catalogue 4.2 and 0.1.1 + + What kind of interface does the tool provide? + + + + + + + + + + + + + + + + + + + cf. Catalogue 4.3 + + Does the tool provide a particular visualizations (in terms of analysis) of the input and/or the output data? + + + + + + + cf. Catalogue 4.4 + + Is the user allowed to customize the functioning of the tool and the output configuration? + + + + + + + cf. Catalogue 4.5 + + Does the tool provide particular features for improving accessibility, allowing „people with the widest range of characteristics and capabilities" to use it? + + + + + + + + +
+ +
This review deals with the Tübinger System von Textverarbeitungs-Programmen, called TUSTEP. It describes some of TUSTEP's principal functions, outlines sample use cases and shows benefits and downsides. TUSTEP is one of the oldest and longest-lived applications in the area of text processing; its flexibility and functional scope make it worth considering for present and future scientific projects dealing with text editing, manipulation and publishing.
+ + + +
+
+ General introduction +

TUSTEP is a software toolbox, or environment, designed for the Digital Humanities. It has been under constant development since 1966 and has borne the name TUSTEP since 1978. It was located at the Zentrum für Datenverarbeitung (ZDV) of the University of Tübingen until 2003 and has since been the responsibility of the International TUSTEP User Group (ITUG).Cf. the ITUG homepage .

+

The program authors, Wilhelm Ott and Kuno Schälkle, still update and develop the program, issuing a new version approximately once a year. TUSTEP 2018 is currently the latest stable version for production.

+

This version and older versions may be downloaded, installed and used for free. They are compatible with Windows, Linux and Mac operating systems, and I found the installation to be straightforward.Cf. the TUSTEP homepage . The author is currently a doctoral student at the Universities of Bern and Trier and a lecturer at the University of Wuppertal. He has been a TUSTEP user since 2009 and a user-developer since 2014. His research interests are Middle High German language, historical lexicography, edition philology and computer philology, especially with TUSTEP, XML (TEI), SQL and PHP. This review is based upon the Windows version and does not take other operating systems into account. The installation guides for Linux and Mac are available on the TUSTEP homepage, unfortunately only in German. On Windows, it is sufficient to open the installation file and choose a folder for installation. It is recommended to additionally install GhostScript and/or GhostView,Cf. the GhostScript homepage . because Windows and TUSTEP do not natively support PostScript files (.ps), and these are mandatory when it comes to typesetting. For Mac and Linux, this step is usually not necessary.

+

While TUSTEP is normally installed on a personal computer, it is also possible to run it on a Linux server, and there have even been successful experiments running the software on Raspberry Pis.A demonstration of TUSTEP on a Raspberry Pi was given at the 2018 ITUG conference in Potsdam. See . The whole source code is provided in open access and may be modified and compiled under a BSD license.

+

Being a software toolbox, TUSTEP does not have a single purpose, functionality or area of application ‒ except that it is designed to work with texts. As will be explained in more detail in the next section, TUSTEP consists of a variety of modules, each designed for a specific task. These modules can be combined almost freely by the user to create very complex workflows, so that the program suits the user's individual needs. With regard to digital scholarly editions, TUSTEP supports all steps of a typical editorial workflow: from transcription to collation, from the constitution of critical texts with up to nine apparatus levels to the creation of indices and concordances, and from professional typesetting to the conversion of a text into any desired plain text file format.

+
+
+ What is TUSTEP capable of +

As said above, TUSTEP consists of individual modules, also referred to as commands. There are currently 54 commands in TUSTEP. Each has a German name and an alternative English alias; a command call consists of a leading "#", the command name and a variable number of parameters.
For example, there is a command for copying the contents of one file into another. The name of this command is #KOPIERE (or #COPY in English). As shown below, it has a few comma-separated parameters: "quelle" (source) and "ziel" (destination) take file names, while "modus" (mode) and "loeschen" (delete) state in a short notation ("+" for yes, "-" for no) whether the line numbers of the file should be recounted and whether the destination file should be overwritten:

  #KOPIERE,
  quelle = file_a,
  ziel = file_b,
  modus = +,
  loeschen = +

+ +

The order of the parameters can be altered if one wishes to, but there is a standard order. TUSTEP comes with a command-line based user interface, which is described in more detail below. The user tells TUSTEP which commands to execute by typing them into the command line. To save a lot of typing, any command may be abbreviated after the second letter at the latest, and all parameter names may be omitted if the standard order is observed. The above command may also look like this:

  #KO, file_a, file_b, +, +

+ +

Both notations do the same thing: copy the contents of "file_a" into "file_b", recount the line numbers and overwrite the former contents of "file_b". It is also possible to combine the shorthand notation with the full notation, as sketched below.
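One plausible mixed form keeps the first two parameters positional and names the rest. This is a sketch based only on the rules just described, not on the handbook, so the exact combination should be treated as an assumption:

  #KO, file_a, file_b, modus = +, loeschen = +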

+

One benefit of TUSTEP lies in the extensive functionality of its commands. While copying a file, the user may simultaneously want to manipulate its contents. Let us suppose there is a certain XML or SGML tagset in "file_a" that should be converted to TEI-XML. The command accepts additional parameters for this (here between "*" and "*eof" ‒ eof standing for "end of file"):

  #KOPIERE, file_a, file_b, +, +, parameter = *
  XX |<Name>|<persName>|
  XX |</Name>|</persName>|
  *eof

+ +

The parameter "XX" exchanges strings with other strings. It also accepts string patterns, so it is possible to compress these two lines of parameters into one by making the backslash optional.It should be noted however that the use of string and string patterns replacement for refactoring XML would only work if the XML file has a regular content, something which is not imposed by the XML syntax: one may thing for instance of extra whitespaces or changes in the order of the attributes, which would cause mismatches. But this is not the only possible parameter. There are dozens for #KOPIERE that can be combined. One of them, for example, selects only lines in a file which meet certain criteria (e. g. contain a specific string), exchanges some parts of that string, adds consecutive numbers at the beginning of the line and introduces some tags. By this procedure, it would be easy to search for strings in a text and give all search results back as a numbered result set in XML, HTML or any other text format ‒ perhaps you need a comma separated list to use in Excel or to import into a SQL-Database? That would be no problem.

+ +

Aside from #KOPIERE, TUSTEP contains a variety of commands for editing a text (#EDIERE), indexing a text (#RVORBEREITE ‒ prepare a register by breaking the text down into tokens, #SORTIERE ‒ sort the tokens by any alphabetical or numerical order, and #RAUFBEREITE ‒ sum up the tokens into register entries with or without references), comparing two or more texts with each other (#VERGLEICHE and #VAUFBEREITE), or typesetting a text (#SATZ). In digital scholarly editions, these are all very important features. Some of them are discussed below.

#EDIERE opens a text editor in which the user can edit a text file of up to 2 gigabytes in size. TUSTEP opens a file of this size in the same amount of time as a file of one kilobyte: while the editors in office suites (Microsoft Word, LibreOffice Writer etc.) and many programming editors (like Notepad++ or Oxygen XML Editor) load the whole file at once, TUSTEP only loads as many lines as fit on the screen. The user can navigate screen by screen or jump to any particular part of the file without losing time on scrolling or buffering the whole file. The editor comes with a pattern-matching syntax (similar to regular expressions) for searching, for instance, for inflection forms or suffixes. It supports individual shortcuts and syntax highlighting, and it can perform user-defined macros.

+ +

#VERGLEICHE can perform a comparison of multiple texts ‒ as many as the user wants ‒ by comparing each text with a base text. #VAUFBEREITE generates a formatted list of the differences between these texts. These two commands can be used for collating text witnesses, or for comparing the changes made in previous working sessions ‒ which is very useful when it comes to correcting. By adding parameters, it is possible to specify for every single comparison what is considered important or unimportant, as sketched below.
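A minimal sketch of such a collation run, in the same placeholder style as the sample program below; the "..." stand for the parameters that control what counts as a difference, and the double-slash comments are annotations in the review's own convention:

  #VERGLEICHE, ...    // compare each witness against the base text, recording the differences
  #VAUFBEREITE, ...   // format the recorded differences as a readable list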

+ +

Automatically creating indices with #RVORBEREITE, #SORTIERE and #RAUFBEREITE is important for checking an edited text for errors or for creating registers of persons and places. The fact that the creation of indices is broken down into three separate steps makes it very flexible: in every step, parameters can be added that control, for instance, which parts of a text are indexed, how the word forms are sorted and how they are summarized. The pipeline is sketched below.
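The three-step pipeline, again only sketched with "..." placeholders, since the concrete parameters depend on the project:

  #RVORBEREITE, ...   // break the text down into tokens
  #SORTIERE, ...      // sort the tokens by an alphabetical or numerical order
  #RAUFBEREITE, ...   // merge the sorted tokens into register entries, with or without references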

+ +

Finally, one of the most important features of TUSTEP is #SATZ, which makes it possible to professionally typeset texts with or without footnotes, critical apparatus and illustrations, and thus to prepare them for publication. Within a review it would be impossible to give a comprehensive description of this module alone, but I at least want to give an impression of its features. In the TUSTEP handbook, over 200 pages are dedicated to describing the functional scope of #SATZ, which includes native support for many alphabets such as Latin, Greek, Coptic, Hebrew, Arabic or Russian, and offers, for instance, the possibility to place any superscript or subscript letter above or below another one.TUSTEP supports Unicode; the availability of fonts depends on the end user. For more information, see , unfortunately only available in German.

+ +

While it is nice to have a separate command for each purpose, research projects and especially editorial enterprises need more complex workflows in which many steps are executed successively, and it would be very uncomfortable to type in the commands every time. In TUSTEP, the user can therefore create his or her own workflows by writing the required commands into a temporary or permanent file and executing this file over and over again. This gives the user the ability to create a customized program. It is also possible to execute such a file with variables in order to adapt the program at each execution. The user may furthermore save the interim results, or create protocols of every step to analyze later. A sample program may look like this (I have omitted all specifications to show only the concept; my comments follow the two slashes):

  #UMWANDLE, ...     // Import an .xml file into a TUSTEP file
  #KOPIERE, ...      // Copy the text of this file into another one while manipulating it
  #SATZ, ...         // Typeset the manipulated text, in order to have page and line numbers
  #RVORBEREITE, ...  // Break down the formatted text into tokens
  #SORTIERE, ...     // Sort the tokens
  #RAUFBEREITE, ...  // Create a register with the page and line numbers from #SATZ
  #KOPIERE, ...      // Copy the register underneath the text
  #SATZ, ...         // Typeset text and register together
  #*PSAUS, ...       // Output both to PostScript, in order to send a .ps file to the publisher
  #KOPIERE, ...      // Copy both texts again, enriching them with HTML tags for digital publication
  #DATEI, ...        // Create a new file, for example an .html file
  #UMWANDLE, ...     // Write the contents with HTML tags into this file

TUSTEP is also equipped with a modern scripting language called TUSCRIPT. After writing TUSTEP commands into a file, it is possible to control the workflow with TUSCRIPT structures such as if-then queries, loops and a variety of predefined functions. The scripting language allows the user to largely automate such workflows; a small fragment is sketched below.
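A minimal TUSCRIPT fragment showing a loop and an if-then query. It is assembled from memory of publicly available TUSCRIPT samples rather than from the handbook, so the exact syntax should be treated as an assumption:

  $$ MODE TUSCRIPT
  LOOP n=1,3
    IF (n==2) THEN
      PRINT "second pass"
    ENDIF
  ENDLOOP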

+

More importantly, TUSTEP gives the user the possibility to create new and more complex tools (i.e. new programs) out of the pieces of software in the toolbox. Hence the true power of TUSTEP lies not in the functionality of its single components, but in their flexibility and in the possibility to combine them as needed into customized solutions for specific and new problems. It is a major feature of TUSTEP that users are able to import and export their data or edition materials at any time from and into any plain text format with any desired markup (for example custom XML or TEI-XML). The results of a TUSTEP-based workflow can therefore be processed further in other applications, and, vice versa, results from other applications can be processed with TUSTEP.

+

While many applications work with a specific markup, for example XML tags, TUSTEP does not need a fixed markup ‒ except when it comes to typesetting. When typesetting a text with #SATZ, the typesetting engine needs specific control instructions, but the user can tell #SATZ which elements of the markup should be converted into which control instructions, as sketched below.
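One way to picture this conversion reuses the XX exchange parameter shown earlier. The file names are hypothetical, the target control instruction is left as a "..." placeholder because its exact form is a matter for the handbook, and the double-slash comment is an editorial annotation:

  #KOPIERE, edition, satzdatei, +, +, parameter = *
  XX |<hi rend="italic">|...|    // map a markup element to the corresponding control instruction
  *eof
  #SATZ, ...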

+
+
+ How to work with TUSTEP + +
Fig. 1: TUSTEP starting screen to create a session.
+ +

When starting the program for the first time, the screen leads the user to a menu (see Fig. 1) for creating a "Sitzung" (session). A session ‒ which one may think of as the project's default or root directory on the user's hard drive ‒ is a security concept and one that forces the user to keep order in their files: all files of a project have to be in this directory (or in sub-directories) to be accessible from a TUSTEP session.If files are to be shared, there is the possibility to create a remote session, where the files are stored on a server that all developers of a project have access to. TUSTEP sessions avoid maintenance issues by ensuring that only one session at a time has writing access to a given file. A session stores a lot of information about its actions in temporary files (for example the last commands typed, the last files opened, or the individual shortcuts chosen for this particular project) ‒ but it will never send this or any other information automatically over the internet.Many applications (for example Microsoft Office or the Atom editor) send usage statistics, crash reports or other metrics (often referred to as diagnostic data) over the internet by default, unless the user explicitly deactivates this functionality. There is, of course, no room in this review to discuss the pros and cons of programs sending usage information to their developers; in any case, TUSTEP does not have a function to collect user data. Thus it is not only recommended but also safe to create a session for each project one is working on. When creating a session, it is also possible to create a desktop icon that starts that particular session.

+ +
Fig. 2: Starting screen of the session.
+ +

Once TUSTEP is opened with a session, the starting screen (see Fig. 2) pops up, showing the command line at the bottom, where the user can enter commands directly. In the blue field above, TUSTEP gives information about the program version and feedback on the commands entered, including error messages and notifications of success.

+ +
Fig. 3: The TUSTEP editor.
+ +

If the user types the command #EDIERE (or #e for short) into the command line and hits enter, the TUSTEP editor opens an empty file. The editor screen has three main parts (as shown in Fig. 3): a field for the line numbers on the left, a text field on the right and a command line at the bottom. While in many editors (like Oxygen XML Editor or Notepad++) line numbers are just for display and not an essential part of the file itself, TUSTEP's own file format saves the line numbers inside the file. That is why it is necessary to import .xml, .txt or .rtf files into TUSTEP by converting them into the TUSTEP file format, and to export TUSTEP files back to other file formats: this adds (or removes) the line numbers, which guarantee fast access to the file (as mentioned in the section above) and enable TUSTEP to load a reduced number of lines. Both directions of the conversion are sketched below.
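Following the sample program shown earlier, both directions are handled by #UMWANDLE; the parameters are elided as "..." and the double-slash comments are annotations in the review's own convention:

  #UMWANDLE, ...   // import: convert an .xml, .txt or .rtf file into the line-numbered TUSTEP format
  #UMWANDLE, ...   // export: write the contents of a TUSTEP file back into a plain text format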

+ +

The line numbers can also be used to imitate simple text structures. Up to three levels of line numbers are accepted, such as 200.098/54: this number can be interpreted as page 200, line 98 of a printed book, with the third number available for editorial additions such as apparatus entries or notes on the text of that line. An illustration follows below.
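For illustration, a few hypothetical entries of such a file; the descriptions are invented, only the numbering scheme is taken from the text above:

  200.097      text of page 200, line 97
  200.098      text of page 200, line 98
  200.098/54   an editorial addition attached to page 200, line 98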

+ +
Fig. 4: A TUSTEP file containing all of Jules Verne's novels.
+ +

It is also possible to keep a whole text collection in one file, for example all novels written by Jules Verne, differentiated by the first number. In this way, the user has easy access to each novel while still being able to search and edit all novels at once. Figure 4 shows the beginning of Jules Verne's Reise zum Mittelpunkt der Erde, which is the 51st text in this collection, its line numbers therefore beginning with the number 51. An information window is available that shows, among other things, the number of sentences in the file. All of Jules Verne's novels take up about 80 MB of hard drive space.

+
+ +
+ What are the downsides of TUSTEP + +

Since the software originally dates from the 1960s and 1970s, when mouse and touch screen were not common or even invented and screens had low resolutions, TUSTEP has a concept of usage that is not very intuitive to those who got to know modern graphical user interfaces first. It takes some time to get used to the TUSTEP way of using a computer program. Entering commands via the command line rather than through drop-down menus is in itself useful ‒ I believe one can enter commands faster using the keyboard than using a mouse ‒ but it forces users to rethink their habits of operating an application.

+

As TUSTEP is a professional tool with numerous functionalities, it is not easy to use. The learning curve is quite steep at the beginning: some weeks or even months are needed until working with TUSTEP picks up speed and one becomes familiar with all the important functions. Furthermore, the technology is TUSTEP-specific, so the skills acquired cannot be reused in a different environment, unlike standards such as XQuery or XPath. On the other hand, once users have learnt enough, they can apply their knowledge to many projects within TUSTEP, saving time. Also, with the skills to create, edit, analyze and typeset texts, one becomes less dependent on the skills of other people.

+

Another downside is that TUSTEP has some peculiarities. For example, file names may be only up to 12 characters long (plus up to four characters for the file extension). Users who want longer file names have to either rename the files or use a workaround: define the longer file names as variables known to the session and use the variable names instead, as sketched below. This and other peculiarities are part of the inheritance of the program's long history, which makes them understandable; but perhaps it is time to rethink these parts of TUSTEP for the future.
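If I remember correctly, such a variable can be defined with #DEFINIERE; since this is sketched from memory with placeholders, the choice of command and its parameters should be checked against the handbook:

  #DEFINIERE, ...   // bind the long file name to a short session variable
  #KOPIERE, ...     // subsequent commands then use the variable name instead of the file name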

+
+ +
+ Where to find help +

Due to the complexity of TUSTEP, beginners are recommended to take an introductory course. The ITUG homepage provides information about courses held at different universities in Germany, Switzerland and Austria. It is also possible to take a course within the scope of the annual ITUG conference.

+

Apart from that, there is a TUSTEP wiki,Cf. the TUSTEP wiki . where beginners and advanced users find tutorials, help with troubleshooting and a series of sample programs for common problems. This is a good starting point, but if the tips in the wiki are insufficient, one can ask questions on the TUSTEP mailing list.The information for subscribing to the mailing list is available on the ITUG homepage at . In my experience, answers arrive within 24 hours, and they normally either solve the problem directly or give enough hints to find a solution. Although the TUSTEP community is quite small in comparison with other software user communities, I find it helpful, friendly and understanding; even the program authors respond, particularly if the problem might indicate a bug.

+

As mentioned before, a handbook comes with each version of TUSTEP and lies in the installation folder. The handbook offers a complete description of TUSTEP's features, but its terminology is quite abstract, and it takes some time to get used to it. The time is not wasted, though, because the handbook gives an exhaustive overview of the functionalities and the parameters. Unfortunately, there is no up-to-date introduction in the form of a monograph.The last book I am aware of is Peter Stahl's TUSTEP für Einsteiger; it dates back to 1996 and has by now become outdated.

+

Now, who uses TUSTEP? Over the decades, many editorial projects have relied on the program. A list of ongoing and finished projects is available on the ITUG homepage;Cf. the project section on the ITUG homepage . while this list is far from comprehensive, it gives an impression of the variety of scientific contexts in which TUSTEP is used.

+
+ +
+ Conclusion +

TUSTEP is one of the oldest programs still in use in the Digital Humanities: it proves that software can survive for more than just one decade. Many programs have come and gone in the time TUSTEP has been around. Especially in the last few years, a great deal of software has been developed in the Digital Humanities and is in part already forgotten ‒ because a new standard made it obsolete, because a new version of an operating system was published on which it could no longer run, or because newer, more powerful software left users with no reason to use the older one.

+

The fact that TUSTEP has survived since the 1960s may be an indication that the software is still powerful and competitive, and that it successfully runs on different platforms. Indeed, TUSTEP files from the past stored on a magnetic tape would still be readable by TUSTEP 2018 ‒ provided that you had a device that could still read a magnetic tape. One may therefore state that TUSTEP itself is sustainable and that the developers have always tried to keep the application compatible from one version to the next.

+

Even though some parts of TUSTEP have aged and could be renewed in the future ‒ like the built-in text editor (which is one of the 54 commands) ‒ in my opinion TUSTEP is worth considering for new scientific projects dealing with texts or text editions. Because it can easily convert from and into different file formats, TUSTEP can also be used in combination with other software in a complementary and modular manner.

+
+
+
\ No newline at end of file