Have you ever needed to collate two copies or editions of a historical document in order to quickly and reliably identify the differences between them? In digital scholarly editions, variant readings are often encoded manually by human editors. Tools for automatic text collation do exist, but most of them are either difficult to use, or they address specific use cases. As such, there is a need for easy-to-use, accessible, and adaptable tools which can handle textual variance and text alignment, including support for TEI-XML markup. In other words, some tools exist, but do they really fit the philological and technical requirements of researchers?
This article starts with a quick overview of text collation in philology and computer science. It goes on to develop a suitable review method and analyses three TEI-interoperable, web-based collation tools: Juxta Web Service, LERA, and Variance Viewer.
This review focuses on
Text comparison means
Often used in authoring environments, text comparison generally serves practical purposes: collaborative writing teams use it to revert unwanted or ill-intended modifications, teachers and students use it to check for plagiarism in academic writing, and developers use it to avoid conflicting changes in code. However, another use case of text comparison exists that aims at a better understanding of cultural processes. Scholars from all historical disciplines have an interest in retracing the creation and transmission of texts in order to establish connections or divergences between them. In textual criticism, the epistemic process of text comparison is called
The following two sections engage with divergent understandings of text collation in philology and in computer science.
Text collation is a technique used in
Traditionally, philological text collation is performed manually. One text is placed along the other, and an editor compares them word by word and letter by letter. Changes are catalogued in a
Editors need to establish clear guidelines defining which phenomena are of interest for future users of an edition and which will be explicitly disregarded during the collation. This is also an economic decision: in many cases, tracking graphemic variance (e.g. æ/ä) or regular orthographic variation (e.g. color/colour) requires immense additional resources. Even if such variants are potentially interesting for later studies, the required human effort is often not justifiable. Furthermore, the more details have to be tracked, the more likely human errors become.
There are various conventional methods of presenting the results of a text collation in printed editions, while digitally based presentations are still in development. Independently of the medium, classical scholarly editions prefer a
Digital tools can help. First, automatic collation can be significantly faster, provided all witnesses are already fully transcribed. Philologists could focus their attention on semantics and leave the tedious letter-by-letter collation to the machine. Put positively, the time and energy saved by automatic collation can be devoted entirely to interpreting and refining the results.
Second, an automatic collation process always produces the same result and is thus more reliable than a collation produced by hand. Furthermore, during the process of collating, humans naturally tend to produce individual interpretations and need to make assumptions about a writer’s or scribe’s possible intentions (Plachta 2013: 115–121). Computers, by contrast, do not interpret unless instructed to, but focus exclusively on comparing the texts. These two very different competences – precision on the one hand and interpretation on the other – could be used to divide the process into an automated collation layer and a human interpretation layer.
Third, digital tools can also help to create presentations dynamically, for example in graph-based visualizations (see Fig. 3). Some digital environments even leave the choice of visualization method and base text to the user, and offer analytic tools to refine philological findings significantly beyond mere collation results (Rieder et al. 2012: 73–75). The creation of user-configurable visualizations is supported by a basic digital paradigm: separating the visual presentation (e.g. HTML) from the encoded representation (e.g. XML).
Analyzing differences between texts requires awareness of which types of variation can actually occur. Given two sequences A and B, the two basic cases of textual difference are addition (B contains material that is not in A) and deletion (material from A is missing in B).
Two more complex categories of variation are substitution and transposition.
Identifying substitutions and transpositions is less trivial than identifying additions and deletions (for an analysis of the transposition problem, see Schöch 2016). Both substitution and transposition can also be interpreted as successive additions and deletions, and this ambiguity often makes it difficult to decide what the original intention actually was. It is a philologist’s task to produce a well-founded and weighted interpretation of a scribe’s possible intentions or the causes of an error. Here lies a watershed between the procedures for identifying substitutions or transpositions in philology and in computer science: a textual scholar decides by personal knowledge and experience, an algorithm by formal models.
The first algorithms for text comparison were invented in the 1960s. The most popular example is the Levenshtein distance (Levenshtein 1966), which expresses textual difference as an ‘edit distance’: the minimal number of characters that need to be added, deleted or substituted to transform sequence A into sequence B. In the 1970s, the Hunt-McIlroy algorithm solved the longest common subsequence problem and revolutionized file comparison when it was implemented for the Unix command ‘diff’ (Hunt and McIlroy 1976), capable of identifying and describing differences line by line (not only character by character) in human- and machine-readable form. Larry Wall invented ‘patch’ in the 1980s, a Unix program capable of reverting changes by saving the difference in a separate file (“Patch” 2019). These algorithms, or optimized variations of them, remain in use today.
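To make the notion of edit distance concrete, here is a short Python sketch (my own illustration, not code from any of the tools reviewed below) that computes the Levenshtein distance between two sequences with the standard dynamic-programming approach; it works on character strings as well as on lists of word tokens:

def levenshtein(a, b):
    # Minimal number of insertions, deletions and substitutions needed
    # to transform sequence a into sequence b.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i                      # delete everything in a[:i]
    for j in range(len(b) + 1):
        dp[0][j] = j                      # insert everything in b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(a)][len(b)]

print(levenshtein("kitten", "sitting"))                            # 3
print(levenshtein("the old house".split(), "the house".split()))   # 1

Collation tools generally apply such measures to word tokens rather than to single characters, and combine them with alignment algorithms such as the one behind ‘diff’.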
Linguists and philologists with programming skills have implemented collation tools since the 1960s. Collating long texts turned out to be a complex scenario, with some major developments around 1990 (Spadini 2016: 112–116). In 2009, a working group presented an abstract framework for complex text collation procedures, which was later called the ‘Gothenburg model’
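The Gothenburg model separates collation into successive stages (tokenization, normalization, alignment, analysis and visualization) instead of treating it as one monolithic operation. The following Python sketch is a deliberately simplified illustration of that separation, using the standard-library difflib module rather than any of the tools reviewed below; the tokenizer, the normalization rules and the sample sentences are invented for the example:

import difflib

def tokenize(text):
    # Split a witness into word tokens (crudely, on whitespace).
    return text.split()

def normalize(tokens):
    # Regularize tokens before alignment: lower-case and strip punctuation.
    return [t.lower().strip(".,;:!?") for t in tokens]

def align(witness_a, witness_b):
    # Align two witnesses token by token and report the edit operations.
    a, b = normalize(tokenize(witness_a)), normalize(tokenize(witness_b))
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        yield op, " ".join(a[i1:i2]), " ".join(b[j1:j2])

for op, left, right in align("The colour of the sea.", "The color of sea!"):
    print(op, repr(left), repr(right))
# equal   'the'     'the'
# replace 'colour'  'color'
# equal   'of'      'of'
# delete  'the'     ''
# equal   'sea'     'sea'

Keeping these stages apart is precisely what allows normalization rules (case, punctuation, graphemics) to be changed without touching the alignment step.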
The following questions should be considered before starting the collation procedure:
On the one hand, the review aims at pointing out the individual strengths and specific qualities of each of the three tools presented. On the other hand, it should also cover general aspects systematically. To this end, a short list of basic features, both obligatory and optional, is presented first to guide the evaluation. All tools were tested with the same examples.
The list of requirements contains both
Although the selected tools were all intended to work as generic solutions, they have been developed from individual starting points or specific methodological perspectives on text analysis. Furthermore, there is a great variety of possible use cases – diversified by languages, epochs, genres, individual styles – which are too manifold to be adequately represented within the scope of this review. For this review, examples were chosen which are capable of demonstrating the general functionalities of the tools, while the suitability for specific use cases needs to be tested by the individual users themselves.
Two sets of texts have been used to test the tools:
Juxta
Juxta Web Service requires an account, which can be created online free of charge in a few clicks. The collation procedure follows three steps. First, the user uploads two or more files that they wish to collate and that will appear as ‘sources’ in the dashboard. Juxta WS accepts a broad range of input formats, such as TXT, XML, HTML, DOC, RTF, PDF and EPUB. The documents can also be retrieved from a URL or pasted into a text field (and, as a special feature, it is even possible to refer to Wikipedia articles and their revisions). If desired, source files can be reformatted (there is also an XML indent function), edited directly in the browser, and saved as a new source. Second, each source needs to be prepared as a ‘witness’. This means that a distinct name needs to be assigned to each source, while the text is tokenized automatically in the background. The whole process is transparent and can also be modified in the ‘XML View’, which displays the XSLT transformation templates. For example, Juxta WS by default omits all elements which are not on the TEI block level (e.g. highlights), unless this behavior is changed. Finally, the user selects two or more witnesses to form a ‘comparison set’. For the collation process, the user can define how punctuation, character case, hyphenation and line breaks should be handled.
Juxta Web Service presents the result of the collation in a number of different views. The ‘Heat Map’ displays the base text with highlighted differences (see Fig. 4). The base text can be changed dynamically, and each single variant can be annotated manually. The ‘Side-by-side View’ is a synoptic view of two selected witnesses, with an optional ‘Histogram’ (see Fig. 5). Finally, the integrated Versioning Machine
The results can be exported in various formats, e.g. a TEI-encoded version following the parallel segmentation method, ready for further use (see Fig. 6). HTML and DOCX output are also possible, including a classical apparatus that follows a plain rendering of the base text.
Juxta Web Service’s in-depth and thoughtfully developed features – although some of them remained at the experimental stage – make it a powerful tool. It offers well-written documentation
LERA
LERA can be tried online with a few sample texts. An individual instance with full functionality is available on personal request. After logging in, the user can upload two or more ‘documents’ for collation. During this procedure, the user assigns a siglum and a title to each document, and optionally sets the language, segmentation method, hyphenation handling, and linguistic annotation. LERA works smoothly with XML, and all settings can be changed at different stages of the process. In a second phase, the user selects a set of documents for collation, which will then be displayed as an ‘edition’.
LERA’s complex working environment offers a broad range of tools and features for text collation. The basic structure is a synoptic view of all documents, which can be customized with a rich selection of parameters and visual features. Additions, deletions, and substitutions can be color highlighted (see Fig. 7); alternatively, for collation of more than two texts, colors can be used to highlight variants in the texts which are identical in two versions (see Fig. 8) or exist only in one version. Detailed filter rules for normalization can be applied and changed on the fly. The most important distinctive feature of LERA is probably the section alignment, which is a feature to refine the results of the automatic collation of longer texts. Additionally, a navigation tool called CATview
LERA is an impressively coherent suite of tools for text alignment and collation which allows the user to flexibly combine tools and parameters for individual use cases. Two things are still on the wish list: TEI export (or another structured format like JSON) and a public code repository. The project is likely to be maintained, as it is essential for at least one ongoing large research project.
Variance Viewer
Variance Viewer can be used without an account. The user directly uploads exactly two files to collate on a simple one-page dialogue website. The only accepted formats are XML (by default) and TXT (as fallback). The user can upload a configuration file with customized settings (the GitHub documentation page
The web service operates quickly and displays a visualization of the collation result nearly instantly. The most distinctive feature of Variance Viewer is the automatic classification of variants. The tool identifies general content (by default), punctuation, graphemics, abbreviation, typography (depending on the TEI attribute
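The principle behind such automatic classification can be illustrated with a few rule-based checks. The following Python sketch is a simplified illustration of the idea and does not reproduce Variance Viewer’s actual rules or configuration format; the graphemic equivalence table and the sample tokens are invented for the example:

import string

GRAPHEME_MAP = str.maketrans({"æ": "ä", "ſ": "s"})    # tiny, purely illustrative table
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def classify_variant(a, b):
    # Return a rough class for the difference between two variant tokens.
    if a == b:
        return "identical"
    if a.translate(PUNCT_TABLE) == b.translate(PUNCT_TABLE):
        return "punctuation"
    if a.lower() == b.lower():
        return "case"
    if a.translate(GRAPHEME_MAP).lower() == b.translate(GRAPHEME_MAP).lower():
        return "graphemics"
    return "content"

print(classify_variant("colour,", "colour"))   # punctuation
print(classify_variant("Sea", "sea"))          # case
print(classify_variant("præter", "präter"))    # graphemics
print(classify_variant("house", "home"))       # content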
A unique feature of Variance Viewer is its ability to identify presentational differences, e.g. as typically described in
The result is downloadable in TEI/XML format, with variants encoded in parallel segmentation, using elements
Variance Viewer does an excellent job of handling the TEI input. The configuration options are powerful and make Variance Viewer an excellent generic tool. On the output side, it must be mentioned that the downloadable TEI document is not perfectly schema-valid in cases where a variant occurs within an element that does not allow
According to the requirements, all tools provide a web interface for uploading documents (feature 1) and starting a collation procedure (feature 2), and all of them offer options for individual configuration. The tools’ approaches differ considerably in this respect: while LERA and Juxta Web Service both offer extremely granular interfaces, Variance Viewer achieves high flexibility through an uploadable configuration file. Performance with large portions of text is adequate, but the tools cause a heavy load on the client side, as they all load the complete collation result into the browser (instead of smaller portions).
All tools were able to identify additions, deletions and substitutions correctly (feature 3), while transposition is obviously an interpretative issue and needs further analysis or manual editing, as supported by LERA.
Furthermore, all tools offer a parallel synopsis view with variants highlighted (feature 4). Juxta Web Service and LERA both offer a helpful exploration tool for easy navigation through longer texts with Histogram and CATview. Concerning analysis, Variance Viewer has not only developed an interesting approach to classify variants automatically, but it can also detect presentational variants, which is most useful for collating texts with complex typography.
Concerning output formats (feature 5), there is still much to be achieved. Although schema-valid TEI output is available in Juxta Web Service and Variance Viewer, the methods used to structure collation results in XML are very diverse. In each case, it will be necessary to revise the TEI code and adapt it to one’s own practice. The same applies to other output formats, especially presentational formats; moreover, none of the tools offers options to configure the PDF and HTML output, so the usefulness of these routines is questionable.
It is positive that the source code of all tools is available (or planned to be) in public repositories (feature 6), so projects have a chance to review or reuse the code and to customize it for their own purposes. Using the tools is relatively easy and free of charge, as long as no special implementations are required (feature 7). Concerning accessibility, Variance Viewer follows an interesting lightweight concept, as it requires neither user management nor authentication, while LERA and Juxta Web Service require individual accounts and bind users to their web service.
The most debatable aspect of the TEI output routines is that all tools offer only
Concerning visualization and analysis, it should be mentioned that there are other tools which could cover this function independently. To give an example, TEICat
For the workflow of an edition, it is not only important to decide which tool suits the individual requirements, but also to decide the
This review examines three tools:
It is being developed by the TELOTA (The Electronic Life Of The Academy) initiative at the BBAW in Berlin by lead developers Stefan Dumont, Martin Fechner, and Sascha Grabsch. After being used internally for BBAW projects for many years, the modules are successively being released to the public via a GitHub repository.
At a high level of methodological abstraction, one can ask in which part of the scientific (in this instance, editorial) process a tool attempts to support the researcher. In the case of
Comparing
Looking at the work of developers in DH projects, instead of editors and researchers, one can see another process that
As mentioned before, the tools are extensions of the pre-existing software and thus obviously have a dependency on both. While
In addition to that, the
In terms of versioning, logging, and performance,
Due to its history as an internal tool of the BBAW,
Let us assume the following use case: a project that already includes some letters and indexes of places or persons. The editor now wants to add a new transcription to the project. They will use the Data Source Explorer within
Further metadata can be added through the menus ‚Metadaten‘ and ‚Briefmetadaten‘ in the toolbar directly above the editor window. These correspond to the ‚fileDesc‘ and ‚profileDesc‘ information of the teiHeader. For simple metadata fields, these functions open pop-up windows prompting the editor to enter the relevant data, which is then written into the file in the appropriate TEI element in the correct position. For some fields, e.g. the name of the author of the letter, this triggers a call to the person index in the ediarum.DB application, which is displayed as a list of names. Selecting one inserts both the name and the corresponding xml:id into the document.
The text of the letter can be inserted into the pre-existing structural elements. Once the raw text has been copied into the file, the markup functions can be used. These include different types of deletions and additions, types of emphasis, and comments. The indexes can also be used here to mark up entities in the text, by selecting them in the editor and then selecting the corresponding entry from the index (Fig. 3).
These indexes can also be edited from within Oxygen through the
The modules are available in individual public GitHub repositories. The
Looking at the learning curve of the tools, let’s consider first the developer, who wants to install and/or customize
From the point of view of the ‘end user’, the non-DH researcher and editor, navigation and use of the framework, once it is installed, are largely self-explanatory and intuitive. All specific additional functionality is available through the
Due to the nature of both
The source code is openly available on GitHub
As the code is complex, it will still require some study and analysis, but a developer fluent in the relevant languages (X-technologies, CSS, potentially also Java) will know where to inject custom functionalities and how to integrate them with existing functions. Again, it has to be noted that the developers are continually working on the modules themselves, and GitHub provides a platform for anyone wishing to extend and adapt the code. Judging by recent presentations (Dumont and Fechner 2019), the three modules that have so far been the focus of this review will be accompanied in the future by further modules providing functionality related to publication on the web and in print.
Both Author mode frameworks employ the GUI of the
In the current version, all text elements of the GUI are exclusively German. Due to the origin of
The module
Recalling the stated intention of
While forms for data input and even basic transcription/markup tools are nothing new, the key advantage and contribution is the fact that this interface is integrated as a layer on top of
What is holding back
The open-source CMS (Content Management System) Omeka
To understand the workings of
If we apply this distinction to
However, faced with the growing needs of the community and the explosion in the number of projects
In this review, we therefore propose a presentation of how these different activities can be carried out with
The diversity of the institutions using
Among these steps, the most popular and most widespread within the community
Over the past decade, several ventures have emerged to make
We have identified five plugins for handling TEI in
Although
The plugins developed for
The various plugins that we will present have each adopted a different solution for inserting digital editions into this workflow designed for digitized editions. They treat digital editions either as extensions of a pre-existing item
The plugin
These projects are all based on the same mechanism: the user creates an account, selects a content item to transcribe and accesses a transcription interface that places the digitization of a printed or handwritten item side by side with a free text field, where the user reproduces the text as they see it in the digitization (Figures 1 and 2). Unlike other projects, such as
Although
Another way of using
This plugin has notably been used by the project
The interpretative transcriptions of the letters appear below the digitizations and serve as a support to the digitized edition (Fig. 3). Each transcription can be exported in TEI-XML. The project also makes the entire corpus, encoded in TEI-XML, available for download, together with the schema used for the project’s purposes in its ODD (One Document Does it all) form
The plugin
This project presents the digitized edition and the interpretative digital edition side by side (Fig. 4). Place names, personal names and dates are highlighted and link to a more precise definition, which in turn offers a list of all the documents in which the defined notion is cited. Users can download the .pdf file of the digitized edition and the .xml file of the digital edition in TEI. As with the project
The plugin
To date, we know of only one project using the plugin
The various digital edition projects that we have presented in this section are all characterized by the presence of numerous complementary tools, such as indexes, bibliographies or virtual exhibitions. These tools highlight one of the main advantages of
The solution adopted by the plugins
Although the recommendations of the developers of
The use of a TEI-XML editor together with a toolbox has already been tried out in the context of non-
The plugin
As with the plugin
Compatible with Linux, Mac OS X and Windows systems, in order to run,
The installation of
The installation of the plugins is also detailed in the user manual. The official plugins have their own documentation on the official website and on
Although it is documented and easy to deploy, the main limitation of
For content dissemination projects that only use the official plugins and themes, without making heavy modifications to the CMS source code, maintenance is not a thorny problem. When the CMS is updated, the plugins on the official list are progressively updated by the development team of
However, for more complex projects that want an elaborate design and wish to implement functionalities that go beyond what
Yet the vast majority of plugins
Faced with this problem, one of the community’s responses has been to make the pre-existing plugins that it modified while developing another plugin available to other users. Consequently, the list of unofficial plugins includes official plugins that were modified by a project in the course of building another plugin. For a third-party project to use that other plugin, it will in some cases also be obliged to install the modified plugins on which the new plugin was built.
This problem of plugin compatibility and maintenance is one of the main limitations of
Take the example of the plugins. As we have seen above, the installation of plugins is similar to that of
Access to the private interface of
In terms of accessibility
Choosing
Although it is not intended for this purpose, using
Although the creation or dissemination of digital editions in TEI-XML still represents a challenge for projects and no consensus has been reached around a single plugin, these various elements, constitutive of
Currently,
It is questionable whether anyone is happy with the traditional format of the critical printed edition. The critical apparatus was designed around the constraints of typesetting in the eighteenth century, and leaves much to be desired as a method of visualizing textual variation. Researchers rely daily on being able to search primary sources, but most public corpora are based on editions from the nineteenth century, since few series of critical editions make their texts openly available in digital form. Nonetheless, many academics view printed books as the most reasonable method of publishing critical editions in light of concerns over digital publications’ stability, readability, and authority. If we wish to encourage more scholars to start editing texts in ways that exploit the computing resources available to us, we need to provide ways to produce editions that give the best possible presentations of a text in both digital and print forms – editions that are designed for humans to read as well as our machines.
There is not yet a reusable solution to editing texts in such a technology-independent form, but one of the key pieces in this puzzle has existed for decades: software for automatically typesetting texts with a scholarly apparatus. Maïeul Rouquette’s Reledmac, a package for the venerable LaTeX typesetting system, has emerged as a readily available program that, used correctly, can produce professional results. A non-comprehensive bibliography lists almost seventy publications that have used it with a multitude of languages, including contributions from leading scholars (Wujastyk and Rouquette 2013). Although the package’s gestation over more than three decades has resulted in some quirks, the wide support for LaTeX and its portability to nearly any system has made Reledmac’s adoption possible by several digital editing projects, showing its potential as a key piece of scholarly infrastructure.
This review addresses Reledmac
Reledmac is a mature tool with a long gestation; to understand its advantages, success, and idiosyncrasies, one needs to consider it within the development of the TeX typesetting system, which is sometimes called ‘plain TeX’ to distinguish it from the typesetting systems based on it such as LaTeX. Donald Knuth, a computer scientist and mathematician, first released this program in 1978 in reaction to the poor quality of early digitally typeset books (Knuth 1986 is his comprehensive guide). This program quickly became popular in academic circles, and is particularly respected for the Knuth-Plass line breaking algorithm (Knuth and Plass 1981), which composes text as a paragraph to minimize hyphenation and other typesetting problems. By contrast, standard word processors and Web browsers still compose text on a line-by-line basis, which produces a distinctly mechanical feel and reduces reading comprehension.
Both medieval scribes and early printers aimed for evenness in what designers now call ‘type colour’, producing text blocks that look almost grey when one squints at them, without distracting gaps or abrupt combinations of heavy and light text. The industrialization of typesetting beginning in the nineteenth century gradually eroded this principle. The developers of phototypeset books, beginning in the 1950s, almost completely ignored it. This underlies much of the poor quality of a digitally produced book from the 1970s in comparison to its equivalent from earlier centuries.
The celebrated typographer Hermann Zapf further advanced the state of automated typesetting with his Hz-program, introducing what is sometimes called ‘microtypography’. This software implemented the techniques of scribes and the first type compositors to produce consistent type colour by using narrower or wider versions of characters (Zapf 1993; Bringhurst 2013). This was far more laborious to produce in print than in manuscript, and quickly fell by the wayside, but this does not make the underlying concept less useful. It has been used for many recent books after Adobe Systems acquired Zapf’s software for integration in its page layout software, InDesign. The principles have also been implemented for TeX, and can be easily included in any document using the Microtype package (Schlicht 2004–2019). It is this approach to paragraph composition that allows the TeX family of software to produce such excellent results.
Few people now use TeX directly, most instead using a derivative that facilitates a structured approach to typesetting. Leslie Lamport released LaTeX in 1983, abstracting much of TeX behind macros centred on the organization of a document. For example, it provides a
These additions are both the greatest strength and weakness of LaTeX. Individual volunteers write most packages, which provide additional TeX programming that exists on the same level as LaTeX itself. They can produce unexpected results when combined. The creators of packages often cease to maintain them, and old packages are rarely pruned from CTAN, the standard repository for TeX-related software, leaving traps for users who might accidentally use one of these packages after finding an old reference to it. This situation has been remedied in part through the creation of KOMA-Script (Kohm 1994–2019) and Memoir (Wilson and Madsen 2001–2018), providing versatile and carefully conceived classes that eliminate the need for many packages and provide their own reference manuals covering most aspects of LaTeX. It is such packages, combined with the generous community of users that maintain online help forums, that have sustained LaTeX over so many years in spite of many shortcomings in its design.
The final complication of using TeX is the series of different
An attempt to streamline this complex situation exists in ConTeXt, a more recent abstraction of TeX independent of LaTeX. In spite of its many improvements, it has yet to gain comparable traction because it lacks the ready-made classes and packages that allow one to quickly produce good results with LaTeX, as long as one is working within its paradigm. LaTeX is truly a reflection of humanity, showing the beauty that collective generosity can produce, but also the confusion that results from a lack of coordination.
It is within this web of packages and different interfaces for TeX that Reledmac exists, and its history defines both its strengths and limitations. Reledmac originates in Edmac (short for ‘editing macros’), which John Lavagnino (a Shakespearean and systems manager then at Brandeis University, now at King’s College London) and Dominik Wujastyk (a Sanskrit scholar, then at the Wellcome Institute for the History of Medicine, now at the University of Alberta) designed in 1987–89, developing it in their spare time to support their own editing work (Lavagnino and Wujastyk 1990; Wujastyk 1993). LaTeX had not yet become widespread, and they designed the package to interact directly with plain TeX without taking LaTeX functionality into account. Beginning in 1994, Peter Wilson (a specialist in information modelling) ported the package to LaTeX as a pure labour of love, renaming it Ledmac (Walden 2006). Wilson was responsible for a large number of LaTeX packages, leaving a maintenance gap on his retirement that took several people to fill (Robertson 2009).
Maïeul Rouquette, a scholar of early Christianity at the University of Lausanne, took over Ledmac in 2011 when he was using it to write his doctoral thesis (Rouquette 2017), renaming it first Eledmac and then Reledmac, which allowed him to revise the interface and functionality without affecting projects that used older versions. He has since continued to improve the package’s functionality beyond the scope of his own research. Rouquette has also put significant energy into writing thorough documentation, alongside a general introduction to LaTeX for humanists that discusses Reledmac alongside Reledpar, its sister package for setting parallel texts (Rouquette 2012). The project’s GitHub repository lists fourteen other minor contributors. The culture of open-source software created out of goodwill for a practical end without explicit funding is typical for LaTeX packages. This haphazard model often produces useful results, but it is not clear that it is sustainable, especially as the employment of early-career researchers becomes increasingly unstable.
The legacy of the original Edmac and the process of its transition to LaTeX remains evident in the package as it now exists. Wilson had only the goal of making the package functional, and did not rewrite it to use the logic of LaTeX. As a result, the package must emulate the functionality of many basic LaTeX macros such as headings and block quotations rather than use them directly, and they often do not behave in the way one expects. For instance, although KOMA-Script and Wilson’s own Memoir class include environments for setting verse, they give unexpected results in Reledmac, and one instead needs to use its internal mechanism. One needs to treat Reledmac almost as a separate system from LaTeX, and the package would need to be rewritten to resolve this situation. The Ednotes package began this effort (Lück 2003), but it never reached equal functionality and development ceased in 2006. This situation is not the fault of the package’s authors, but it increases the challenge of converting text for typesetting in LaTeX with Reledmac, as well as the learning curve.
Once one understands Reledmac’s limitations, and its methodological focus on visualizing textual variants using traditional mechanisms developed for print, its interface is nearly as simple as one can achieve. A critical edition involves a complex dataset, and the LaTeX format imposes further constraints similar to those of the XML format underlying TEI. The software works from encoding for critical notes that focuses on typography rather than semantics, running its own TeX code to arrange notes and line numbers according to LaTeX’s positioning of the text. This is a basic document with critical notes (see also Fig. 2):
\documentclass{memoir} % a book-style class is assumed here; any class providing \chapter will do
\usepackage{microtype} % improves justification
\usepackage[pdfusetitle,hidelinks]{hyperref} % adds links from apparatus to text
\usepackage[series={A,B}]{reledmac} % enables two levels of apparatus
\title{Sample Edition}
\author{Andrew Dunning}
\begin{document} % begin LaTeX document
\maketitle
\chapter{Introduction}
Introductory text.
\chapter{Edition}
% text outside \beginnumbering … \endnumbering works as normal LaTeX
\beginnumbering % begins Reledmac numbered section
\pstart % begin a paragraph in Reledmac; or use the \autopar command
This is a \edtext{test}{\Afootnote{experimental \emph{L}}}
\edtext{sentence}{\Bfootnote{Introduced to English via Old French
from Latin \emph{sententia} 'opinion'.}}.
\pend % end a paragraph in Reledmac
\endnumbering % end Reledmac numbered section
\end{document} % end LaTeX document
Files using Reledmac can be rendered using any LaTeX engine. It results in slightly longer compilation times than normal, because it needs to generate extra temporary files. The example above enables two series of critical notes with Reledmac. (One can instead use standard numbered footnotes or endnotes.) The
Reledmac demonstrates a few minor shortcomings in facilitating features of high-quality editions, though there are usually ways to achieve the desired results by hand. When making a note on a long passage, most editors will refer only to its first and last words. In Reledmac, this requires the \lemma command, for instance:
\edtext{Arma uirumque cano, Troiae qui primus ab oris}{\lemma{Arma \dots{} oris}%
\Bfootnote{The opening line of Virgil, \emph{Aeneid}.}}
This must be done by hand for every note that does not quote the full lemma. In some cases, this is advantageous. For commentaries in particular, the ability to write one’s own lemma to focus on the precise passage in question is a great help. On the other hand, it would be a great service if Reledmac could borrow Classical Text Editor’s options for setting document-wide styles to automatically process lemmata by truncating a phrase to the first and last words; removing punctuation and other specified characters; making the text lowercase; and transliterating text as appropriate, for example from V to u in Latin. Similarly, it would be useful to have an option to abbreviate number ranges automatically (e.g. changing ‘107–108’ to ‘107–8’). These, however, are among the few obvious examples of missing functionality in the package.
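To illustrate the kind of rule the author has in mind, here is a small Python sketch of one possible abbreviation routine; it is not a Reledmac feature, merely an illustration, and real style guides have more nuanced conventions than this simple digit-dropping rule:

def abbreviate_range(start, end):
    # Abbreviate a page or line range in the style '107–108' -> '107–8'
    # by dropping the leading digits that the end number shares with the start.
    s, e = str(start), str(end)
    if len(s) == len(e):
        i = 0
        while i < len(s) - 1 and s[i] == e[i]:
            i += 1
        e = e[i:]
    return f"{s}\u2013{e}"        # \u2013 is the en dash used in ranges

print(abbreviate_range(107, 108))   # 107–8
print(abbreviate_range(99, 102))    # 99–102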
Reledmac also includes a powerful cross referencing system, allowing one to add references to page and line numbers and have them remain accurate through any changes to the document (see also Fig. 4):
\documentclass{memoir} % a book-style class is assumed here, as in the previous example
\usepackage{microtype}
\usepackage[pdfusetitle,hidelinks]{hyperref}
\usepackage[series={A,B}]{reledmac}
% Add labels to cross references
\setapprefprefixsingle{line }
\setapprefprefixmore{lines }
\setSErefprefixsingle{line }
\setSErefprefixmore{lines }
\setSErefonlypageprefixsingle{p.~}
\setSErefonlypageprefixmore{pp.~}
\title{Sample Edition}
\author{Andrew Dunning}
\begin{document}
\maketitle
\chapter{Introduction}
Introductory text: see \SEref{sentence} and note to \appref{test}.
\chapter{Edition}
\beginnumbering
\pstart
\edlabelS{sentence}This is a \edtext{test}{\applabel{test}
\Afootnote{experimental \emph{L}}} \edtext{sentence}{\Bfootnote{Introduced to English via Old French from Latin \emph{sententia} `opinion'.}}.\edlabelE{sentence}
\pend
\endnumbering
\end{document}
Reledmac has several commands for creating cross references, but most users will only need two. The
LaTeX syntax is less verbose than XML, and I have known several colleagues who have found it initially much easier to understand than TEI. Over the long term, however, writing an edition in TEI rather than directly in LaTeX is more sustainable, even if it is intended purely for print publication. From a practical perspective, XML validation allows one to find errors more quickly: a missing bracket can cause LaTeX to fall over itself in reporting obtuse error messages through its logs, which themselves are more difficult to read than necessary. Reledmac is focused purely on typesetting, making it difficult to develop mechanical checks for one’s editorial work. TEI’s focus on semantic markup is highly useful in this respect, and a number of researchers have taken advantage of this on a project-level basis. It is crucial that the TEI community seize this opportunity if it wishes to be viewed as a serious publishing option.
There are a number of scripts available for typesetting TEI editions with LaTeX and Reledmac, most of them developed to fit the needs of specific projects. The earliest of these is part of the TEI Consortium’s official stylesheets (Rahtz et al. 2011–2019). These stylesheets do not render text following any scholarly convention for a printed critical edition, and are complex to modify. As a result, implementations for individual projects are usually written from scratch (e.g. Witt 2018; Camps 2017; McLean 2015–2016). None yet offer a general-purpose tool that renders TEI elements into the form one would normally expect for printed editions of premodern texts.
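To give an impression of what such a transformation involves at its simplest, here is a minimal Python sketch (my own illustration, not code from any of the projects cited) that renders a single TEI <app> element encoded with parallel segmentation as a Reledmac \edtext command; a real conversion script would also have to handle nested apparatus entries, multiple witnesses per reading, omissions and lemma abbreviation:

import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

def app_to_edtext(app):
    # Render one TEI <app> (parallel segmentation) as a Reledmac \edtext command.
    # Assumes a single <lem> and one or more <rdg> children; sigla are taken
    # from the wit attribute with the leading '#' stripped.
    lem = app.find(f"{TEI_NS}lem")
    readings = []
    for rdg in app.findall(f"{TEI_NS}rdg"):
        siglum = (rdg.get("wit") or "").lstrip("#")
        readings.append(f"{rdg.text or 'om.'} \\emph{{{siglum}}}")
    return f"\\edtext{{{lem.text}}}{{\\Afootnote{{{'; '.join(readings)}}}}}"

sample = """<app xmlns="http://www.tei-c.org/ns/1.0">
  <lem>test</lem>
  <rdg wit="#L">experimental</rdg>
</app>"""

print(app_to_edtext(ET.fromstring(sample)))
# prints: \edtext{test}{\Afootnote{experimental \emph{L}}}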
Marjorie Burghart’s TEI Critical Apparatus Toolbox
Such attempts are achievable because of the wide support for integrating LaTeX into other environments and its portability. Any full LaTeX distribution
At the same time, LaTeX has a number of oddities that can make transformation from XML somewhat complex. For example, there is no standard mechanism for changing the language, as there are two mutually incompatible packages for achieving this (Babel
Given this history of software cobbled together by a series of programmers, humanists, and non-specialists in spare time over three decades, it is a small miracle that LaTeX with Reledmac is not merely functional but has become the most reliable method of automatic typesetting for critical editions. It is to be hoped that one day the editing community will band together to give the project more support and ensure its sustainability, for it is clear that Rouquette could create a much more functional package if he had the time, resources, and desire to redesign it from the ground up. Both using the package directly and typesetting critical editions from TEI XML would be much more straightforward with a package designed from the outset to work with LaTeX. Alternatively, there might be more promise in creating a critical editing module for ConTeXt, a rationalized competitor to LaTeX that has a focus on typesetting XML directly without the need to first transform it into a different markup language. There have been some forays down this path (Hamid 2007), but nothing has yet seen the light of day.
In the small field of software for critical editing, Reledmac fills a helpful niche alongside the more complex TUSTEP (Ott 1979; Schälkle and Ott 2018) and the commercial Classical Text Editor (Hagel 2007), focusing on providing a key element of a publishing workflow rather than an all-encompassing editing environment. Its interface is as user-friendly as one can achieve in LaTeX code; its clear documentation and examples mean that one can reasonably expect to learn it oneself; and it can produce documents of the highest quality. One can hardly ask for more, and our community is indebted to Rouquette and his predecessors for putting so much of their energy into the basic digital infrastructure for the humanities that often goes unacknowledged.
TUSTEP is a software toolbox or environment designed for the Digital Humanities. It has been under constant development since 1966, bearing the name TUSTEP since 1978. It was located at the Zentrum für Datenverarbeitung (ZDV) of the University of Tübingen until 2003 and has since then been the responsibility of the International TUSTEP User Group (ITUG).
The program authors, Wilhelm Ott and Kuno Schälkle, still update and develop the program, issuing a new version approximately each year. TUSTEP 2018 is the latest stable version for production at the moment.
This version and older versions may be downloaded, installed and used for free. They are compatible with Windows, Linux and Mac operating systems and I found the installation to be straightforward.
While TUSTEP is normally installed on a personal computer, it is also possible to run it on a Linux Server and there have even been successful experiments to run the software on Raspberry Pis.
Being a software toolbox, TUSTEP does not have a single purpose, functionality or area of application, except that it is designed to work with texts. As will be explained in more detail in the next section, TUSTEP consists of a variety of modules, each of which is designed for a specific task. These modules can be combined almost freely by the user to create very complex workflows, so that the program suits the user’s individual needs. With regard to digital scholarly editions, TUSTEP supports all steps of a typical editorial workflow: from transcription to collation, from the constitution of critical texts with up to nine apparatus levels to the creation of indices and concordances, from professional typesetting to converting a text into any desired plain text file format.
As said above, TUSTEP consists of individual modules, also referred to as commands. There are currently 54 commands in TUSTEP. Each has a German name and an alternative English alias, and consists of a leading "#", the command name and a variable number of parameters. For example, there is a command for copying the contents of one file into another. The name of this command is
#KOPIERE,
quelle = file_a,
ziel = file_b,
modus = +,
loeschen = +
The order of the parameters can be altered if one wishes, but there is a standard order. TUSTEP comes with a command-line based user interface, which is described in more detail below. The user tells TUSTEP which commands to execute by typing them into the command line. To save a lot of typing, any command may be abbreviated down to no fewer than its first two letters, and all parameter names may be omitted if the standard order is observed. The above command may also look like this:
#KO, file_a, file_b, +, +
Both notations do the same thing: copy the contents of "file_a" into "file_b", renumber the lines, and overwrite the former contents of "file_b". It is possible to combine the shorthand notation with the full notation.
One benefit of TUSTEP lies in the extensive functionality of its commands. While copying a file, the user may simultaneously want to manipulate its contents. Let us suppose there is a certain XML or SGML tagset in "file_a" that should be converted to TEI-XML. The command accepts additional parameters for this (here between "*" and "*eof", eof standing for "end of file"):
#KOPIERE, file_a, file_b, +, +, parameter = *
XX |<Name>|<persName>|
XX |</Name>|</persName>|
*eof
The parameter "XX" exchanges strings with other strings. It also accepts string patterns, so it is possible to compress these two lines of parameters into one by making the slash optional.
Aside from
Automatically creating indices with
Finally, one of the most important features in TUSTEP is
While it is nice to have a separate command for each purpose, research projects and especially editorial enterprises need more complex workflows in which many steps have to be executed successively, so typing in the commands each time would be very cumbersome. In TUSTEP, the user can therefore create their own workflows by writing the necessary commands into a temporary or permanent file and executing this file over and over again. This gives the user the ability to create a customized program. It is also possible to execute such a file with variables in order to adapt the program at each execution. The user may furthermore save the interim results or create protocols for every step in order to analyze them later. A sample program may look like this (I have omitted all specifications just to show the concept). My comments are behind the two slashes:
TUSTEP is also equipped with a modern scripting language called TUSCRIPT. After writing TUSTEP commands into a file, it is possible to control the workflow with structures from TUSCRIPT, such as if-then queries, loops and a variety of predefined functions. The scripting language makes it possible to largely automate such workflows.
But what is more important: TUSTEP gives the user the possibility of creating new and more complex tools (i.e. new programs) out of the pieces of software in the toolbox. Hence the true power of TUSTEP does not lie in the functionality of its single components, but in their flexibility and in the possibility of combining them as needed to find customized solutions for specific and new problems. It is a major feature of TUSTEP that users are able to import and export their data or edition materials at any time from and into any plain text format with any desired markup (for example custom XML or TEI-XML). Hence, the results of a TUSTEP-based workflow can be processed further in other applications, and vice versa results from other applications can be processed with TUSTEP.
While many applications work with a specific markup, for example XML tags, TUSTEP does not require a fixed markup, except when it comes to typesetting. When typesetting a text with
When starting the program for the first time, the screen leads the user to a menu (see Fig. 1) for creating a "Sitzung" (session). A session, which one may think of as the project’s default or root directory on the user’s hard drive, is a security concept and one that enforces order in the files. All files of a project have to be in this directory (or in sub-directories) to be accessible from a TUSTEP session.
Once TUSTEP is open with a session, the starting screen (see Fig. 2) pops up showing the command line at the bottom where the user can enter commands directly. In the blue field above, TUSTEP gives information about the program version and feedback on the commands entered. This includes error messages and notifications of success.
If the user types the command
The line numbers can also be used to imitate simple text structures. Three levels of line numbers are accepted, such as 200.098/54. This number can be interpreted as page number 200 and line number 98 referring to a printed book, with a third additional number for editorial additions like apparatus entries or notes on the text of the line.
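As a plain illustration of how such a reference decomposes (this is ordinary Python, not TUSTEP functionality), the three levels can be parsed like this:

import re

def parse_tustep_reference(ref):
    # Split a TUSTEP-style reference such as "200.098/54" into its three
    # levels: page, line and an optional editorial sub-number.
    m = re.fullmatch(r"(\d+)\.(\d+)(?:/(\d+))?", ref)
    if m is None:
        raise ValueError(f"not a recognised reference: {ref!r}")
    page, line, sub = m.groups()
    return int(page), int(line), int(sub) if sub else None

print(parse_tustep_reference("200.098/54"))   # (200, 98, 54)
print(parse_tustep_reference("200.098"))      # (200, 98, None)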
It is also possible to have a text collection in one file, for example all the novels written by Jules Verne, differentiated by the first number. In this way, the user has easy access to each novel while still being able to search and edit all novels at once. Figure 4 shows the beginning of Jules Verne’s
Since the software originally dates from the 1960s and 1970s, when mouse and touch screen were not common or even invented and screens had low resolution, TUSTEP has a concept of usage that is not very intuitive to those who got to know modern graphical user interfaces first. It takes some time to get used to the TUSTEP way of using a computer program. The method of entering commands via the command line, without drop-down menus, is useful in itself (one can arguably enter commands faster with the keyboard than with a mouse), but it forces users to rethink their habits when operating an application.
As TUSTEP is a professional tool with numerous functionalities, it is not easy to use. The learning curve is quite steep at the beginning: some weeks or even months are needed until working with TUSTEP picks up speed and one becomes familiar with all the important functions. Furthermore, the technology is TUSTEP-specific, so the skills acquired cannot be reused in a different environment, unlike standards such as XQuery or XPath. On the other hand, once users have learnt enough, they can apply their knowledge within TUSTEP to many projects, saving time. Also, with the skills to create, edit, analyze and typeset texts, one becomes less dependent on the skills of other people.
Another downside is that TUSTEP has some peculiarities. For example, file names may only be up to 12 characters long (plus up to four characters in the file extension). If users want to use longer file names, they have to either rename the files or use a workaround, defining the longer file names as variables known to the session and using the variable names instead of the file names. This and other peculiarities are part of the inheritance of the program’s long history, which makes them understandable, but perhaps it is time to rethink these parts of TUSTEP for the future.
Due to the complexity of TUSTEP, it is recommended that beginners take an introductory course. The ITUG homepage lists courses held at different universities in Germany, Switzerland and Austria. It is also possible to take a course within the scope of the annual ITUG conference.
Apart from that, there is a TUSTEP wiki
As mentioned before, a handbook comes with each version of TUSTEP and can be found in the installation folder. The handbook offers a complete description of TUSTEP’s features, but its terminology is quite abstract and it takes some time to get used to it. The time is not wasted, though, because it gives an exhaustive overview of the functionalities and the parameters. Unfortunately, there is no up-to-date introduction in the form of a monograph.
Now, who uses TUSTEP? Over the decades many editorial projects have relied on the program. A list of ongoing and finished projects is available on the ITUG homepage;
TUSTEP is one of the oldest programs still in use in the Digital Humanities: it proves that software can survive more than just one decade. Many programs have come and gone in the time TUSTEP has been around. Especially in the last few years, a great deal of software has been developed in the Digital Humanities and is in part already forgotten, because a new standard was introduced that made it obsolete, because a new version of an operating system was published and the software could no longer run, or because newer software was more powerful, so users had no need for the older tool.
The fact that TUSTEP has survived since the 1960s may be an indication that the software is still powerful and competitive, and that it successfully runs on different platforms. Indeed, TUSTEP files from the past stored on a magnetic tape would still be readable by TUSTEP 2018, provided one had a device that could still read a magnetic tape. So one may state that TUSTEP itself is sustainable and that the developers have always tried to keep the application compatible from one version to the next.
Even though some parts of TUSTEP have aged and could be renewed in the future, such as the built-in text editor (which is one of the 54 commands), in my opinion TUSTEP is worth considering for new scholarly projects dealing with texts or text editions. Because it can easily convert from and into different file formats, TUSTEP can also be used in combination with other software in a complementary and modular manner.