manually merged software list
daniel-jettka committed Feb 22, 2024
2 parents 6229749 + 5c0d0f6 commit a16bfc3
Showing 12 changed files with 550 additions and 272 deletions.
47 changes: 25 additions & 22 deletions data/JTEI/14_2021-23/jtei-burnard-shoch-odebrecht-194-source.xml
@@ -200,15 +200,17 @@
projects, though the TEI Consortium website has for many years offered a platform
for one: <title level="a">Projects Using the TEI,</title> accessed May 17, 2021,
<ptr target="https://tei-c.org/activities/projects/"/>. More recently, the
TEIhub project lists more than 12,500 <ptr type="software" xml:id="GitHub"
target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>-hosted TEI
projects (last updated May 11, 2021, <ptr target="https://teihub.netlify.app/"/>);
an associated bot called TEI Pelican provides a daily twitter feed of new <ptr
type="software" xml:id="GitHub" target="#GitHub"/><rs type="soft.name"
ref="#GitHub">GitHub</rs> repositories containing a TEI header. We are unaware
of any systematic analysis of the application types indicated by these data
sources, but a glance gives the impression that traditional editorial and
resource-building projects predominate.</note>
TEIhub project lists more than 12,500 <ptr type="software" xml:id="R1"
target="#GitHub"/><rs type="soft.name" ref="#R1">GitHub</rs>-hosted TEI
projects (last updated May 11, 2021, <ptr type="software" xml:id="R7"
target="#teipelican"/><rs type="soft.url" ref="#R7"><ptr
target="https://teihub.netlify.app/"/></rs>); an associated bot called <rs
type="soft.url" ref="#R7">TEI Pelican</rs> provides a daily twitter feed of new
<ptr type="software" xml:id="R2" target="#GitHub"/><rs type="soft.name"
ref="#R2">GitHub</rs> repositories containing a TEI header. We are unaware of
any systematic analysis of the application types indicated by these data sources,
but a glance gives the impression that traditional editorial and resource-building
projects predominate.</note>
</p>
<p>The work of the Action<note>Further information about the Action is available from
its website at <ptr target="https://www.distant-reading.net/"/>. For information
@@ -228,9 +230,8 @@
issues of sampling and balance were prepared for discussion and approval by the
members of WG1, and remain available from the Working Group’s website. <note>These
and other documents are available from the Action’s <ptr type="software"
xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub"
>GitHub</rs> page, accessed May 17, 2021, <ptr
target="https://distantreading.github.io/"/>.</note>
xml:id="R3" target="#GitHub"/><rs type="soft.name" ref="#R3">GitHub</rs> page,
accessed May 17, 2021, <ptr target="https://distantreading.github.io/"/>.</note>
</p>
</div>
<div xml:id="eltec">
@@ -653,7 +654,8 @@
<p>In the ELTeC project, we begin by defining an ODD which selects from TEI all the
components used by any ELTeC schema at any level. This ODD also contains
documentation and specifies usage constraints applicable across every schema. This
base ODD is then processed using the TEI standard odd2odd stylesheet to produce a
base ODD is then processed using the <ptr type="software" xml:id="R8"
target="#odd2odd"/><rs type="soft.name" ref="#R8">TEI standard odd2odd stylesheet</rs> to produce a
stand-alone set of TEI specifications which we call eltec-library. Three different
ODDs, eltec-0, eltec-1, and eltec-2, then derive specific schemas and documentation
for each of the three ELTeC levels, using this library of specifications as a base
@@ -662,13 +664,14 @@
resulting encoding standard. As with other ODDs, we are then able to produce
documentation and formal schemas which reflect exactly the scope of each encoding
level.</p>
<p>The ODD sources and their outputs are maintained on <ptr type="software"
xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>
and are also <ptr target="http://doi.org/10.5281/zenodo.3546326"/>published on Zenodo
(<ref type="bibl" target="#odebrecht2019">Odebrecht et al. 2019</ref>) along with
the ELTeC corpora.<note>The <ptr type="software" xml:id="GitHub" target="#GitHub"
/><rs type="soft.name" ref="#GitHub">GitHub</rs> repository for the ELTeC
collection (last updated May 17, 2021) is found at <ptr
<p>The ODD sources and their outputs are maintained on <ptr type="software" xml:id="R4"
target="#GitHub"/><rs type="soft.name" ref="#R4">GitHub</rs> and are also <ptr
target="http://doi.org/10.5281/zenodo.3546326"/>published on <ptr type="software" xml:id="R9"
target="#zenodo"/><rs type="soft.name" ref="#R9">Zenodo</rs> (<ref
type="bibl" target="#odebrecht2019">Odebrecht et al. 2019</ref>) along with the
ELTeC corpora.<note>The <ptr type="software" xml:id="R5" target="#GitHub"/><rs
type="soft.name" ref="#R5">GitHub</rs> repository for the ELTeC collection
(last updated May 17, 2021) is found at <ptr
target="https://github.com/COST-ELTeC/"/>; the Zenodo community within which it
is being published (last updated April 11, 2021) lives at <ptr
target="https://zenodo.org/communities/eltec/"/>.</note>
@@ -689,8 +692,8 @@
development and are expected to become available during the coming year. As noted
above, up-to-date information about the current state of all corpora is publicly
visible at <ptr target="http://distantreading.github.io/ELTeC/"/>, which includes
links to the individual <ptr type="software" xml:id="GitHub" target="#GitHub"/><rs
type="soft.name" ref="#GitHub">GitHub</rs> repositories for each corpus.</p>
links to the individual <ptr type="software" xml:id="R6" target="#GitHub"/><rs
type="soft.name" ref="#R6">GitHub</rs> repositories for each corpus.</p>
<p>As well as continuing to expand the collection, and continuing to fine-tune its
composition, we hope to improve the consistency and reliability of the metadata
associated with each text, as far as possible automatically. For example, we have
95 changes: 54 additions & 41 deletions data/JTEI/14_2021-23/jtei-cc-pn-erjavec-195-source.xml
@@ -207,10 +207,11 @@
<div xml:id="schema">
<head>The Parla-CLARIN Schema</head>
<p>Parla-CLARIN is written as a TEI ODD document, consisting of the prose guidelines and
the schema specification, on the basis of which it is possible, using the standard TEI
XSLT stylesheets, to derive an XML schema expressed either as a RelaxNG schema, a DTD,
or a W3C schema, which is then used for formal validations of a Parla-CLARIN
parliamentary corpus.</p>
the schema specification, on the basis of which it is possible, using the <ptr
type="software" xml:id="R5" target="#teistylesheets"/><rs type="soft.name" ref="#R5"
>standard TEI XSLT stylesheets</rs>, to derive an XML schema expressed either as a
RelaxNG schema, a DTD, or a W3C schema, which is then used for formal validations of a
Parla-CLARIN parliamentary corpus.</p>
<p>While the proposal tries to cater for many encoding needs, it is possible that new
users will have to use TEI elements or attributes that are not discussed in the prose
guidelines. Since the recommendations are still under development, the formal schema
@@ -324,20 +325,22 @@
<div xml:id="presentation">
<head>Presentation of Parla-CLARIN</head>
<p>Like the TEI Guidelines, the Parla-CLARIN recommendations are available on <ref
target="https://github.com/clarin-eric/parla-clarin/"><ptr type="software"
xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub"
>GitHub</rs></ref>, as a project<note>Tomaž Erjavec and Andrej Pančur, Parla-CLARIN
project <ptr type="software" xml:id="GitHub" target="#GitHub"/><rs type="soft.name"
ref="#GitHub">GitHub</rs> site, last updated March 17, 2021, <ptr
target="https://github.com/clarin-eric/parla-clarin/"/>.</note> of the CLARIN ERIC
collection. The project contains a folder for the schema (i.e., the Parla-CLARIN ODD
document and XML schemas derived from it), a folder for the programs that convert the
ODD into the XML schemas and to the HTML of the prose and schema definitions, and a
folder for examples, which contains an artificial but fully worked out example of a
Parla-CLARIN document and subfolders with various example resources, where each should
contain: <list rend="ordered">
target="https://github.com/clarin-eric/parla-clarin/"><ptr type="software" xml:id="R1"
target="#GitHub"/><rs type="soft.name" ref="#R1">GitHub</rs></ref>, as a
project<note>Tomaž Erjavec and Andrej Pančur, Parla-CLARIN project <ptr
type="software" xml:id="R2" target="#GitHub"/><rs type="soft.name" ref="#R2"
>GitHub</rs> site, last updated March 17, 2021, <ptr type="software" xml:id="R9"
target="#parlaclarinscripts"/><rs type="soft.url" ref="#R9"><ptr
target="https://github.com/clarin-eric/parla-clarin/"/></rs>.</note> of the CLARIN
ERIC collection. The project contains a folder for the schema (i.e., the Parla-CLARIN
ODD document and XML schemas derived from it), a folder for the <rs type="soft.name"
ref="#R9">programs that convert the ODD into the XML schemas and to the HTML of the
prose and schema definitions</rs>, and a folder for examples, which contains an
artificial but fully worked out example of a Parla-CLARIN document and subfolders with
various example resources, where each should contain: <list rend="ordered">
<item>a sample of a corpus in its source encoding;</item>
<item>XSLT script to convert it into Parla-CLARIN; and</item>
<item><rs type="soft.name" ref="#R9">XSLT script to convert it into Parla-CLARIN</rs>;
and</item>
<item>the output of the conversion.</item>
</list>
</p>
@@ -495,12 +498,15 @@
<p>Nevertheless, AKN is an important schema for modeling parliamentary proceedings,
especially as the primary encoding standard used by various legislative bodies, so some
of AKN’s solutions were used in developing the Parla-CLARIN proposal, in particular the
typology of divisions of a document. Also developed was a partial, but non-trivial,
conversion from AKN to Parla-CLARIN, which covers several AKN example documents. As
mentioned in <ptr type="crossref" target="#presentation"/>, the example documents and
conversion script can be found in the <ident>Examples</ident> folder of the Parla-CLARIN
Git repository. The <ident>akn2tei.xsl</ident> script attempts to preserve the IDs of
the source AKN document, converts the AKN addressee, role, and questions and answers to
typology of divisions of a document. Also developed was a partial, but non-trivial, <ptr
type="software" xml:id="R10" target="#parlaclarinscripts"/><rs type="soft.name"
ref="#R10">conversion from AKN to Parla-CLARIN</rs>, which covers several AKN example
documents. As mentioned in <ptr type="crossref" target="#presentation"/>, the example
documents and conversion script can be found in the <ident>Examples</ident> folder of
the Parla-CLARIN Git repository. The <ptr type="software" xml:id="R11"
target="#parlaclarinscripts"/><rs type="soft.name" ref="#R11"
><ident>akn2tei.xsl</ident></rs> script attempts to preserve the IDs of the source
AKN document, converts the AKN addressee, role, and questions and answers to
Parla-CLARIN, and maps FRBR data (which distinguishes a <soCalled>work</soCalled> from
its <soCalled>expression</soCalled> and its expression from its
<soCalled>manifestation</soCalled>) to the appropriate TEI elements and attributes.
@@ -572,9 +578,10 @@
parliamentary proceedings meant for scholarly investigations. This scheme is currently a
straightforward customization of the TEI Guidelines, with the majority of the effort
having gone into the writing of the prose guidelines of the Parla-CLARIN recommendations
and into developing the conversion from Akoma Ntoso to Parla-CLARIN. We have not included
examples of the encoding, as these are readily available on the <ptr type="software"
xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>
and into developing the <ptr type="software" xml:id="R12" target="#parlaclarinscripts"
/><rs type="soft.name" ref="#R12">conversion from Akoma Ntoso to Parla-CLARIN</rs>. We
have not included examples of the encoding, as these are readily available on the <ptr
type="software" xml:id="R3" target="#GitHub"/><rs type="soft.name" ref="#R3">GitHub</rs>
documentation page of the project, and large Parla-CLARIN encoded corpora are openly
available.</p>
<p>Apart from the siParl 2.0 corpus mentioned above (<ptr type="crossref"
@@ -601,15 +608,21 @@
<p>As we wanted to have corpora that are not only interchangeable but interoperable as well,
we created a bespoke ParlaMint XML schema directly in RelaxNG – the schema is compatible
with Parla-CLARIN as it validates a subset of documents that would be validated against
Parla-CLARIN. We produced common scripts that can convert any of the four corpora to plain
text, to CoNLL-U format as used by the Universal Dependencies project, and to vertical
format as used by the <ref target="http://cwb.sourceforge.net/">CWB</ref><note>The IMS
Open Corpus Workbench (CWB), last modified March 30, 2021, <ptr
target="http://cwb.sourceforge.net/"/>.</note> and <ref
target="http://www.sketchengine.eu/">Sketch Engine</ref><note>Accessed January 13, 2022,
<ptr target="http://www.sketchengine.eu/"/>.</note> (<ref type="bibl"
target="#kilgarriff14">Kilgarriff et al. 2014</ref>) concordancers, as well as to
extract complete speech metadata into TSV files.</p>
Parla-CLARIN. We produced <ptr type="software" xml:id="R13" target="#parlaclarinscripts"
/><rs type="soft.url" ref="#R13">common scripts that can convert any of the four corpora
to plain text, to CoNLL-U format as used by the Universal Dependencies project, and to
vertical format as used by the <ptr type="software" xml:id="R14" target="#cwb"/><rs
type="soft.url" ref="#R14"><ref target="http://cwb.sourceforge.net/"
>CWB</ref></rs></rs><note>The <rs type="soft.name" ref="#R14">IMS Open Corpus Workbench
(CWB)</rs>, last modified March 30, 2021, <rs type="soft.url" ref="#R14"><ptr
target="http://cwb.sourceforge.net/"/></rs>.</note> and <ptr type="software"
xml:id="R15" target="#sketchengine"/><rs type="soft.url" ref="#R15"><ref
target="http://www.sketchengine.eu/"><rs type="soft.name" ref="#R15">Sketch
Engine</rs></ref></rs><note>Accessed January 13, 2022, <rs type="soft.url"
ref="#R15"><ptr target="http://www.sketchengine.eu/"/></rs>.</note> (<rs
type="soft.bib.ref" ref="#R15"><ref type="bibl" target="#kilgarriff14">Kilgarriff et al.
2014</ref></rs>) concordancers, as well as to extract complete speech metadata into
TSV files.</p>
<p>In order for Parla-CLARIN to achieve its goal of becoming a widely recognized encoding
format for corpora of parliamentary proceedings, significant work remains to be done. On
the basis of the lessons learned in creating ParlaMint, we plan to revise the prose
@@ -619,10 +632,10 @@
specification from the default ones in the TEI Guidelines to ones taken or adapted from
the collected parliamentary corpora.</p>
<p>Second, as we have already done for ParlaMint, we plan to add to the <ptr type="software"
xml:id="GitHub" target="#GitHub"/><rs type="soft.name" ref="#GitHub">GitHub</rs>
Parla-CLARIN project more down-conversion scripts with which we would increase the
usability of the Parla-CLARIN corpora. As mentioned, work also needs to be done to develop
a conversion to RDF.</p>
xml:id="R4" target="#GitHub"/><rs type="soft.name" ref="#R4">GitHub</rs> Parla-CLARIN
project more down-conversion scripts with which we would increase the usability of the
Parla-CLARIN corpora. As mentioned, work also needs to be done to develop a conversion to
RDF.</p>
<p>Last, but not least, one of the great benefits of Git is the ability to support
collaborative work, be it through posting issues, or through using pull requests to
incorporate changes. While the community has not so far made use of these options, we hope
@@ -790,8 +803,8 @@
<bibl xml:id="kilgarriff14"><author>Kilgarriff, Adam</author>, <author>Vít Baisa</author>,
<author>Jan Bušta</author>, <author>Miloš Jakubíček</author>, <author>Vojtěch
Kovář</author>, <author>Jan Michelfeit</author>, <author>Pavel Rychlý</author>, and
<author>Vít Suchomel</author>. <date>2014</date>. <title level="a">The Sketch Engine:
Ten Years On.</title>
<author>Vít Suchomel</author>. <rs type="soft.bib.ref" ref="ewfew"><date>2014</date>.
<title level="a">The Sketch Engine: Ten Years On.</title></rs>
<title level="j">Lexicography: Journal of ASIALEX</title>
<biblScope unit="volume">1</biblScope> (<biblScope unit="issue">1</biblScope>):
<biblScope unit="page">7–36</biblScope>. doi:<idno type="DOI"