XML format

Unlike StanfordCoreNLP, Jigg's XML encodes tag-specific information as attributes. For example, the following <token> in StanfordCoreNLP

<token id="1">
  <word>Stanford</word>
  <lemma>Stanford</lemma>
  <CharacterOffsetBegin>0</CharacterOffsetBegin>
  <CharacterOffsetEnd>8</CharacterOffsetEnd>
</token>

is represented in Jigg as

<token id="s0_1" form="Stanford" lemma="Stanford" CharacterOffsetBegin="0" CharacterOffsetEnd="8"/>

The main characteristics of Jigg's XML are:

Each element (e.g., token) has a unique id (e.g., s0_1) in the XML. In StanfordCoreNLP, these ids are not unique.
Some information (e.g., the surface form) is stored under a different name (e.g., form rather than word).
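
The mapping between the two layouts can be sketched in a few lines of Python. This is only an illustration of the differences listed above, not Jigg's own code: the function name `to_attribute_style` and the `RENAMES` table are hypothetical, and the id scheme (sentence id, underscore, token id) is an assumption inferred from the single example `s0_1`.

```python
import xml.etree.ElementTree as ET

# Child tags whose names differ in the attribute-based layout.
# Only word -> form is taken from the example above; this table is hypothetical.
RENAMES = {"word": "form"}

def to_attribute_style(token, sentence_id):
    """Turn a StanfordCoreNLP-style <token> into an attribute-based <token>."""
    # Assumption: a document-wide unique id is built by prefixing the sentence id.
    attrs = {"id": f"{sentence_id}_{token.get('id')}"}
    for child in token:
        # Each child element becomes an attribute, renamed where needed.
        attrs[RENAMES.get(child.tag, child.tag)] = child.text or ""
    return ET.Element("token", attrs)

corenlp_token = ET.fromstring("""
<token id="1">
  <word>Stanford</word>
  <lemma>Stanford</lemma>
  <CharacterOffsetBegin>0</CharacterOffsetBegin>
  <CharacterOffsetEnd>8</CharacterOffsetEnd>
</token>
""")

jigg_token = to_attribute_style(corenlp_token, "s0")
print(ET.tostring(jigg_token, encoding="unicode"))
```

Storing the values as attributes keeps each token on a single line, and the prefixed id lets other elements refer to a token unambiguously across sentences.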
