[bug] `CrossSentenceMention` needs start and end character offsets #2

myedibleenso · 2024-05-13T21:26:57Z

Character start and end offsets for `CrossSentenceMention`s can't be read directly from the Odin Mention JSON we receive from clulab/processors:

clu-processors/python/lum/clu/odin/serialization.py

Line 103 in 11970d5

start = mjson["characterStartOffset"]

This is because a cross-sentence mention involves multiple token intervals.

Proposed solution

Safely retrieve start and end via `mjson.get("characterStartOffset", None)` and `mjson.get("characterEndOffset", None)`.

Construct the `CrossSentenceMention`'s start and end using the min and max token intervals for anchor and neighbor:

clu-processors/python/lum/clu/odin/serialization.py

Lines 215 to 216 in 11970d5

start=start,
end=end,

Add unit tests (see this example as a reference).
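A minimal sketch of the first two steps, not the actual change to `serialization.py`. The helper name `resolve_char_offsets` is hypothetical, and it assumes the anchor and neighbor are available as dicts that each carry their own `characterStartOffset` and `characterEndOffset`; the real code may need to derive these from token intervals instead.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_char_offsets(
    mjson: Dict[str, Any],
    anchor: Dict[str, Any],
    neighbor: Dict[str, Any],
) -> Tuple[int, int]:
    """Hypothetical helper: return (start, end) character offsets for a
    cross-sentence mention, falling back to the span covered by its
    anchor and neighbor when the mention JSON has no offsets of its own."""
    # Safe lookup: .get returns None instead of raising KeyError.
    start: Optional[int] = mjson.get("characterStartOffset", None)
    end: Optional[int] = mjson.get("characterEndOffset", None)

    # Assumption: anchor and neighbor both expose character offsets.
    if start is None:
        start = min(anchor["characterStartOffset"], neighbor["characterStartOffset"])
    if end is None:
        end = max(anchor["characterEndOffset"], neighbor["characterEndOffset"])
    return start, end


# Usage with made-up offsets: the mention JSON has no offsets of its own.
anchor = {"characterStartOffset": 10, "characterEndOffset": 15}
neighbor = {"characterStartOffset": 40, "characterEndOffset": 52}
span = resolve_char_offsets({}, anchor, neighbor)
```

A unit test for this could cover both paths: offsets present in the mention JSON, and offsets absent so the anchor/neighbor fallback is used.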


myedibleenso added the bug Something isn't working label May 13, 2024

myedibleenso changed the title ~~[bug] CrossSentenceMention needs start and end character offsets~~ [bug] CrossSentenceMention needs start and end character offsets May 13, 2024

myedibleenso assigned vincentraymond-ua May 13, 2024
