Turtle lang server does not allow uchar in iriref #42

Open
GordianDziwis opened this issue Feb 25, 2021 · 2 comments

Comments

@GordianDziwis

This should be valid Turtle:

<s> <o> <http://www.example.org/\u0020bar> .
@jmrog
Contributor

jmrog commented Feb 25, 2021

@BonaBeavis I believe the text you provided is not valid Turtle. An IRIREF cannot contain \u0020 (the space character), per the IRIREF grammar rule. Note that, for an IRIREF, the parser is supposed to receive the string with its escape sequences already unescaped, not still escaped. It therefore sees a literal space here (the unescaped value of \u0020), which is not allowed. You can also double-check this with other validators to confirm that the text you provided is not valid Turtle.

The Turtle parser appears to be working correctly here and with other Unicode escape sequences. For example, if you change your text to the following:

<s> <o> <http://www.example.org/\u0021bar> .

the parser parses the text correctly.
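
The reading described above (unescape first, then apply the IRIREF character restrictions) can be sketched roughly as follows. This is only an illustration of that interpretation, not the language server's actual code; the function names `unescape_uchar` and `iriref_ok_after_unescape` are hypothetical, and the input is assumed to be the text between `<` and `>`.

```python
import re

# UCHAR as in the Turtle grammar: \uXXXX or \UXXXXXXXX (hex digits).
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

# Characters the IRIREF rule excludes: U+0000..U+0020 and < > " { } | ^ ` \
FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def unescape_uchar(body: str) -> str:
    """Replace every UCHAR escape with the character it denotes.

    Sketch only: code points above U+10FFFF would raise ValueError here.
    """
    return UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), body)


def iriref_ok_after_unescape(body: str) -> bool:
    """Check the character restrictions on the unescaped IRI text."""
    return FORBIDDEN.search(unescape_uchar(body)) is None


print(iriref_ok_after_unescape(r"http://www.example.org/\u0021bar"))
print(iriref_ok_after_unescape(r"http://www.example.org/\u0020bar"))
```

Under this reading, the `\u0021` form passes because it unescapes to `!`, while the `\u0020` form fails because it unescapes to a space.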

@GordianDziwis
Author

I still do not get it, because the spec defines a Turtle document as:

"A conforming Turtle document is a Unicode string that conforms to the grammar and additional constraints defined in section 6. Turtle Grammar, starting with the turtleDoc production. A Turtle document serializes an RDF Graph."

And the document does conform to the Turtle grammar (see the yacker turtleEsc validation results).

But as you said, the parser should unescape the IRIREF, and that would produce an invalid IRI.

I can think of three possible resolutions:

  1. The rule for UCHAR is simplified and omits the forbidden Unicode escapes.
  2. "Unescaped" is just a typo and "escaped" was the intention.
  3. I overlooked some "additional constraints" defined in the spec.
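
The grammar-only reading behind this comment can be sketched as a single regular expression applied to the raw, still-escaped text. The pattern below is a hand transcription of the IRIREF production, `'<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'`, and the helper name `matches_iriref_production` is hypothetical; it illustrates the argument and does not reproduce any particular validator.

```python
import re

# IRIREF production, transcribed by hand: any character outside the
# excluded set, or a UCHAR escape, repeated between '<' and '>'.
IRIREF = re.compile(
    r'<(?:[^\x00-\x20<>"{}|^`\\]'   # ordinary allowed character
    r'|\\u[0-9A-Fa-f]{4}'           # UCHAR, 4 hex digits
    r'|\\U[0-9A-Fa-f]{8}'           # UCHAR, 8 hex digits
    r')*>'
)


def matches_iriref_production(token: str) -> bool:
    """Match the raw token against the production without unescaping."""
    return IRIREF.fullmatch(token) is not None


print(matches_iriref_production(r"<http://www.example.org/\u0020bar>"))
print(matches_iriref_production("<http://www.example.org/ bar>"))
```

Matched this way, the escaped `\u0020` form satisfies the production while a literal space does not, which is why the outcome depends on whether the character restriction is applied before or after unescaping.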
