load.xml can't load entity reference. #340

Closed
vunhatchuong opened this issue Mar 3, 2023 · 6 comments
Labels
documentation Improvements or additions to documentation

Comments

@vunhatchuong

Expected Behavior

The file loads and &Ouml; is rendered as Ö.

Actual Behavior

Error: Neo.ClientError.Procedure.ProcedureCallFailed

Failed to invoke procedure `apoc.load.xml`: Caused by: org.xml.sax.SAXParseException; lineNumber: 89; columnNumber: 24; The entity "Ouml" was referenced, but not declared.

How to Reproduce the Problem

Simple Dataset

<article mdate="2019-10-25" key="tr/gte/TR-0146-06-91-165" publtype="informal">
<author>Alejandro P. Buchmann</author>
<author>M. Tamer &Ouml;zsu</author>
<author>Dimitrios Georgakopoulos</author>
<title>Towards a Transaction Management System for DOM.</title>
<journal>GTE Laboratories Incorporated</journal>
<volume>TR-0146-06-91-165</volume>
<month>June</month>
<year>1991</year>
<url>db/journals/gtelab/index.html#TR-0146-06-91-165</url>
</article>

CALL apoc.load.xml("file://dblp.xml") yield value return value

Steps

  1. Remove the DOCTYPE declaration from dblp.xml, since apoc.load.xml can't handle it.
  2. Try to load dblp.xml with apoc.load.xml.
  3. Error thrown.

Versions

  • OS: EndeavourOS
  • Neo4j: 5.5.0
  • Neo4j-Apoc: 5.5.0
@Lojjs
Contributor

Lojjs commented Mar 13, 2023

@vunhatchuong123 Thanks for reporting. We will investigate and get back to you.

@Lojjs
Contributor

Lojjs commented Mar 20, 2023

@vunhatchuong123 First, a caveat: I'm part of the team working on APOC, but I don't have much experience with XML in particular. I wonder whether this is really a valid XML file. I tried uploading your XML data to two online XML formatters, https://www.freeformatter.com/xml-formatter.html and https://jsonformatter.org/xml-formatter, to see how they behave compared to APOC. The first one fails with a similar error: Unable to parse any XML input. Error on line 3: The entity "Ouml" was referenced, but not declared. The second one accepts &Ouml; but renders it as is rather than converting it to Ö. Do you have earlier experience where XML handling worked as you expected?

Best regards Louise Söderström

P.S. I do see the usefulness of your request, having two Ö in my own name. ;)
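
The comparison with other parsers can also be reproduced locally. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` parser (not something used in this thread); `parse_text`, `FRAGMENT`, and `DECLARED` are hypothetical names chosen for illustration. It suggests that the fragment is only well-formed when the entity is declared, for example in an internal DTD subset.

```python
import xml.etree.ElementTree as ET

# The author line from the report, with no entity declaration anywhere.
FRAGMENT = "<author>M. Tamer &Ouml;zsu</author>"

# The same fragment, with Ouml declared in an internal DTD subset.
DECLARED = '<!DOCTYPE author [<!ENTITY Ouml "&#214;">]>' + FRAGMENT


def parse_text(xml_text):
    """Return the root element's text, or None if the input is not well-formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as err:
        # An undeclared entity reference is reported as a parse error.
        print(f"parse error: {err}")
        return None


print(parse_text(FRAGMENT))   # the undeclared entity is rejected
print(parse_text(DECLARED))   # the declared entity is expanded
```

Whether a given parser expands declared entities is a configuration and security decision, so other parsers may behave differently from this one.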

@vunhatchuong
Author

I've stopped using Neo4j for now, so I won't be able to help much, but I'll try my best.

I think the problem is that APOC can't process DTD (document type definition) files; in this case, specifically the DTD entity definitions. This dataset comes from DBLP, and it includes a dblp.dtd file.

Here's a preview of that file:

<!ENTITY Ouml    "&#214;" ><!-- capital O, dieresis or umlaut mark -->
<!ENTITY Oslash  "&#216;" ><!-- capital O, slash -->
<!ENTITY Ugrave  "&#217;" ><!-- capital U, grave accent -->
<!ENTITY Uacute  "&#218;" ><!-- capital U, acute accent -->

So because APOC doesn't load dblp.dtd, it doesn't understand &Ouml;.

@Lojjs
Contributor

Lojjs commented Mar 20, 2023

Thanks for coming back with more information. We have made a conscious decision not to support DTD files, for security reasons. I will see if we can improve our documentation around this to make it clearer that this is not supported.
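
Since DTD files are not loaded, one possible workaround is to inline the entities before calling apoc.load.xml. Below is a minimal sketch in Python, assuming the declarations in dblp.dtd all have the simple double-quoted form shown in the preview above. `parse_entities` and `inline_entities` are hypothetical helper names, not part of APOC or any DBLP tooling, and the regular expressions are not a full DTD parser.

```python
import re

# Matches simple declarations such as: <!ENTITY Ouml "&#214;" >
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+"([^"]*)"\s*>')

# Matches named references such as &Ouml; but not numeric ones like &#214;
ENTITY_REF = re.compile(r'&([A-Za-z][\w.-]*);')

# The five entities predefined by XML itself must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def parse_entities(dtd_text):
    """Collect {name: replacement text} from simple ENTITY declarations."""
    return dict(ENTITY_DECL.findall(dtd_text))


def inline_entities(xml_text, entities):
    """Replace known named entity references with their replacement text."""
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in entities:
            return match.group(0)  # leave predefined or unknown references alone
        return entities[name]
    return ENTITY_REF.sub(substitute, xml_text)


dtd_preview = '''
<!ENTITY Ouml    "&#214;" ><!-- capital O, dieresis or umlaut mark -->
<!ENTITY Oslash  "&#216;" ><!-- capital O, slash -->
'''
entities = parse_entities(dtd_preview)
print(inline_entities("<author>M. Tamer &Ouml;zsu</author>", entities))
# prints: <author>M. Tamer &#214;zsu</author>
```

For a file as large as a full DBLP dump, it would probably be better to apply `inline_entities` line by line while streaming to a new file rather than reading everything into memory.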

@gem-neo4j added the documentation label Apr 17, 2023
@hvub
Collaborator

hvub commented Dec 18, 2024

It seems DBLP has moved away from relying on DTD entity definitions and now uses numeric character references such as &#214; directly to represent Ö and similar characters.
The DBLP snippet referred to above,
https://dblp.org/rec/tr/gte/TR-0146-06-91-165.xml
loads fine in its current form. Maybe this is a sign that DTD entity definitions are going out of fashion.

Anyway, I have opened a PR to add a note to the documentation.
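
To illustrate why the numeric form is easier to load: a numeric character reference such as &#214; is resolved by the XML parser itself and needs no DTD. Below is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` rather than APOC's own parser; `author_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def author_text(fragment):
    """Parse a single-element XML fragment and return its text content."""
    return ET.fromstring(fragment).text


# No DOCTYPE or entity declaration is needed for a numeric character reference.
name = author_text("<author>M. Tamer &#214;zsu</author>")
print(name)
```

The hexadecimal form &#xD6; refers to the same code point and should behave the same way.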

@gem-neo4j
Contributor

Docs have been merged :)


4 participants