xsd 1.1 assert not validated by xerces 2.12.1-xml-schema-1.1 #395

Open
SmartLayer opened this issue Oct 22, 2020 · 12 comments

@SmartLayer

SmartLayer commented Oct 22, 2020

You will not be able to reproduce this as-is, because I commented out the offending line in tokenscript.xsd.

  1. Download xerces 2.12.1-xml-schema-1.1
$ wget -O - https://archive.apache.org/dist/xerces/j/binaries/Xerces-J-bin.2.12.1-xml-schema-1.1.tar.gz|tar -zxvf -
  2. Run the validator
$ java -classpath xerces-2_12_1-xml-schema-1.1/xercesImpl.jar:xerces-2_12_1-xml-schema-1.1/xercesSamples.jar:xerces-2_12_1-xml-schema-1.1/xml-apis.jar sax.Counter -s COFI.xml
[Error] tokenscript.xsd:55:78: s4s-elt-invalid-content.1: The content of '#AnonType_token' is invalid. Element 'assert' is invalid, misplaced, or occurs too often.
COFI.xml: 1616 ms (52 elems, 53 attrs, 0 spaces, 22526 chars)

To reproduce the problem, uncomment the two lines mentioned in #388, then edit the test XML file (COFI in this case, but any tokenscript file will do) so that it uses the edited tokenscript.xsd. You will then see this error.

Note that I am already using the version of Xerces that supports XML Schema 1.1.
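One likely reason for the s4s-elt-invalid-content error is that the schema was read as XSD 1.0, where assert is not a legal child of a complex type. Having the 1.1-capable Xerces build on the classpath is not enough by itself; the processor also has to be asked for XSD 1.1. Through JAXP, the Xerces-J XML Schema 1.1 documentation does this by requesting the schema language http://www.w3.org/XML/XMLSchema/v1.1. The minimal sketch below (the class SchemaFactories is hypothetical) tries that language first and falls back to XSD 1.0, so it also runs on a plain JDK. It makes no claim about which mode sax.Counter uses.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper class; not part of Xerces or of this project.
class SchemaFactories {
    // Schema-language URI used to request an XSD 1.1 processor through JAXP.
    static final String XSD_11 = "http://www.w3.org/XML/XMLSchema/v1.1";

    /** Returns an XSD 1.1 factory if one is on the classpath, otherwise an XSD 1.0 factory. */
    static SchemaFactory newFactory() {
        try {
            return SchemaFactory.newInstance(XSD_11);
        } catch (IllegalArgumentException noXsd11) {
            // Thrown when no implementation supports XSD 1.1 (e.g. a plain JDK without the Xerces jars).
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        }
    }

    /** True if the factory accepts XSD 1.1 schemas, so that assert elements can be accepted. */
    static boolean supportsXsd11(SchemaFactory factory) {
        return factory.isSchemaLanguageSupported(XSD_11);
    }

    public static void main(String[] args) {
        System.out.println("XSD 1.1 available: " + supportsXsd11(newFactory()));
    }
}
```

Running it with and without xercesImpl.jar from the 1.1 build on the classpath shows which mode a given setup actually gets.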

SmartLayer pushed a commit that referenced this issue Oct 22, 2020
as we couldn't get the validator to work with the new <assert> statement
documented here
#395
@darakhbharat
Contributor

darakhbharat commented Oct 26, 2020

Hi Weiwu,

I have created a Xerces-based utility that validates an XML file against an XSD file. The details are below.

Command:
$ java -classpath "xercesImpl.jar;xercesSamples.jar;xml-apis.jar;xpath2-1.2.0.jar;XMLValidator.jar" XMLValidator H:/alphawallet/TokenScript/schema/tokenscript.xsd H:/alphawallet/tokenscripts/COFI.xml

Note: On Unix, replace ; with : as the separator between the JAR files in the classpath.

All the required JAR files are attached here.
xerces-2_12_1-xml-schema-1.1.zip

Arguments:
1. The first argument is the absolute path to the XSD file,
e.g. H:/alphawallet/TokenScript/schema/tokenscript.xsd
2. The second argument is the absolute path to the XML file to be validated against the XSD given in argument 1,
e.g. H:/alphawallet/tokenscripts/COFI.xml

Now you can validate XML against the XSD 1.1 using this package.
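For readers without the attachment, here is a minimal JAXP sketch of the same two-argument flow, assuming nothing beyond the JDK. The class name SimpleXsdValidator is hypothetical and this is not the attached XMLValidator. It asks for XSD 1.1 first and falls back to XSD 1.0, so assert is only checked when the Xerces XSD 1.1 jars are on the classpath. Its output lines mimic the messages that appear later in this thread.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical stand-in for the attached XMLValidator: args are <schema.xsd> <document.xml>.
class SimpleXsdValidator {
    static final String XSD_11 = "http://www.w3.org/XML/XMLSchema/v1.1";

    static SchemaFactory factory() {
        try {
            return SchemaFactory.newInstance(XSD_11); // needs the Xerces XSD 1.1 jars
        } catch (IllegalArgumentException e) {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI); // plain JDK fallback
        }
    }

    /** Returns null if xml is valid against xsd, otherwise the first problem reported. */
    static String validate(Source xsd, Source xml) {
        try {
            Validator validator = factory().newSchema(xsd).newValidator();
            validator.validate(xml); // without an ErrorHandler, the first error is thrown
            return null;
        } catch (SAXException | IOException e) {
            return e.getMessage();
        }
    }

    /** Convenience overload for in-memory documents. */
    static String validate(String xsd, String xml) {
        return validate(new StreamSource(new StringReader(xsd)), new StreamSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: SimpleXsdValidator <schema.xsd> <document.xml>");
            System.exit(2);
        }
        File xml = new File(args[1]);
        String error = validate(new StreamSource(new File(args[0])), new StreamSource(xml));
        System.out.println(error == null ? xml.getName() + " is valid."
                                         : xml.getName() + " is not valid because\n" + error);
    }
}
```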

@darakhbharat
Contributor

darakhbharat commented Oct 26, 2020

Requirement details from the mail conversation, for tracking:

It seems that XML Schema 1.1 is supported only by Xerces 2.12
(the build with XML Schema 1.1 support) or by Saxon. At a glance, Saxon's
open-source version supports only XSLT and XQuery, since its manual
does not mention validation.

Once you have the validator, we will need a pull request that not only
restores the XML Schema 1.1 rules that I commented out (2 lines), but
also changes the schema's root element according to this article:

https://www.oxygenxml.com/doc/versions/22.1/ug-editor/topics/set-xml-schema-version.html

Otherwise, some tools will still process it with schema 1.0.

Let me know how you progressed on this! Thanks.
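Presumably, the root-element change the article refers to is the XSD 1.1 versioning attribute vc:minVersion="1.1" (namespace http://www.w3.org/2007/XMLSchema-versioning), which tools such as oXygen use to decide whether to process a schema as 1.1. A small sketch that reports what a schema's root declares; the class SchemaVersionCheck is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical check that a schema's root element opts in to XSD 1.1.
class SchemaVersionCheck {
    // Namespace of the vc: versioning attributes defined by XSD 1.1.
    static final String VC_NS = "http://www.w3.org/2007/XMLSchema-versioning";

    /** Returns the vc:minVersion declared on the xs:schema root, or null if there is none. */
    static String declaredMinVersion(String schemaXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required so that the vc: prefix resolves to VC_NS
            Element root = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)))
                    .getDocumentElement();
            return root.hasAttributeNS(VC_NS, "minVersion") ? root.getAttributeNS(VC_NS, "minVersion") : null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a parsable schema: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String root = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
                + " xmlns:vc='http://www.w3.org/2007/XMLSchema-versioning' vc:minVersion='1.1'/>";
        System.out.println("vc:minVersion = " + declaredMinVersion(root));
    }
}
```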

@darakhbharat
Contributor

#395 (comment)

What I have here is an initial version; we can eventually convert it to the approach you suggested in the requirement document.

I can improve the Java code to take the default XSD path from GitHub, so that we only need to pass the XML file name to the command. We can also offer an option to use a local XSD schema for validation.

My idea is to write a shell script to which we pass the action (validate, sign, c14n, verify) as a command-line argument; based on the action, the appropriate Java class is then invoked.

@SmartLayer
Author

SmartLayer commented Oct 26, 2020

My idea is to write a shell script to which we pass the action (validate, sign, c14n, verify)
as a command-line argument; based on the action, the appropriate Java class is then invoked.

If you do so, you will have to produce two versions (.sh and .bat), and they may behave slightly differently on macOS and Ubuntu. That does no harm if the content is extremely simple; you just need to keep it minimal and test it on all OSes. In this case, however, it is expected to be complicated: because sub-commands can be concatenated, it is not going to be simple at all, and whatever shell script you write will have to manage a lot of intermediary files. See the "Multi-command processing" example below:


Let's say that xmlsec.jar, for now, has 4 sub-commands:

$ java -jar xmlsec.jar val tokenscript.xml
$ java -jar xmlsec.jar  sign [-o tokenscript-signed.xml | -d output.dir/] tokenscript.xml
$ java -jar xmlsec.jar c14n [-o tokenscript-signed.xml | -d output.dir/] tokenscript.xml
$ java -jar xmlsec.jar verify tokenscript.xml

The first and third commands also have a long form (validate and canonic, respectively). The second and third commands produce an output. If the output is unspecified, it will simply be tokenscript-signed.xml (that is, take the input file name, remove the extension and add -signed.xml, following the convention set by Android APK files).
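The default-name rule (drop the extension, append -signed.xml) is small enough to pin down in code. A sketch with a hypothetical OutputNames class; writing the default output next to its input file is an assumption, since the text only specifies the file name.

```java
import java.nio.file.Path;

// Hypothetical helper for the default output name of sign/c14n.
class OutputNames {
    /** "tokenscript.xml" -> "tokenscript-signed.xml": drop the extension, append "-signed.xml". */
    static String signedName(String inputFileName) {
        int dot = inputFileName.lastIndexOf('.');
        String stem = dot > 0 ? inputFileName.substring(0, dot) : inputFileName; // keep dot-files whole
        return stem + "-signed.xml";
    }

    /** Assumption: the default output is written next to its input file. */
    static Path defaultOutput(Path input) {
        return input.resolveSibling(signedName(input.getFileName().toString()));
    }
}
```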

Each sub-command has its own parameters.

For example, sign has --key.

Multi-file processing

It should be possible to process multiple files in all of the commands. For example:

$ java -jar xmlsec.jar val */*.xml

This validates every XML file in every immediate subdirectory.

For the commands that have an output, either -o or -d should be used. If there are multiple input files, however, only -d is allowed. -d writes each output, under the same file name, into the specified directory.
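Taken literally (one of -o or -d is required, and -o only with a single input), these rules can be checked before any work starts. A sketch with a hypothetical OutputPlanner; reading "the same filename" as the input's own file name is an assumption.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner for the -o / -d rules of commands that produce output.
class OutputPlanner {
    /**
     * Returns one output path per input. outFile is the -o value and outDir the -d value
     * (null when absent). Exactly one must be given, and -o only for a single input file.
     */
    static List<Path> plan(List<Path> inputs, Path outFile, Path outDir) {
        if ((outFile == null) == (outDir == null)) {
            throw new IllegalArgumentException("specify exactly one of -o or -d");
        }
        if (outFile != null) {
            if (inputs.size() != 1) {
                throw new IllegalArgumentException("-o takes a single input file; use -d for several");
            }
            return List.of(outFile);
        }
        List<Path> outputs = new ArrayList<>();
        for (Path input : inputs) {
            outputs.add(outDir.resolve(input.getFileName())); // same file name, new directory
        }
        return outputs;
    }
}
```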

Multi-command processing

It should be possible to concatenate commands. The most typical use-cases are:

$ java -jar xmlsec.jar val c14n sign verify tokenscript1.xml tokenscript2.xml

This causes the tokenscript files to be validated, canonicalized, signed and verified, and outputs tokenscript1-signed.xml and tokenscript2-signed.xml. (The verify sub-command is smart enough to know that it should verify the output file, not the original input file.) If a sub-command fails, the remaining sub-commands are not executed for that file; but if an input file causes a sub-command to fail, the next file in the queue is still processed.
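These semantics amount to a per-file pipeline in which each sub-command hands its output file to the next. A sketch under assumptions: the Step interface, the way leading sub-command names are split from file names, and the outcome strings are all hypothetical, not part of xmlsec.jar or xmlsectool.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of multi-command, multi-file processing.
class CommandPipeline {
    /** One sub-command: reads a file and returns the file the next sub-command should read. */
    interface Step {
        Path run(Path input) throws Exception;
    }

    /** Splits "val c14n sign verify a.xml b.xml" into its leading sub-command names. */
    static List<String> leadingCommands(List<String> args, Set<String> known) {
        List<String> commands = new ArrayList<>();
        for (String arg : args) {
            if (!known.contains(arg)) break; // first non-command token starts the file list
            commands.add(arg);
        }
        return commands;
    }

    /**
     * Runs the steps in order on each file. A failing step skips the remaining steps for
     * that file only; the next file is still processed. Values are "ok: <final file>" or
     * "<step> failed: <reason>".
     */
    static Map<Path, String> run(List<Path> files, List<Map.Entry<String, Step>> steps) {
        Map<Path, String> outcome = new LinkedHashMap<>();
        for (Path file : files) {
            Path current = file;
            String result = null;
            for (Map.Entry<String, Step> step : steps) {
                try {
                    current = step.getValue().run(current); // e.g. sign hands its -signed file to verify
                } catch (Exception e) {
                    result = step.getKey() + " failed: " + e.getMessage();
                    break;
                }
            }
            outcome.put(file, result != null ? result : "ok: " + current);
        }
        return outcome;
    }
}
```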


If you simply don't like the java -jar syntax, then that's a different matter.

@SmartLayer
Author

This error seems to be in the schema. Can you make a PR and link back to this issue?

$ LANG=en_US java -classpath XMLValidator.jar:xpath2-1.2.0.jar:xercesImpl.jar:xercesSamples.jar:xml-apis.jar XMLValidator schema/tokenscript.xsd ../token-api-poc/tokenscripts/COFI.xml
COFI.xml is not valid because 
cvc-identity-constraint.4.3: Key 'typeRef' with value 'Transfer' not found for identity constraint of element 'token'.

@darakhbharat
Contributor

darakhbharat commented Oct 26, 2020 via email

@SmartLayer
Author

SmartLayer commented Oct 26, 2020

The schema expects the XML block below in the XML file. Do you mean that I should fix the schema and make the type attribute optional?

Then the validator is working correctly, except that the reported error isn't human-readable!

$ LANG=en_US java -classpath XMLValidator.jar:xpath2-1.2.0.jar:xercesImpl.jar:xercesSamples.jar:xml-apis.jar XMLValidator schema/tokenscript.xsd ../token-api-poc/tokenscripts/COFI.xml
COFI.xml is valid.
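On the readability point: without an ErrorHandler, a JAXP Validator throws at the first error, and the exception message itself carries no line or column. A sketch (hypothetical class ReadableErrors, JDK-only, XSD 1.0) that installs a collecting handler and prefixes every message with file:line:column. It can be exercised with any schema that has an xs:keyref, such as the typeRef constraint behind the cvc-identity-constraint.4.3 error above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: report every problem as "file:line:column: level: message".
class ReadableErrors {
    static List<String> collectErrors(String xsd, String xml, String xmlName) {
        List<String> errors = new ArrayList<>();
        ErrorHandler handler = new ErrorHandler() {
            private void add(String level, SAXParseException e) {
                errors.add(xmlName + ":" + e.getLineNumber() + ":" + e.getColumnNumber()
                        + ": " + level + ": " + e.getMessage());
            }
            @Override public void warning(SAXParseException e) { add("warning", e); }
            @Override public void error(SAXParseException e) { add("error", e); } // keep validating
            @Override public void fatalError(SAXParseException e) throws SAXException {
                add("fatal", e);
                throw e; // not well-formed: stop here
            }
        };
        try {
            Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
            validator.setErrorHandler(handler);
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            // e.g. the schema itself is broken; avoid duplicating an already-collected fatal error
            if (errors.isEmpty()) errors.add(xmlName + ": " + e.getMessage());
        }
        return errors;
    }
}
```

In a real tool the sources would be files; the string parameters only keep the sketch self-contained.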

@SmartLayer
Author

SmartLayer commented Oct 26, 2020

Keeping this open until there is a tool, so that the XSD 1.1 rules can be uncommented once the documentation on how to validate them is updated.


The approach I would take is:

  1. Fork xmlsectool 3.0.0† with git clone https://git.shibboleth.net/git/xmlsectool
  2. Add two sub-commands, val and c14n‡
  3. Change commandline syntax from --sign and --verify to just sign and verify
  4. Add multi-file processing
  5. Add multi-command processing

† 3.0.0 is an in-development version expected to come out in 2021, but 2.0.0, the current stable version, has very old libraries and has bugs with some of our processes. As a result of this approach, the code should be written with Java 11, as it is the default platform of xmlsectool. Please try to use the latest Java APIs, as backward compatibility is not desired.

‡ The current xmlsectool already supports validation, but it is not using Xerces with Schema 1.1 support (verified). Xerces seems to be the only validator that can handle files that have entity references, which we need.

It is desirable to keep the possibility of syncing up with future releases of xmlsectool, so you might choose to add rather than replace (e.g. add a sub-command for validation with Xerces instead of replacing what was there), and to use sub-classing instead of changing much of the source code.

@darakhbharat
Contributor

darakhbharat commented Oct 31, 2020

Further updates from the Telegram conversation:

Weiwu: Stay connected. You need to prioritise making the command-line tool that supports only validation (using only the schema location in the XML header; I have a reason for that) and canonicalisation, with support for multi-file processing and multi-command processing. You should not prioritise XML signing and verification, as I can get by with xmlsectool for the next few weeks.
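Validating "using the schema location in the XML header only" maps onto a JAXP feature: according to its Javadoc, the no-argument SchemaFactory.newSchema() returns a Schema that, for W3C XML Schema, validates using the location hints specified in the document (xsi:schemaLocation / xsi:noNamespaceSchemaLocation). A minimal sketch; the class HeaderSchemaValidator is hypothetical, and as written it uses the JDK's XSD 1.0 factory.

```java
import java.io.IOException;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical validator that trusts only the schema location declared in the document.
class HeaderSchemaValidator {
    /** Returns null if valid, otherwise the first error reported. */
    static String validate(Path xml) throws IOException {
        try {
            Validator validator = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI) // XSD 1.0; 1.1 needs the Xerces 1.1 jars
                    .newSchema()   // no schema argument: grammars come from xsi:*schemaLocation hints
                    .newValidator();
            // A file-based StreamSource carries a system ID, so relative hints resolve against the file.
            validator.validate(new StreamSource(xml.toFile()));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }
}
```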

Why do we need canonicalization? I just want to know a little more about how canonicalization is used in our existing work.

We actually don't need that, just entity dereferencing. So anything that can correctly read an XML file with entity references in it and serialise it into a single XML file will do the job for now.
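Since only entity dereferencing is needed for now, here is a JDK-only sketch: parse with entity-reference expansion enabled, then serialise the DOM back out through an identity Transformer. EntityFlattener is a hypothetical name, and this uses the JDK's bundled parser rather than the attached Xerces jars.

```java
import java.io.IOException;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: expand entity references and write a single self-contained XML document.
class EntityFlattener {
    static String flatten(InputSource source) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setExpandEntityReferences(true); // &name; is replaced by its content (the JAXP default)
            Document doc = dbf.newDocumentBuilder().parse(source);
            if (doc.getDoctype() != null) {
                doc.removeChild(doc.getDoctype()); // entity declarations are no longer needed
            }
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer() // identity transform: serialise as-is
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("cannot flatten: " + e.getMessage(), e);
        }
    }
}
```

To flatten a file, pass new InputSource(path.toUri().toString()) so that relative external-entity paths resolve against the file's own location. Whether external entities are actually fetched also depends on the JDK's XML security settings.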

@darakhbharat
Contributor

xmlsectool vs. a core Java, Xerces-based validator:

It looks like the main focus of xmlsectool is signing XML documents. I also do not find the xmlsectool documentation clear; there is very little information available. If you have found more detailed official documentation than the pages below, please direct me there.

https://wiki.shibboleth.net/confluence/display/XSTJ2/xmlsectool+V2+Home

https://wiki.shibboleth.net/confluence/display/CONCEPT/MetadataCorrectness#MetadataCorrectness-SchemaValidation.5

I do not see any special advantage in using xmlsectool for schema validation and entity de-referencing, so I am in favour of writing our own simple tool using the Xerces JAR.

@darakhbharat
Contributor

Hi Weiwu,

I have completed the multi-file validation, and the Java code is attached.
Can you create a separate repository where I can commit the code? I have created my own private repository, but I cannot add collaborators as I do not have an enterprise Git subscription.

XMLValidator.zip

I will start on entity de-referencing. Where should we store the de-referenced file? Or should we overwrite the original XML file? For now, I can create a new XML file to save the result of the de-referencing.

@darakhbharat
Contributor

Hi Weiwu,

I am committing my changes in my forked repository: https://github.com/darakhbharat/TokenScript.git
I created a new directory named xml-validation-against-xsd-1.1 for these changes.

Overview:

  • Completed the basic working features for XML validation against XSD 1.1.
  • Completed a basic working entity de-referencing feature.
  • The current implementation supports multi-command and multi-file validation.

Here is the command:

$ java -classpath "xercesImpl.jar;xercesSamples.jar;xml-apis.jar;xpath2-1.2.0.jar;XMLValidator.jar" XMLValidator -val -deref H:/alphawallet/TokenScript/schema/tokenscript.xsd H:/alphawallet/tokenscripts/COFI.xml

  • The command requires at least three arguments:
    1. the action: -val, -deref
    2. the XSD file location
    3. the XML file location or a directory. If a directory is provided, all the XML files in it are validated.
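The argument layout above can be parsed as: leading "-" flags are actions, then the XSD, then XML files or directories. A sketch with a hypothetical ValidatorArgs class; expanding a directory non-recursively to its *.xml files, and accepting more than one XML argument, are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical parser for: <actions...> <schema.xsd> <xml file or directory>...
class ValidatorArgs {
    final List<String> actions = new ArrayList<>(); // e.g. -val, -deref
    Path xsd;
    final List<Path> xmlFiles = new ArrayList<>();

    static ValidatorArgs parse(String... args) {
        ValidatorArgs parsed = new ValidatorArgs();
        int i = 0;
        while (i < args.length && args[i].startsWith("-")) {
            parsed.actions.add(args[i++]);
        }
        if (parsed.actions.isEmpty() || args.length - i < 2) {
            throw new IllegalArgumentException("usage: XMLValidator -val|-deref ... <schema.xsd> <xml file or dir>...");
        }
        parsed.xsd = Paths.get(args[i++]);
        for (; i < args.length; i++) {
            parsed.xmlFiles.addAll(expand(Paths.get(args[i])));
        }
        return parsed;
    }

    /** A directory expands to the *.xml files directly inside it; anything else is taken as a file. */
    static List<Path> expand(Path p) {
        if (!Files.isDirectory(p)) return List.of(p);
        try (Stream<Path> entries = Files.list(p)) {
            return entries.filter(f -> Files.isRegularFile(f) && f.toString().endsWith(".xml"))
                          .sorted()
                          .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```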

Things needed to be improved:

  • Later, I will also make the XSD file location optional; a default will be used if none is provided.
  • Well-formatted output.
  • Remove the xml:base attributes from the de-referenced result.
  • Where should the de-referenced result be stored? Right now I am creating a separate file by appending -deref to the original XML file name.
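For the xml:base clean-up item, a sketch of a post-processing step on the de-referenced DOM (hypothetical class XmlBaseStripper). It assumes the DOM was parsed namespace-aware, since only then does xml:base carry the XML namespace URI that removeAttributeNS matches.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical post-processing step for the de-referenced DOM.
class XmlBaseStripper {
    /** Removes every xml:base attribute and returns how many were removed. */
    static int strip(Document doc) {
        NodeList elements = doc.getElementsByTagName("*"); // all elements, in document order
        int removed = 0;
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            if (element.hasAttributeNS(XMLConstants.XML_NS_URI, "base")) {
                element.removeAttributeNS(XMLConstants.XML_NS_URI, "base");
                removed++;
            }
        }
        return removed;
    }

    /** Namespace-aware parsing is required: only then is xml:base bound to the XML namespace URI. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```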

SmartLayer pushed a commit that referenced this issue Jun 30, 2023
as we couldn't get the validator to work with the new <assert> statement
documented here
#395