VocabString cannot parse third-party CVs #145

Closed
bworrell opened this issue Jun 13, 2014 · 4 comments · Fixed by #168

@bworrell
Contributor

Parsing an instance document that uses a third-party CV (Controlled Vocabulary) raises an exception. Here is some test output:


>>> s  = STIXPackage.from_xml("TESTVOCAB.xml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/c/dev/python-stix/stix/core/stix_package.py", line 429, in from_xml
    return parser.parse_xml(xml_file)
  File "/c/dev/python-stix/stix/utils/parser.py", line 109, in parse_xml
    stix_package = STIXPackage().from_obj(stix_package_obj)
  File "/c/dev/python-stix/stix/core/stix_package.py", line 379, in from_obj
    return_obj.stix_header = STIXHeader.from_obj(obj.get_STIX_Header())
  File "/c/dev/python-stix/stix/core/stix_header.py", line 112, in from_obj
    return_obj.package_intents = [VocabString.from_obj(x) for x in obj.get_Package_Intent()]
  File "/c/dev/python-stix/stix/common/vocabs.py", line 110, in from_obj
    klass = VocabString.lookup_class(vocab_obj.xsi_type)
  File "/c/dev/python-stix/stix/common/vocabs.py", line 70, in lookup_class
    raise ValueError("Unregistered xsi:type %s" % xsi_type)
ValueError: Unregistered xsi:type testVocabs:TestVocab-1.0

This issue was raised during the STIX Community call on 2014/06/12.

Related to #97

@bworrell bworrell self-assigned this Jun 13, 2014
@bworrell
Contributor Author

My test schema, which defines a proprietary Controlled Vocabulary:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:testVocabs="http://testvocabs.com/vocabs-1" xmlns:stixCommon="http://stix.mitre.org/common-1" targetNamespace="http://testvocabs.com/vocabs-1" elementFormDefault="qualified" version="1.1.1" xml:lang="English">
    <xs:import namespace="http://stix.mitre.org/common-1" schemaLocation="stix_common.xsd"/>
    <xs:complexType name="TestVocab-1.0">
        <xs:simpleContent>
            <xs:restriction base="stixCommon:ControlledVocabularyStringType">
                <xs:simpleType>
                    <xs:union memberTypes="testVocabs:TestEnum-1.0"/>
                </xs:simpleType>
                <xs:attribute name="vocab_name" type="xs:string" use="optional" fixed="Test Vocab"/>
                <xs:attribute name="vocab_reference" type="xs:anyURI" use="optional" fixed="http://example.com/TestVocab"/>
            </xs:restriction>
        </xs:simpleContent>
    </xs:complexType>
    <xs:simpleType name="TestEnum-1.0">
        <xs:restriction base="xs:string">
            <xs:enumeration value="TEST">
                <xs:annotation>
                    <xs:documentation>Testing</xs:documentation>
                </xs:annotation>
            </xs:enumeration>
        </xs:restriction>
    </xs:simpleType>
</xs:schema>

My test instance document ("TESTVOCAB.xml"). I override the default Package_Intent CV to use my testVocabs:TestVocab-1.0 CV:

<stix:STIX_Package
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:stix="http://stix.mitre.org/stix-1"
    xmlns:testVocabs="http://testvocabs.com/vocabs-1"
    xmlns:example="http://example.com/"
    xsi:schemaLocation="
    http://stix.mitre.org/stix-1 ../stix_core.xsd
    http://testvocabs.com/vocabs-1 ../my_vocabs.xsd"
    id="example:STIXPackage-33fe3b22-0201-47cf-85d0-97c02164528d"
    timestamp="2014-05-08T09:00:00.000000Z"
    version="1.1.1">
    <stix:STIX_Header>
        <stix:Package_Intent xsi:type="testVocabs:TestVocab-1.0">TEST</stix:Package_Intent>
    </stix:STIX_Header>
</stix:STIX_Package>

To parse this, developers must create their own VocabString derivation in Python and register it as a known Controlled Vocabulary using stix.common.vocabs.add_vocab():

from stix.core import STIXPackage
from stix.common.vocabs import VocabString, add_vocab

class TestVocab(VocabString):
    _namespace = 'http://testvocabs.com/vocabs-1'
    _XSI_TYPE = 'testVocabs:TestVocab-1.0'

add_vocab(TestVocab)

s = STIXPackage.from_xml("TESTVOCAB.xml")
print(s.stix_header.package_intents[0].xsi_type)  # prints testVocabs:TestVocab-1.0

This parses the document just fine, and prints the xsi_type class property correctly. However, executing s.to_xml() raises a KeyError because the http://testvocabs.com/vocabs-1 namespace isn't registered as a known namespace to export.

I am going to address the serialization bug and change the behavior of VocabString.lookup_class to return the generic VocabString class, instead of raising a ValueError, if the xsi:type lookup fails.
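
A minimal sketch of the planned fallback, assuming a simple dictionary registry keyed by xsi:type. The `VocabString` class, the `_VOCAB_MAP` dictionary, and the module-level `lookup_class` below are simplified stand-ins written for illustration, not the actual python-stix code:

```python
class VocabString(object):
    """Simplified stand-in for stix.common.vocabs.VocabString."""
    _XSI_TYPE = None

    def __init__(self, value=None):
        self.value = value
        self.xsi_type = self._XSI_TYPE


# Hypothetical registry mapping xsi:type strings to VocabString subclasses.
_VOCAB_MAP = {}


def add_vocab(cls):
    """Register a VocabString subclass under its _XSI_TYPE."""
    _VOCAB_MAP[cls._XSI_TYPE] = cls


def lookup_class(xsi_type):
    """Return the registered class, or the generic VocabString if unknown."""
    # Previously an unknown xsi:type raised ValueError; here it falls back.
    return _VOCAB_MAP.get(xsi_type, VocabString)


class TestVocab(VocabString):
    _namespace = "http://testvocabs.com/vocabs-1"
    _XSI_TYPE = "testVocabs:TestVocab-1.0"


add_vocab(TestVocab)
```

With this shape, a registered xsi:type resolves to its own class, while an unregistered one resolves to the generic class, whose `xsi_type` is `None`. That is where the xsi:type information would get lost unless it is copied over from the parsed object separately.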

@bworrell
Contributor Author

The following commits have addressed a couple of issues:

  • 5a79974 : Parsing no longer throws an exception when an unknown CV is encountered. A generic VocabString class is returned from lookup_class(). With this setup, xsi:type information is stripped, though maybe we don't want this behavior?
  • 564e8f3 : A lot of work done to nsparser. This commit allows users to define a VocabString derivation under a non-STIX namespace and export it without the namespace parser throwing up the KeyError exception mentioned above.

To be done:

  • Persist schemalocations on input so non-STIX-namespaced CVs can validate properly on round-trip
  • Consider requiring CVs to define an _XSI_TYPE if the _namespace attribute is not the default STIX vocab namespace.
  • ?? Probably something else I am forgetting :)
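
The second to-do item could be enforced at registration time. The sketch below is only one possible shape: `check_vocab` is a hypothetical helper, and the value of `DEFAULT_VOCAB_NS` is an assumption about the default STIX vocabulary namespace rather than something stated in this thread.

```python
# Assumed value of the default STIX vocabulary namespace.
DEFAULT_VOCAB_NS = "http://stix.mitre.org/default_vocabularies-1"


def check_vocab(cls):
    """Hypothetical check: third-party CVs must declare an _XSI_TYPE."""
    namespace = getattr(cls, "_namespace", DEFAULT_VOCAB_NS)
    xsi_type = getattr(cls, "_XSI_TYPE", None)
    if namespace != DEFAULT_VOCAB_NS and not xsi_type:
        raise ValueError(
            "%s uses namespace %s but does not define _XSI_TYPE"
            % (cls.__name__, namespace)
        )
    return cls


class GoodVocab(object):
    _namespace = "http://testvocabs.com/vocabs-1"
    _XSI_TYPE = "testVocabs:TestVocab-1.0"


class BadVocab(object):
    _namespace = "http://testvocabs.com/vocabs-1"
    # No _XSI_TYPE defined, so check_vocab() rejects this class.


check_vocab(GoodVocab)  # passes silently
```

Failing early at registration would surface a misconfigured CV before any document is parsed or exported.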

@bworrell
Contributor Author

  • 5604e39 : Non-default schemaLocations are included during export call.

Still need to figure out:

  • Allow users to declare schemaLocations either as a class property, like _XSI_TYPE or _namespace, or as a parameter to to_xml()

@bworrell
Contributor Author

  • 3da966c : to_xml() now accepts a schemaloc_dict parameter which allows for third-party schemalocations to be passed in. It's possible to overwrite existing STIX default schemalocations with this parameter, which may or may not be a desired behavior.
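
To illustrate the override behavior described above, here is a rough sketch of how a user-supplied mapping might be merged with defaults to build an xsi:schemaLocation value. Only the `schemaloc_dict` name comes from this thread. The `build_schemalocation` helper and the `defaults` mapping are hypothetical and do not reflect how to_xml() is actually implemented:

```python
def build_schemalocation(default_schemalocs, schemaloc_dict=None):
    """Merge namespace -> schemaLocation mappings into one attribute value."""
    merged = dict(default_schemalocs)
    if schemaloc_dict:
        # User-supplied entries win, so a default can be overwritten.
        merged.update(schemaloc_dict)
    # Sort by namespace so the output is deterministic.
    return " ".join("%s %s" % (ns, merged[ns]) for ns in sorted(merged))


# Illustrative defaults, using the paths from the test instance document.
defaults = {"http://stix.mitre.org/stix-1": "../stix_core.xsd"}

third_party = {"http://testvocabs.com/vocabs-1": "../my_vocabs.xsd"}
schemalocation = build_schemalocation(defaults, third_party)
```

Because the update is unconditional, passing a STIX namespace in `schemaloc_dict` replaces its default location. Rejecting or warning on such keys would be the alternative if overwriting turns out to be undesirable.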

Closing this issue now.
