
Plan of what to do

It would be easiest to start by writing a simple TagSoup script that generates a Haskell data structure representing the data type, and to provide a parser and pretty-printer for this structure.
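Here is a rough sketch of the generation step. It assumes the TagSoup pass has already reduced a record-like xsd:complexType to a name plus a list of (field, type) pairs. From there, emitting the Haskell declaration is plain string building. The Record type, genRecord and haskellType are hypothetical names, and only two built-in XSD types are mapped:

```haskell
import Data.Char (toUpper)
import Data.List (intercalate)

-- Hypothetical intermediate form of a record-like (mixed="false") complexType:
-- the record name plus (field name, Haskell type) pairs. In the real tool these
-- would be extracted from the schema by the TagSoup script.
data Record = Record
  { recordName :: String
  , recordFields :: [(String, String)]
  }

-- Emit a one-line Haskell data declaration for the record.
genRecord :: Record -> String
genRecord (Record n fs) =
  "data " ++ n ++ " = " ++ n ++ " { " ++ intercalate ", " (map field fs) ++ " }"
  where
    field (f, t) = f ++ " :: " ++ t

-- Translate a type="..." value to a Haskell type name (an assumed, tiny subset).
haskellType :: String -> String
haskellType "xsd:string" = "String"
haskellType "xsd:positiveInteger" = "Integer"
haskellType other = capitalize other -- assume it names another generated type
  where
    capitalize (c : cs) = toUpper c : cs
    capitalize [] = []
```

For example, genRecord (Record "Birthplace" [("city", "String"), ("country", "String")]) yields the line data Birthplace = Birthplace { city :: String, country :: String }.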

For further reference

It would be best to use these as reference:

Basic concepts

Type names

Type names in XML Schema refer to either an xsd:complexType or an xsd:simpleType. These are either anonymous, if they occur only once, or named, in which case they can be reused anywhere in the same schema by referring to their name. Thus an element with no inline xsd:complexType or xsd:simpleType is usually a reference to a named type, given in its type attribute.

The entire XML Schema will be represented as a mapping from type names to individual types:

import Data.Map (Map)

type XMLSchema = Map String XMLSchemaType
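One possible shape for XMLSchemaType is sketched below. It assumes that only simple types and plain child lists matter for now; minOccurs/maxOccurs and attributes are left out. A child's type is either a name to look up in the map, which covers the type="personType" style of reference, or an inline anonymous type. All constructor names are hypothetical:

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Hypothetical model; constructor and field names are assumptions for illustration.
data SimpleType = TString | TInteger
  deriving (Eq, Show)

-- A child element's type: by name (type="...") or inline (anonymous).
data TypeRef = Named String | Inline XMLSchemaType
  deriving (Eq, Show)

data XMLSchemaType
  = Simple SimpleType
  | Complex { mixed :: Bool, children :: [(String, TypeRef)] }
  deriving (Eq, Show)

type XMLSchema = Map String XMLSchemaType

-- Named types are looked up in the schema; anonymous ones are used directly.
resolve :: XMLSchema -> TypeRef -> Maybe XMLSchemaType
resolve schema (Named n) = Map.lookup n schema
resolve _ (Inline t) = Just t

-- The personType schema, with xsd:string pre-registered as a simple type.
personSchema :: XMLSchema
personSchema = Map.fromList
  [ ("xsd:string", Simple TString)
  , ( "personType"
    , Complex False
        [ ("name", Named "xsd:string")
        , ("address", Named "xsd:string")
        , ("friend", Named "personType")
        ]
    )
  ]
```

With this, resolve personSchema (Named "personType") returns the personType definition. Its friend child points back to personType itself, so recursive types need nothing beyond a map lookup.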

Larger schemas usually assign types by reference:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" > 

  <xsd:element name="person" type="personType" /> 
  <xsd:complexType mixed="false" name="personType"> 
    <xsd:sequence> 
      <xsd:element name="name" type="xsd:string" minOccurs="1" maxOccurs="1"/>
      <xsd:element name="address" type="xsd:string" minOccurs="1" maxOccurs="2"/>
      <xsd:element name="friend" type="personType" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence> 
  </xsd:complexType> 
</xsd:schema>

Example document:

<person>
  <name>Michal</name>
  <address>Singapore</address>
  <friend>
    <name>Kevin</name>
    <address>Canada</address>
  </friend>
  <friend>
    <name>Vaibhav</name>
    <address>Singapore</address>
  </friend>
</person>

Here both <person> and <friend> have the same type, personType, which is defined only once.
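Because person and friend share personType, a single Haskell record and a single printer cover both. The sketch below maps occurrence constraints as follows (an assumption, not part of the plan): exactly one occurrence becomes a plain field, and any other range becomes a list, so maxOccurs="2" is not enforced by the type. renderPerson is a hypothetical helper that prints compact XML without whitespace or text escaping:

```haskell
-- Hand-written translation of personType. Assumed occurrence mapping:
--   minOccurs=1 maxOccurs=1         -> plain field
--   minOccurs=1 maxOccurs=2         -> list (upper bound not enforced)
--   minOccurs=0 maxOccurs=unbounded -> list
data PersonType = PersonType
  { name :: String
  , address :: [String]
  , friend :: [PersonType]
  } deriving (Eq, Show)

-- One printer serves both <person> and <friend>, since they share the type.
-- Text content is not escaped in this sketch.
renderPerson :: String -> PersonType -> String
renderPerson tag p =
  open tag
    ++ leaf "name" (name p)
    ++ concatMap (leaf "address") (address p)
    ++ concatMap (renderPerson "friend") (friend p)
    ++ close tag
  where
    open t = "<" ++ t ++ ">"
    close t = "</" ++ t ++ ">"
    leaf t s = open t ++ s ++ close t

-- The example document as a Haskell value.
michal :: PersonType
michal =
  PersonType "Michal" ["Singapore"]
    [ PersonType "Kevin" ["Canada"] []
    , PersonType "Vaibhav" ["Singapore"] []
    ]
```

renderPerson "person" michal reproduces the example document above, minus the indentation.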

Mixed types and normal types

The first confusing thing in XML Schema is the distinction between:

  • mixed="true", which allows a free-form mixture of text and elements,
  • mixed="false", which works like a record.

Example translation of a record-like (mixed="false") type:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="person">
    <xsd:complexType mixed="false">
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string" />
        <xsd:element name="age"  type="xsd:positiveInteger" />
        <xsd:element name="birthplace">
          <xsd:complexType mixed="false">
            <xsd:sequence>
              <xsd:element name="city"  type="xsd:string" />
              <xsd:element name="country"  type="xsd:string" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>

Should be translated to:

data Birthplace = Birthplace { city :: String, country :: String }

-- xsd:positiveInteger is unbounded, so Integer rather than Int
data Person = Person { name :: String, age :: Integer, birthplace :: Birthplace }
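The parser direction can be sketched the same way. The Node type below is a hypothetical, already-parsed XML tree, standing in for whatever the TagSoup pass would produce. decodePerson expects the children in exactly the order given by xsd:sequence and checks that age is a positive integer. Integer is used for age because xsd:positiveInteger is unbounded:

```haskell
-- Hypothetical minimal XML tree (stand-in for the TagSoup output).
data Node = Element String [Node] | TextNode String
  deriving (Eq, Show)

data Birthplace = Birthplace { city :: String, country :: String }
  deriving (Eq, Show)

data Person = Person { name :: String, age :: Integer, birthplace :: Birthplace }
  deriving (Eq, Show)

-- Text content of a leaf element with the expected tag.
textOf :: String -> Node -> Maybe String
textOf tag (Element t [TextNode s]) | t == tag = Just s
textOf _ _ = Nothing

-- xsd:positiveInteger: the whole string must parse and be > 0.
positive :: String -> Maybe Integer
positive s = case reads s of
  [(n, "")] | n > 0 -> Just n
  _ -> Nothing

-- xsd:sequence: children must appear in the declared order.
decodeBirthplace :: Node -> Maybe Birthplace
decodeBirthplace (Element "birthplace" [c, k]) =
  Birthplace <$> textOf "city" c <*> textOf "country" k
decodeBirthplace _ = Nothing

decodePerson :: Node -> Maybe Person
decodePerson (Element "person" [n, a, b]) =
  Person
    <$> textOf "name" n
    <*> (textOf "age" a >>= positive)
    <*> decodeBirthplace b
decodePerson _ = Nothing
```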

But with mixed content:

<xsd:element name="p" type="pType" />
<xsd:complexType name="pType" mixed="true">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element name="em"     type="pType" />
    <xsd:element name="strong" type="pType" />
  </xsd:choice>
</xsd:complexType>

With an example document like:

<p>Alphabetic <em>or</em> possibly <strong>phonetic</strong> representation.</p>

Should be translated into:

data P = P [PElt]

data PElt = Em     EmT
          | Strong StrongT
          | Text   String

-- In this case em and strong have the same content as p:
type EmT     = [PElt]
type StrongT = [PElt]

document :: P
document = P [Text "Alphabetic ", Em [Text "or"], Text " possibly ", Strong [Text "phonetic"], Text " representation."]
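For mixed content, pretty-printing is a straightforward fold over the list, since text nodes and elements simply alternate in document order. The sketch below expands EmT and StrongT to [PElt]. renderP, renderElt and escape are hypothetical helper names:

```haskell
data P = P [PElt]

-- EmT and StrongT expanded to [PElt]
data PElt = Em [PElt] | Strong [PElt] | Text String

-- Escape the characters that are special in XML text content.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c = [c]

-- Text nodes and elements are printed in list order.
renderElt :: PElt -> String
renderElt (Text s) = escape s
renderElt (Em xs) = "<em>" ++ concatMap renderElt xs ++ "</em>"
renderElt (Strong xs) = "<strong>" ++ concatMap renderElt xs ++ "</strong>"

renderP :: P -> String
renderP (P xs) = "<p>" ++ concatMap renderElt xs ++ "</p>"

document :: P
document = P [Text "Alphabetic ", Em [Text "or"], Text " possibly ", Strong [Text "phonetic"], Text " representation."]
```

renderP document returns exactly <p>Alphabetic <em>or</em> possibly <strong>phonetic</strong> representation.</p>.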

Remember: both xsd:complexType and xsd:simpleType may be named for later reference within the same schema.

Elements versus attributes

Attributes are always attached to their elements, so each attribute should be treated as a flat field in the complexType record. Unlike an element, an attribute can only have an xsd:simpleType.

data SimpleType = TString | TInteger | ...
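The sketch below shows how attributes could sit next to element fields, using a small hypothetical subset of built-in simple types. attributeType maps the value of an attribute's type="..." to a SimpleType. In this sketch any other name, including a user-defined xsd:simpleType, returns Nothing. The Person record with id and lang attributes is a made-up example:

```haskell
-- Hypothetical subset of built-in simple types; the plan's list is open-ended.
data SimpleType = TString | TInteger | TBoolean
  deriving (Eq, Show)

-- Attributes may only have simple types; unknown names give Nothing here.
attributeType :: String -> Maybe SimpleType
attributeType "xsd:string" = Just TString
attributeType "xsd:integer" = Just TInteger
attributeType "xsd:boolean" = Just TBoolean
attributeType _ = Nothing

-- Hypothetical element with two attributes and one child element:
-- the attributes simply become extra flat fields of the record.
data Person = Person
  { personId :: Integer        -- <xsd:attribute name="id" type="xsd:integer"/>
  , personLang :: Maybe String -- optional attribute, hence Maybe
  , personName :: String       -- <xsd:element name="name" type="xsd:string"/>
  } deriving (Eq, Show)
```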

Dictionaries

Many kinds of components can have a unique name that can later be referenced for sharing. Ideally, these should be expressed either as the same type for each occurrence, or as some kind of type class (for attribute groups). This applies to:

  • xsd:element,
  • xsd:simpleType,
  • xsd:complexType,
  • xsd:attribute,
  • xsd:attributeGroup.
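For attribute groups, one option is a shared record for the group plus a type class implemented by every type that references it. Code written once against the group then works for all of them. The commonAttrs group, its id and lang attributes, and the Para and Note types are all hypothetical:

```haskell
-- Hypothetical <xsd:attributeGroup name="commonAttrs"> with optional
-- id and lang attributes, translated to one shared record.
data CommonAttrs = CommonAttrs
  { attrId :: Maybe String
  , attrLang :: Maybe String
  } deriving (Eq, Show)

-- Every type that references the group gets an instance.
class HasCommonAttrs a where
  commonAttrs :: a -> CommonAttrs

-- Two hypothetical complexTypes that both use the attribute group.
data Para = Para { paraAttrs :: CommonAttrs, paraText :: String }
data Note = Note { noteAttrs :: CommonAttrs, noteItems :: [String] }

instance HasCommonAttrs Para where
  commonAttrs = paraAttrs

instance HasCommonAttrs Note where
  commonAttrs = noteAttrs

-- Written once against the group, usable with any type that references it.
langOf :: HasCommonAttrs a => a -> Maybe String
langOf = attrLang . commonAttrs
```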

Namespaces

  • xsd:schema targetNamespace="..." gives the target namespace for the components the schema defines.
  • this namespace is normally bound to a prefix with xmlns:mynamespace="...", which is unimportant for us.
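Since prefixes such as xsd: are unimportant for us, a translator can mostly work with local names. Below is a minimal sketch of splitting a qualified name like xsd:string. splitQName is a hypothetical helper, and it does not resolve the prefix against any xmlns declaration:

```haskell
-- Split a qualified name into (prefix, local name) at the first colon.
-- Unprefixed names get Nothing as the prefix.
splitQName :: String -> (Maybe String, String)
splitQName qn = case break (== ':') qn of
  (local, "") -> (Nothing, local)
  (prefix, _ : local) -> (Just prefix, local)
```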

A lot of features

There are many more features in XML Schema that are of secondary importance or can be ignored. For example, xsd:restriction of an existing type restricts the range of allowed values; this can be handled simply by an assertion when printing the result.
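Here is a sketch of the assertion approach, using a hypothetical restriction of xsd:integer with the minInclusive and maxInclusive facets. The Haskell type stays a plain Integer, and the range is checked only when printing. Returning Either instead of calling error is a design choice here, so the caller can report the violation:

```haskell
-- Hypothetical xsd:restriction of xsd:integer with two range facets.
data IntRange = IntRange { minInclusive :: Integer, maxInclusive :: Integer }

-- Check the facets at print time instead of encoding them in the type.
printRestricted :: IntRange -> Integer -> Either String String
printRestricted r n
  | n < minInclusive r = Left ("value " ++ show n ++ " below minInclusive " ++ show (minInclusive r))
  | n > maxInclusive r = Left ("value " ++ show n ++ " above maxInclusive " ++ show (maxInclusive r))
  | otherwise = Right (show n)
```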

Glossary

  • xsd:sequence - record with fields named after the child elements, in a fixed order
  • xsd:all - record (the same as xsd:sequence), but the order of fields may vary (important for decoding only)
  • xsd:simpleType - flat type (Int, String, etc.)
  • xsd:complexType - record type
  • mixed content - a list of either:
      • any element type, or
      • a text node
  • xsd:restriction - restricts a type (can be implemented as an assertion on output)
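The decoding difference between xsd:sequence and xsd:all can be sketched over a flattened list of (element name, text) children, which is a hypothetical representation. decodeSequence requires the declared order. decodeAll accepts any order but still requires each declared child exactly once, assuming all children are required and the declared names are distinct. Both return the values in declared order:

```haskell
-- Hypothetical flattened children: (element name, text content).
type Children = [(String, String)]

-- xsd:sequence: children must appear exactly in the declared order.
decodeSequence :: [String] -> Children -> Maybe [String]
decodeSequence expected kids
  | map fst kids == expected = Just (map snd kids)
  | otherwise = Nothing

-- xsd:all: any order, but each declared child exactly once
-- (equal counts plus every declared name present).
decodeAll :: [String] -> Children -> Maybe [String]
decodeAll expected kids
  | length kids == length expected = traverse (`lookup` kids) expected
  | otherwise = Nothing
```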