It would be easiest to start by writing a simple TagSoup script that generates a Haskell data structure representing the data type, and to provide a parser and pretty-printer for this structure.
It would be best to use these as references:
- Great introduction to XML Schema.
- Zvon reference of XML Schema
- Examples of multi-stage programming:
- ["A Gentle Introduction to Multi-Stage Programming"](https://www.cs.tufts.edu/~nr/cs257/archive/walid-taha/dspg04a.pdf) by Walid Taha
- code generation for types is used in json-autotype:Data/Aeson/AutoType/CodeGen/*.hs.
- Example schemas with nice explanations:
- Online XML Schema validators:
Type names in XML Schema refer to either xsd:complexType or xsd:simpleType.
These are either anonymous, if they occur only once, or named, in which case they can be re-used by giving the name anywhere in the same schema. Thus an empty xsd:complexType or xsd:simpleType is usually a reference.
The entire XML Schema will be a mapping from type names to individual types:
type XMLSchema = Map String XMLSchemaType
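A minimal sketch of what such a mapping could look like. The constructors of `XMLSchemaType` (`Simple`, `Complex`, `Ref`), the `resolve` helper, and the `personSchema` value are hypothetical names chosen for illustration; they are assumptions of this sketch, not an existing API.

```haskell
import qualified Data.Map as Map
import           Data.Map (Map)

-- Hypothetical representation of a single schema type (an assumption of this sketch).
data XMLSchemaType
  = Simple String                      -- built-in flat type, e.g. "xsd:string"
  | Complex [(String, XMLSchemaType)]  -- record: element name paired with its type
  | Ref String                         -- reference to a named type in the same schema
  deriving (Eq, Show)

type XMLSchema = Map String XMLSchemaType

-- Follow a reference one step; any other type is returned unchanged.
resolve :: XMLSchema -> XMLSchemaType -> Maybe XMLSchemaType
resolve schema (Ref name) = Map.lookup name schema
resolve _      ty         = Just ty

-- A small hand-written schema with one named, recursive type.
personSchema :: XMLSchema
personSchema = Map.fromList
  [ ( "personType"
    , Complex [ ("name",    Simple "xsd:string")
              , ("address", Simple "xsd:string")
              , ("friend",  Ref "personType") ] ) ]
```

Keeping references as a separate constructor means recursive types stay finite in the map and are only followed on demand.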
Larger schemas usually assign types by reference:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
<xsd:element name="person" type="personType" />
<xsd:complexType mixed="false" name="personType">
<xsd:sequence>
<xsd:element name="name" type="xsd:string" minOccurs="1" maxOccurs="1"/>
<xsd:element name="address" type="xsd:string" minOccurs="1" maxOccurs="2"/>
<xsd:element name="friend" type="personType" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Example document:
<person>
<name>Michal</name>
<address>Singapore</address>
<friend>
<name>Kevin</name>
<address>Canada</address>
</friend>
<friend>
<name>Vaibhav</name>
<address>Singapore</address>
</friend>
</person>
Here both person and friend have the same type, which is defined only once.
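One possible Haskell rendering of personType is sketched below. It assumes that an element with maxOccurs above 1 becomes a list; the names `PersonType`, `validAddressCount` and `michal` are hypothetical, and the 1..2 bound on address has to be checked separately because the list type cannot express it.

```haskell
-- One Haskell type per named XML Schema type; it is shared by every element
-- that refers to personType.
data PersonType = PersonType
  { name    :: String        -- minOccurs=1, maxOccurs=1
  , address :: [String]      -- minOccurs=1, maxOccurs=2 (bounds not enforced by the type)
  , friend  :: [PersonType]  -- minOccurs=0, maxOccurs=unbounded
  } deriving (Eq, Show)

-- Check the occurrence bounds that the list type alone cannot express.
validAddressCount :: PersonType -> Bool
validAddressCount p = n >= 1 && n <= 2
  where n = length (address p)

-- The example document written as a value.
michal :: PersonType
michal = PersonType "Michal" ["Singapore"]
  [ PersonType "Kevin"   ["Canada"]    []
  , PersonType "Vaibhav" ["Singapore"] []
  ]
```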
The first confusing thing in XML Schema is the distinction between:
- mixed="true", which is a free-form mixture of text and elements,
- mixed="false", which works like a record.
Example translation of a schema with record-like content:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="person">
<xsd:complexType mixed="false">
<xsd:sequence>
<xsd:element name="name" type="xsd:string" />
<xsd:element name="age" type="xsd:positiveInteger" />
<xsd:element name="birthplace">
<xsd:complexType mixed="false">
<xsd:sequence>
<xsd:element name="city" type="xsd:string" />
<xsd:element name="country" type="xsd:string" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
Should be translated to:
data Birthplace = Birthplace { city :: String, country :: String }
data Person = Person { name :: String, age :: Int, birthplace :: Birthplace }
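A pretty-printer for these records could be sketched as below. The helper names `escape`, `element`, `renderBirthplace` and `renderPerson` are hypothetical, and the output is a single line without an XML declaration; this is an illustration, not a complete serializer.

```haskell
data Birthplace = Birthplace { city :: String, country :: String }
data Person = Person { name :: String, age :: Int, birthplace :: Birthplace }

-- Escape the characters that are special in XML text content.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

-- Wrap already-rendered content in an element with the given tag name.
element :: String -> String -> String
element tag body = "<" ++ tag ++ ">" ++ body ++ "</" ++ tag ++ ">"

renderBirthplace :: Birthplace -> String
renderBirthplace b = element "birthplace" $
     element "city"    (escape (city b))
  ++ element "country" (escape (country b))

-- Fields are emitted in the order given by xsd:sequence.
renderPerson :: Person -> String
renderPerson p = element "person" $
     element "name" (escape (name p))
  ++ element "age"  (show (age p))
  ++ renderBirthplace (birthplace p)
```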
But with mixed content:
<xsd:element name="p" type="pType" />
<xsd:complexType name="pType" mixed="true">
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element name="em" type="pType" />
<xsd:element name="strong" type="pType" />
</xsd:choice>
</xsd:complexType>
With an example document like:
<p>Alphabetic <em>or</em> possibly <strong>phonetic</strong> representation.</p>
Should be translated into:
data P = P [PElt]
data PElt = Em EmT
          | Strong StrongT
          | Text String
-- In this case the nested elements are mixed content again:
type EmT = [PElt]
type StrongT = [PElt]
document :: P
document = P [Text "Alphabetic ", Em [Text "or"], Text " possibly ", Strong [Text "phonetic"], Text " representation."]
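To illustrate that a list of elements and text nodes keeps enough information to reproduce the document, here is a self-contained sketch that renders mixed content back to XML. The functions `renderElt` and `renderP` are hypothetical names, the nested types are inlined as lists, and text escaping is left out for brevity.

```haskell
-- Mixed content: a list whose items are either child elements or text nodes.
data PElt = Em [PElt] | Strong [PElt] | Text String
  deriving (Eq, Show)

newtype P = P [PElt]
  deriving (Eq, Show)

-- Render one item; text is emitted as-is (no escaping in this sketch).
renderElt :: PElt -> String
renderElt (Text s)    = s
renderElt (Em es)     = "<em>" ++ concatMap renderElt es ++ "</em>"
renderElt (Strong es) = "<strong>" ++ concatMap renderElt es ++ "</strong>"

renderP :: P -> String
renderP (P es) = "<p>" ++ concatMap renderElt es ++ "</p>"

-- The example paragraph, with whitespace kept inside the text nodes.
document :: P
document = P
  [ Text "Alphabetic ", Em [Text "or"], Text " possibly "
  , Strong [Text "phonetic"], Text " representation." ]
```

Because whitespace lives inside the text nodes, `renderP document` reproduces the example paragraph character for character.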
Remember: both complexType and simpleType may be named for future reference in the same document.
Attributes are always attached to their elements.
Attributes should thus be treated as flat fields in the complexType record, but an attribute type can only be an xsd:simpleType.
data SimpleType = TString | TInteger | ...
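Since attribute values and simple element content both arrive as plain strings, decoding them amounts to parsing a string according to its simple type. The sketch below covers only the two constructors named above; `SimpleValue` and `parseSimple` are hypothetical names, and integers are accepted only in the lexical forms that Haskell's `readMaybe` understands, which is an assumption and may differ from the XML Schema lexical space.

```haskell
import Text.Read (readMaybe)

-- Only the two constructors named in the text; the rest are left out here.
data SimpleType = TString | TInteger
  deriving (Eq, Show)

-- A decoded value of a simple type.
data SimpleValue = VString String | VInteger Integer
  deriving (Eq, Show)

-- Parse the text of an attribute or element according to its simple type.
parseSimple :: SimpleType -> String -> Maybe SimpleValue
parseSimple TString  s = Just (VString s)
parseSimple TInteger s = VInteger <$> readMaybe s
```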
Many objects can have a unique name that can be referenced for sharing afterwards. These should ideally be expressed either as the same type for each occurrence, or as some kind of type class (for attribute groups). This applies to xsd:element, xsd:simpleType, xsd:complexType, xsd:attribute, and xsd:attributeGroup.
- The targetNamespace="..." attribute of xsd:schema gives the target namespace for objects.
  - this namespace is normally bound to a prefix later with xmlns:mynamespace="..."; this is unimportant for us.
There are many more features in XML Schema that are of secondary importance or can be ignored.
For example, xsd:restriction of an existing type restricts the range of allowed values.
This can be handled by a simple assertion when printing the result.
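As an illustration of such an assertion, the sketch below checks an integer against an inclusive range, as the xsd:minInclusive and xsd:maxInclusive facets would describe, before printing it. The `Range` type and `printRestricted` function are hypothetical names, and returning `Either` instead of throwing is a design choice of this sketch.

```haskell
-- Hypothetical restriction: an integer limited to an inclusive range.
data Range = Range { minInclusive :: Integer, maxInclusive :: Integer }

-- Print a value only if it satisfies the restriction;
-- otherwise report which bound was violated.
printRestricted :: Range -> Integer -> Either String String
printRestricted (Range lo hi) n
  | n < lo    = Left (show n ++ " is below minInclusive " ++ show lo)
  | n > hi    = Left (show n ++ " is above maxInclusive " ++ show hi)
  | otherwise = Right (show n)
```

The underlying Haskell type stays a plain `Integer`, so the restriction costs nothing when decoding and is only checked on output.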
- xsd:sequence - record by element name
- xsd:all - record (the same as xsd:sequence), but the order of fields may vary (important for decoding only)
- xsd:simpleType - flat type (Int, String etc.)
- xsd:complexType - record type
- mixed content - a list of either:
  - any element type
  - text node
- xsd:restriction - restricts type (can be implemented as assertion on output)
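The difference between xsd:sequence and xsd:all when decoding can be sketched as follows. The sketch assumes the child elements have already been reduced to (element name, text content) pairs; the `Children` type and both decoder names are hypothetical.

```haskell
data Birthplace = Birthplace { city :: String, country :: String }
  deriving (Eq, Show)

-- Child elements reduced to (element name, text content) pairs (an assumption).
type Children = [(String, String)]

-- xsd:sequence: children must appear in exactly the declared order.
decodeSequence :: Children -> Maybe Birthplace
decodeSequence [("city", c), ("country", k)] = Just (Birthplace c k)
decodeSequence _                             = Nothing

-- xsd:all: the same fields, but looked up by name, so any order is accepted.
decodeAll :: Children -> Maybe Birthplace
decodeAll cs
  | length cs == 2 = Birthplace <$> lookup "city" cs <*> lookup "country" cs
  | otherwise      = Nothing
```

Both decoders produce the same record type, which matches the point that the distinction matters for decoding only.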