Saxy (Sá xị) is an XML SAX parser and encoder in Elixir that focuses on speed, usability and standards compliance.
It complies with Extensible Markup Language (XML) 1.0 (Fifth Edition).
- An incredibly fast XML 1.0 SAX parser.
- An extremely fast XML encoder.
- Native support for stream parsing of large XML files.
- Parses XML documents into a simple DOM format.
- Supports quick returning from event handlers.
Add :saxy to your mix.exs.
def deps do
  [{:saxy, "~> 0.9.1"}]
end
Full documentation is available on HexDocs.
A SAX event handler implementation is required before starting parsing.
defmodule MyEventHandler do
  @behaviour Saxy.Handler

  def handle_event(:start_document, prolog, state) do
    IO.inspect("Start parsing document")
    {:ok, [{:start_document, prolog} | state]}
  end

  def handle_event(:end_document, _data, state) do
    IO.inspect("Finish parsing document")
    {:ok, [{:end_document} | state]}
  end

  def handle_event(:start_element, {name, attributes}, state) do
    IO.inspect("Start parsing element #{name} with attributes #{inspect(attributes)}")
    {:ok, [{:start_element, name, attributes} | state]}
  end

  def handle_event(:end_element, name, state) do
    IO.inspect("Finish parsing element #{name}")
    {:ok, [{:end_element, name} | state]}
  end

  def handle_event(:characters, chars, state) do
    IO.inspect("Receive characters #{chars}")
    {:ok, [{:characters, chars} | state]}
  end
end
Then start parsing XML documents with:
iex> xml = "<?xml version='1.0' ?><foo bar='value'></foo>"
iex> Saxy.parse_string(xml, MyEventHandler, [])
{:ok,
[{:end_document},
{:end_element, "foo"},
{:start_element, "foo", [{"bar", "value"}]},
{:start_document, [version: "1.0"]}]}
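The quick-returning feature can be sketched with a handler that halts at the first matching element. This is a minimal sketch that assumes a handler may return `{:stop, state}` to end parsing early, as the Saxy.Handler documentation describes; the FirstItemHandler module, the "item" element, and the sample document are hypothetical.

```elixir
defmodule FirstItemHandler do
  @behaviour Saxy.Handler

  # Stop as soon as the first <item> element opens and return its attributes.
  # Assumption: {:stop, state} ends parsing early (see the Saxy.Handler docs).
  def handle_event(:start_element, {"item", attributes}, _state) do
    {:stop, {:found, attributes}}
  end

  # Every other event leaves the state untouched and lets parsing continue.
  def handle_event(_event, _data, state), do: {:ok, state}
end

xml = "<?xml version='1.0' ?><items><item id='1'/><item id='2'/></items>"
result = Saxy.parse_string(xml, FirstItemHandler, :not_found)
```

Stopping early avoids scanning the rest of a large document once the handler has what it needs.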
Saxy also accepts a file stream as input:
stream = File.stream!("/path/to/file")
Saxy.parse_stream(stream, MyEventHandler, initial_state)
It can also parse any ordinary stream, such as a filtered one:
stream = File.stream!("/path/to/file") |> Stream.filter(&(&1 != "\n"))
Saxy.parse_stream(stream, MyEventHandler, initial_state)
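When the XML is already in memory, a stream of chunks can be built by hand. The sketch below is plain Elixir; the ChunkedSource module is hypothetical and not part of Saxy. It assumes that Saxy.parse_stream accepts any enumerable of binaries, so the Saxy call is shown only as a comment.

```elixir
defmodule ChunkedSource do
  # Lazily split a binary into chunks of at most `size` bytes.
  def stream(binary, size) when is_binary(binary) and size > 0 do
    Stream.unfold(binary, fn
      # Nothing left: end the stream.
      <<>> -> nil
      # Last (possibly short) chunk.
      rest when byte_size(rest) <= size -> {rest, <<>>}
      # Take `size` bytes and keep the remainder for the next step.
      <<chunk::binary-size(size), rest::binary>> -> {chunk, rest}
    end)
  end
end

chunks = ChunkedSource.stream("<foo>bar</foo>", 5) |> Enum.to_list()
# chunks is ["<foo>", "bar</", "foo>"]

# Assumed usage with the handler defined earlier:
# ChunkedSource.stream(xml, 1024) |> Saxy.parse_stream(MyEventHandler, [])
```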
Saxy can parse part of an XML document, and parse more of it later.
alias Saxy.Parser.Partial
xml = """
<?xml version='1.0' ?>
<foo bar='value'>
</foo>
"""
split_xml = String.split(xml, "\n")
{:ok, context} = Partial.init(MyEventHandler, initial_state)
{:ok, context} = Partial.parse(Enum.at(split_xml, 0), context)
{:ok, context} = Partial.parse(Enum.at(split_xml, 1), context)
{:ok, context} = Partial.parse(Enum.at(split_xml, 2), context)
{:ok, state} = Partial.finish(context)
Sometimes it is convenient to export the XML document into a simple DOM format: a 3-element tuple containing the tag name, the attributes, and a list of children.
The Saxy.SimpleForm module supports this:
Saxy.SimpleForm.parse_string(data)
{"menu", [],
[
{"movie",
[{"id", "tt0120338"}, {"url", "https://www.imdb.com/title/tt0120338/"}],
[{"name", [], ["Titanic"]}, {"characters", [], ["Jack & Rose"]}]},
{"movie",
[{"id", "tt0109830"}, {"url", "https://www.imdb.com/title/tt0109830/"}],
[
{"name", [], ["Forest Gump"]},
{"characters", [], ["Forest & Jenny"]}
]}
]}
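Because the simple form is just nested tuples and strings, it can be walked with ordinary pattern matching. The following is a minimal sketch in plain Elixir; the SimpleFormText module is hypothetical, and it assumes children are either `{name, attributes, children}` tuples or text binaries, as in the output above.

```elixir
defmodule SimpleFormText do
  # Collect the text of every element named `tag`, depth first.
  def texts({name, _attributes, children}, tag) do
    own = if name == tag, do: [text(children)], else: []
    own ++ Enum.flat_map(children, &texts(&1, tag))
  end

  # Text nodes (and anything else) contribute no elements of their own.
  def texts(_other, _tag), do: []

  # Join the direct text children of an element.
  defp text(children) do
    children |> Enum.filter(&is_binary/1) |> Enum.join()
  end
end

menu =
  {"menu", [],
   [
     {"movie", [{"id", "tt0120338"}],
      [{"name", [], ["Titanic"]}, {"characters", [], ["Jack & Rose"]}]},
     {"movie", [{"id", "tt0109830"}],
      [{"name", [], ["Forest Gump"]}, {"characters", [], ["Forest & Jenny"]}]}
   ]}

names = SimpleFormText.texts(menu, "name")
# names is ["Titanic", "Forest Gump"]
```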
Saxy supports exporting to xmerl format, which you can then use with xmerl_xpath or SweetXml.
Note that the xmerl format requires tag and attribute names to be atoms. By default Saxy uses String.to_existing_atom/1 to avoid creating atoms at runtime. You can override this behaviour by setting the :atom_fun option, for example to &String.to_atom/1.
iex> string = File.read!("/path/to/my.xml")
iex> Saxy.Xmerl.parse_string(string, atom_fun: &String.to_atom/1)
{:ok,
{:xmlElement,
:foo,
:foo,
[],
{:xmlNamespace, [], []},
[],
1,
[{:xmlAttribute, :bar, :bar, [], [], [], 1, [], 'value', :undefined}],
[],
[],
[],
:undeclared}}
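The difference between the two atom functions matters because atoms are never garbage collected. The sketch below uses only the Elixir standard library to show how they behave for a name that has not been seen before; the generated tag name is hypothetical.

```elixir
# Build a name at runtime so no atom for it exists yet.
name = "saxy_demo_tag_" <> Integer.to_string(System.unique_integer([:positive]))

# String.to_existing_atom/1 raises ArgumentError for an unknown atom.
before =
  try do
    String.to_existing_atom(name)
  rescue
    ArgumentError -> :no_such_atom
  end

# String.to_atom/1 creates the atom, so the lookup now succeeds.
atom = String.to_atom(name)
same? = String.to_existing_atom(name) == atom
```

With untrusted input, String.to_atom/1 can grow the atom table without bound, which is why opting in through :atom_fun is an explicit choice.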
Saxy offers two APIs to build simple forms and encode XML documents.
Use Saxy.XML to build and compose XML simple forms, then Saxy.encode!/2 to encode the built element into an XML binary.
iex> import Saxy.XML
iex> element = element("person", [gender: "female"], "Alice")
{"person", [{"gender", "female"}], [{:characters, "Alice"}]}
iex> Saxy.encode!(element, [])
"<?xml version=\"1.0\"?><person gender=\"female\">Alice</person>"
See Saxy.XML for more XML building APIs.
Saxy also provides the Saxy.Builder protocol to help compose structs into simple form.
defmodule Person do
  @derive {Saxy.Builder, name: "person", attributes: [:gender], children: [:name]}
  defstruct [:gender, :name]
end
iex> jack = %Person{gender: :male, name: "Jack"}
iex> john = %Person{gender: :male, name: "John"}
iex> import Saxy.XML
iex> root = element("people", [], [jack, john])
iex> Saxy.encode!(root, [])
"<?xml version=\"1.0\"?><people><person gender=\"male\">Jack</person><person gender=\"male\">John</person></people>"
Benchmarking XML is hard, and the results depend heavily on the complexity of the document. In benchmarks Saxy is usually about 1.4 times faster than Erlsom, and 4.35 times faster on deeply nested documents.
As for the XML builder, Saxy is usually 4 times faster than xml_builder at encoding simple elements, and 17 times faster at encoding deeply nested elements.
The benchmark suite can be found in this repository.
- XSD is not supported.
- DTD is not supported. When the parser encounters a <!DOCTYPE, it simply stops parsing.
☝️ Sa Xi, pronounced like sa-see, is an awesome soft drink made by Chuong Duong.
If you have any issues or ideas, feel free to write to https://github.com/qcam/saxy/issues.
To start developing:
- Fork the repository.
- Write your code and related tests.
- Create a pull request at https://github.com/qcam/saxy/pulls.