Support for XML transformations #137

Open
alza-bitz opened this issue Oct 15, 2018 · 5 comments
@alza-bitz
Contributor

Hi there,

I have a small project to transform one xml format into another (call them fmt-a and fmt-b). I thought Clojure Spec might be useful to define the shape of the data at each stage of the transformation and check the transformation works correctly for expected inputs.

Then I found spec-tools and its transformers in this blog post:

https://www.metosin.fi/blog/spec-transformers/

So I thought I'd investigate that. I noted that although it supports JSON it doesn't seem to support XML yet (although XML is mentioned in the blog post?)

Anyway this was going to be my approach with spec-tools:

  1. parse fmt-a-xml from xml file
  2. transform fmt-a-xml-data (i.e. the parsed {:tag :attrs :content} structure) to fmt-a (i.e. {:fmt-a-key some-val})
  3. transform fmt-a to fmt-b (i.e. {:fmt-b-key some-val})
  4. transform fmt-b to fmt-b-xml-data (i.e. the {:tag :attrs :content} structure)
  5. format fmt-b-xml-data to xml file

Does this sound sensible? I'm not sure how to achieve all of the above with spec-tools, but I was going to start experimenting with step 3, the core data transformation.
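For illustration, the five steps above could be wired together roughly as follows. This is only a sketch using the clojure.xml namespace that ships with Clojure; it works on strings rather than files, and the function names (parse-xml-str, xml->fmt-a, fmt-a->fmt-b, fmt-b->xml, emit-xml-str) and the one-element formats are hypothetical placeholders, not part of spec-tools.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Step 1: parse an XML string into the {:tag :attrs :content} structure.
(defn parse-xml-str [s]
  (xml/parse (ByteArrayInputStream. (.getBytes ^String s "UTF-8"))))

;; Step 2: verbose parser output -> fmt-a map (hypothetical shape).
(defn xml->fmt-a [el]
  {:fmt-a-key (first (:content el))})

;; Step 3: the core fmt-a -> fmt-b transformation (hypothetical shape).
(defn fmt-a->fmt-b [m]
  {:fmt-b-key (:fmt-a-key m)})

;; Step 4: fmt-b map -> verbose structure that the emitter understands.
(defn fmt-b->xml [m]
  {:tag :b :attrs nil :content [(:fmt-b-key m)]})

;; Step 5: print the structure as XML and capture it as a string.
(defn emit-xml-str [el]
  (with-out-str (xml/emit-element el)))

(defn transform [xml-str]
  (-> xml-str
      parse-xml-str
      xml->fmt-a
      fmt-a->fmt-b
      fmt-b->xml
      emit-xml-str))
```

Keeping each step as a separate function means each one can get its own spec, and a failure can be attributed to a specific stage.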

One issue I can see before I proceed is that "invalid" inputs would raise an error at step 2 or 3, so the errors wouldn't make a lot of sense in relation to the XML input file.

Note: there's no xml schema available for fmt-a or fmt-b, if that matters.

Thanks!

@ikitommi
Member

ikitommi commented Oct 15, 2018

You could use spec-tools for the XML->EDN transformation, but I think you still need an XML->EDN->XML converter. Last time we needed this, we ended up writing a small utility lib for it.

<products><product><id>1</id></product><product><id>2</id></product></products>

would be converted into something like this:

{:products [{:id 1} {:id 2}]}

... and back.

This works nicely for a large subset of XML. The EDN format could have a spec, and spec-tools could encode & decode the types.
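As a rough idea of what such a converter might look like (this is not the utility lib mentioned above, and the names element->edn and xml->edn are made up for this sketch), the verbose parser output can be collapsed with a few rules: text-only elements become strings, several children with the same tag become a vector, and anything else becomes a map keyed by tag. Attributes are ignored, and a list containing a single child would come out as a map, so a real converter would need more care.

```clojure
;; Collapse one parsed element's content into an EDN-like value.
(defn element->edn [{:keys [content]}]
  (let [children (filter map? content)]
    (cond
      ;; No child elements: treat the content as text.
      (empty? children)
      (apply str content)

      ;; Several children sharing one tag: treat them as a list.
      (and (> (count children) 1)
           (apply = (map :tag children)))
      (mapv element->edn children)

      ;; Otherwise: a map from child tag to child value.
      :else
      (into {} (map (juxt :tag element->edn)) children))))

(defn xml->edn [el]
  {(:tag el) (element->edn el)})

;; The products example in parsed form.
(def products-xml
  {:tag :products
   :content [{:tag :product :content [{:tag :id :content ["1"]}]}
             {:tag :product :content [{:tag :id :content ["2"]}]}]})

(xml->edn products-xml)
;; => {:products [{:id "1"} {:id "2"}]}
```

The ids are still strings at this point; turning "1" into 1 is the part a spec with a string-decoding transformer would handle.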

PS. Oh, one version of the xml-helper is here. It would need some love: the last commit was 4 years ago and we most likely haven't used it since :O

@alza-bitz
Contributor Author

Hi there,

Thanks! I tried the xml-helper link (noxml) but I got a 404..?

Also, re: "I think you still need an XML->EDN->XML converter", could you expand on that? I was hoping/assuming that spec-tools could provide all of

XML(A)->EDN(A)
EDN(A)->EDN(B)
EDN(B)->XML(B)

With an understanding that some extra work would be required for the XML transformations?

@ikitommi
Member

ikitommi commented Oct 16, 2018

spec-tools is not an XML parser. The default XML parsers return the XML in the verbose map format with :tag, :attrs and :content. You could write a spec transformer for that, but it could be a lot of work(?). But if you can convert that into a JSON/EDN-like map with tag names as keys, you are mostly there, and the spec transformation works just like with JSON / strings.

The linked lib seems internal, not fully tested.

Here's a full round trip with JSON. The JSON->EDN is done by Muuntaja:

(require '[clojure.spec.alpha :as s])
(require '[spec-tools.core :as st])
(require '[muuntaja.core :as m])

(s/def ::name string?)
(s/def ::birthdate inst?)
(s/def ::age int?)

(s/def ::languages
  (s/coll-of
    (s/and keyword? #{:clj :cljs})
    :into #{}))

(s/def ::user
  (s/keys
    :req-un [::name ::languages ::age]
    :opt-un [::birthdate]))

(defn encode-json [x] (slurp (m/encode m/instance "application/json" x)))
(defn decode-json [x] (m/decode m/instance "application/json" x))

(def ilona {:birthdate #inst "1968-01-02T15:04:05Z"
            :age 48
            :name "Ilona"
            :languages #{:clj :cljs}})

(as-> ilona $
      (doto $ prn)
      (encode-json $)
      (doto $ prn)
      (decode-json $)
      (doto $ prn)
      (st/decode ::user $ st/json-transformer)
      (do (assert (= $ ilona)) $)
      (doto $ prn)
      (encode-json $)
      (doto $ prn)
      (assert (= (encode-json ilona) $)))
; {:birthdate #inst "1968-01-02T15:04:05.000-00:00", :age 48, :name "Ilona", :languages #{:clj :cljs}}
; "{\"birthdate\":\"1968-01-02T15:04:05Z\",\"age\":48,\"name\":\"Ilona\",\"languages\":[\"clj\",\"cljs\"]}"
; {:birthdate "1968-01-02T15:04:05Z", :age 48, :name "Ilona", :languages ["clj" "cljs"]}
; {:birthdate #inst "1968-01-02T15:04:05.000-00:00", :age 48, :name "Ilona", :languages #{:clj :cljs}}
; "{\"birthdate\":\"1968-01-02T15:04:05Z\",\"age\":48,\"name\":\"Ilona\",\"languages\":[\"clj\",\"cljs\"]}"

@alza-bitz
Contributor Author

alza-bitz commented Oct 31, 2018

Hi there,

Thanks for the advice! I've been experimenting with transforming the verbose :tag, :attrs and :content structure output by the parser (which I'll call widget-xml) into something more manageable (which I'll call widget) for further transformations downstream etc.

So, I have a data spec like this:

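(require '[spec-tools.core :as st])
(require '[spec-tools.data-spec :as std])

(def widget
  {::some-widget-property number?})

;; widget->xml and xml->widget are my own structure-converting functions
(def widget-spec
  (-> (std/spec
       {:name ::widget
        :spec widget})
      (assoc :encode/xml widget->xml
             :decode/xml xml->widget)))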

And an "xml transformer" like this:

(require '[spec-tools.transform :as stt])

(def xml-transformer
  (st/type-transformer
   {:name :xml
    :decoders stt/string-type-decoders
    :default-encoder stt/any->any}))

Then I can use the following code:

(st/decode widget-spec widget-xml xml-transformer)

To transform

{:tag :WIDGET :content ["123"]}

Into

{::some-widget-property 123}

So, my xml->widget decode function transforms the verbose structure, and spec-tools is handling the coercion of the leaf types. Great!

However, I then wanted to define a spec fdef for the xml->widget decode function (imagine it's a bit more complex than the above example, with nested structures and therefore several nested specs, and several nested functions for transforming the structure). So I define another spec for the decode function input :args (call it widget-xml-spec), but I can't use the already existing widget-spec for the decode function output :ret, because that assumes coercion of the types. So currently I have to make an additional spec for the decode function output, call it widget-str-spec, which is equivalent to widget-spec except for the coercion parts. For the above example, the data spec would be:

(def widget-str
{::some-widget-property string?})

(def widget-str-spec
   (std/spec
    {:name ::widget-str
     :spec widget-str}))

So if there are many functions making up the decoding, and it would be nice to have a spec fdef on each, there are three specs to be made for each (xxx-xml-spec, xxx-str-spec, xxx-spec), where xxx-str-spec just duplicates the xxx-spec defined for the coercion output, replacing the coercible types with string? etc.

I tried defining :ret as:

#(s/valid? widget-spec (st/coerce widget-spec % xml-transformer))

Which works ok of course when :ret is valid, but when it isn't, the output from spec explaining the problem hides the useful details, since the spec is wrapped in another predicate.

The only other idea I thought about to avoid manually creating the extra xxx-str-spec for each fdef I write, is to programmatically make that spec from xxx-spec. But I wasn't sure how to go about that, and it might be a bit fiddly/a lot of work.

So, in conclusion, I'm unsure how to proceed, apart from manually creating the xxx-str specs for each fdef, which doesn't feel right.
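One possible way to derive the xxx-str data spec programmatically, sketched here with plain clojure.walk and no spec-tools API: walk the data-spec map and swap every function-valued leaf for string?. The name stringly-spec is hypothetical, and the sketch assumes leaves are plain predicate functions; sets, keywords and already-built specs are left untouched, and it makes no attempt to tell coercible types from non-coercible ones.

```clojure
(require '[clojure.walk :as walk])

;; Replace every predicate function in a data spec with string?.
;; Keys are keywords, which are not fn?, so they are left alone.
(defn stringly-spec [data-spec]
  (walk/postwalk
   (fn [x] (if (fn? x) string? x))
   data-spec))

(def widget
  {::some-widget-property number?})

(def widget-str
  (stringly-spec widget))
```

The resulting widget-str map could then be passed to std/spec in the same way as the hand-written version above.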

@alza-bitz
Contributor Author

Just following this up: I found a solution to my problem.

I decided to define the widget-xml-spec with coerced types instead of strings, for example:

(def widget-xml
  {:tag :WIDGET :content [number?]})

(def widget-xml-spec
  (std/spec
   {:name ::widget-xml
    :spec widget-xml}))

Then I can use the following code:

(st/decode widget-spec widget-xml xml-transformer)

To transform

{:tag :WIDGET :content ["123"]}

Into

{::some-widget-property 123}

As in, spec-tools is doing the coercion of "123" to 123. This solution means I only need to define two specs instead of three, and I can now define a spec fdef for the xml->widget function like this:

(s/fdef xml->widget
  :args (s/cat :widget-xml widget-xml-spec)
  :ret widget-spec)
