Xpath experiment #1

Draft
wants to merge 47 commits into
base: master

Commits on Apr 30, 2022

  1. ad5db2c

Commits on May 2, 2022

  1. Experiment with alternatives to Xmerl

    There is a lot of stuff in here. A good place to start is with the
    findings.md doc where we attempt to capture everything we've been up to.
    Some high level TL;DRs...:
    
    * High memory seems to come from how xmerl represents the XML, namely
      parents, position and the inclusion of smaller fields that we probably
      just don't need: nsinfo, expanded name, etc.
    
    We attempt an integration with DataSchema, there are a few ways that can
    work:
    
    1. Define a Saxy data accessor. This would result in Saxy.parse_string
       being called once per field in the schema, but it ignores everything
       except the one path it is looking for. It also returns as soon as we
       know we have what we need. Preliminary results suggest it might be a
       bit slower but uses roughly half the memory. What makes this very
       tricky is figuring out what to do when we hit a has_many. This really
       feels solvable, but the best I have at the moment is a hacky solution
       that is still incomplete.
    2. Define our own "reducer", i.e. a to_struct fn that takes the
       schema and the XML and perhaps handles has_many differently. This is
       as yet unexplored. It certainly feels less clean, but if it works, who
       cares.
    3. Alter the representation of the schema, possibly to be keyed by the
       xpath. Then as we progress through the XML we detect when we have
       reached a field we care about (based on the schema) and save it if
       we have. This feels promising because we parse through the doc once,
       but (a) the representation of schemas is different and (b) it's a bit
       tricky to implement.
    4. We should think about it from scratch a bit, rather than trying to
       fit it into established paradigms, what's the simplest way to get
       what we want? (Might be one of the solutions proposed but we should
       think about it.)
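
Option 3 above (a schema keyed by path, filled in during a single pass) can be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's standard-library xml.sax as a stand-in for Saxy, the names PathSchemaHandler and to_fields are hypothetical, and it only handles leaf text fields (no has_many, no attributes).

```python
import xml.sax


class PathSchemaHandler(xml.sax.ContentHandler):
    """Collects text only for the paths listed in a path-keyed schema."""

    def __init__(self, schema):
        super().__init__()
        self.schema = schema   # hypothetical shape: {"/Root/Child": "field_name"}
        self.stack = []        # element names from the root to the current node
        self.result = {}
        self._buffer = None    # text chunks for the wanted element, if any

    def _path(self):
        return "/" + "/".join(self.stack)

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self._path() in self.schema:
            self._buffer = []

    def characters(self, content):
        # Text is kept only while we are inside an element the schema wants.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        path = self._path()
        if self._buffer is not None and path in self.schema:
            self.result[self.schema[path]] = "".join(self._buffer)
            self._buffer = None
        self.stack.pop()


def to_fields(xml_text, schema):
    """Parse the document once and return only the fields named in the schema."""
    handler = PathSchemaHandler(schema)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.result
```

Everything not named in the schema is skipped without being stored, which is where the memory saving would be expected to come from.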
    
    We also attempt to keep the current system but, instead of creating
    Erlang records with xmerl, create a map of the XML, removing the
    unnecessary things like "parents" etc. Preliminary results show that
    this clearly reduces memory a lot. We now need to figure out what the
    data we serialise to should look like AND we need to figure out an xpath
    replacement / integration. What's nice about this approach is that it
    still works with DataSchema.
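
A minimal sketch of the "slimmed down map" idea, again using Python's standard-library xml.sax as a stand-in for Saxy. The node shape (name, attrs, children) and the names SlimTreeHandler and parse_slim are assumptions for illustration; the point is only that no parent or position information is stored on each node.

```python
import xml.sax


class SlimTreeHandler(xml.sax.ContentHandler):
    """Builds a nested dict per element, without parent or position fields."""

    def __init__(self):
        super().__init__()
        self.stack = []   # nodes that are currently open
        self.root = None

    def startElement(self, name, attrs):
        node = {"name": name, "attrs": dict(attrs.items()), "children": []}
        if self.stack:
            # Appending keeps children in document order.
            self.stack[-1]["children"].append(node)
        else:
            self.root = node
        self.stack.append(node)

    def characters(self, content):
        # Whitespace-only text between elements is dropped.
        if content.strip():
            self.stack[-1]["children"].append(content)

    def endElement(self, name):
        self.stack.pop()


def parse_slim(xml_text):
    handler = SlimTreeHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root
```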
    Adzz committed May 2, 2022
    ad5bca4
  2. Adds a dynamic map approach

    This uses the Saxy handlers to create a map where the keys are dynamic.
    The theory is that this would make querying faster. This approach still
    uses significantly less memory than the current xmerl approach, but it
    does fare worse than our other "slimmed down map" approach. And it still
    feels like far too much memory. The next approaches to try are:
    
    * Use a tuple instead of a dynamic map.
    * Try the "slimmed down map" with a struct.
    * ...
    
    We also need to bench the querying, which is easy to do for the simple
    case, but do we want to support `//` etc.?
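
To make the "simple case" concrete, here is a rough sketch of querying a map whose keys are the element names. It is written in Python for illustration; the tree shape and the query function are assumptions, not the code in this branch. It walks one key per path segment, so it does not cover `//` (which would need a search over all descendants) or repeated siblings.

```python
def query(tree, path):
    """Follow an absolute path such as "/Menu/Salad/Name" one key at a time."""
    node = tree
    for key in path.strip("/").split("/"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Hypothetical dynamic-key map: element names are the keys.
tree = {"Menu": {"Salad": {"Name": "Caesar", "Price": "5"}}}
```

With this shape each path segment is a single key lookup, which is the reason the dynamic map might make querying faster.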
    Adzz committed May 2, 2022
    ad56e20
  3. Adds findings for dynamic tuple, tries with atoms as keys.

    Also ensures we return any children in the correct order.
    Adzz committed May 2, 2022
    ad53846
  4. Adds a working SweetXML alternative for querying into the dynamic map tuple thing

    There are some bits still to implement, like list of, and probably
    loads of edge cases and improvements. But this should let us
    benchmark the steamed ham examples vs SweetXML.
    Adzz committed May 2, 2022
    ad546c4
  5. ad559e7
  6. Adds findings for the query approach.

    This is banging if it holds up!! 5 times less mems and quicker.
    
    Really need to try with a larger input now. That's gonna require
    porting large schema to the new one though....
    Adzz committed May 2, 2022
    ad521ca
  7. bench

    Adzz committed May 2, 2022
    ad5a1b8

Commits on May 3, 2022

  1. start air shop change

    Adzz committed May 3, 2022
    ad5c252
  2. ad53dbe

Commits on May 4, 2022

  1. stashwip

    Adzz committed May 4, 2022
    ad527ff
  2. ad59af3
  3. fix attrs on has_one schema

    Adzz committed May 4, 2022
    ad592f9

Commits on May 5, 2022

  1. ad5ca71
  2. Checkpoint save

    Day 474, captain's log. We are close; there are noises outside.
    
    We are about to change where we pop off the stack: from inside the characters handler to inside the end element handler. If a tag doesn't have characters in it, we would otherwise never pop off the stack!
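
The reasoning can be shown with a small sketch, using Python's standard-library xml.sax in place of Saxy (PathTracker and paths are hypothetical names). An empty element produces a start event and an end event but no characters event, so the end element handler is the only place where a pop is guaranteed to run.

```python
import xml.sax


class PathTracker(xml.sax.ContentHandler):
    """Records the path of every element as it is opened."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.seen = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.seen.append("/".join(self.stack))

    def endElement(self, name):
        # Runs for every element, including empty ones such as <Empty/>,
        # which never trigger characters().
        self.stack.pop()


def paths(xml_text):
    handler = PathTracker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.seen
```

If the pop lived in the characters handler instead, `Empty` would stay on the stack and the following sibling would be recorded under the wrong parent.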
    Adzz committed May 5, 2022
    ad52488
  3. checkpoint: It's so almost working, some weirdness with has many dupes and stuff

    which smells like not putting it in the parent correctly
    Adzz committed May 5, 2022
    ad555d9
  4. Checkpoint:

    one correct has many, one to go
    
    We are about to experiment with changing how a path should be
    structured. We are changing the last node in the has_many xpath to be
    Salad rather than Salads. I think this might even match xpath...
    Adzz committed May 5, 2022
    ad5554c
  5. ad5dc89
  6. slight clean up

    Adzz committed May 5, 2022
    ad59613
  7. OMG IT WORKS

    Adzz committed May 5, 2022
    ad5a86e
  8. remove comments

    Adzz committed May 5, 2022
    ad5db9c

Commits on May 6, 2022

  1. Adds some results

    These are some iterations on the "straight to struct" approach - mainly
    experimenting with changing the accumulator in some way.
    Adzz committed May 6, 2022
    ad58d63
  2. ad5568f
  3. ad5178c

Commits on May 8, 2022

  1. ad51733
  2. remove inspects etc

    Adzz committed May 8, 2022
    ad55659
  3. ad520a4
  4. fix handler

    Adzz committed May 8, 2022
    ad56bfe
  5. ad5ebd1
  6. progressing the air shop schema

    Adzz committed May 8, 2022
    ad5d134
  7. lay ground for sweet air shop

    Adzz committed May 8, 2022
    ad598b6

Commits on May 9, 2022

  1. air shop schemas

    Adzz committed May 9, 2022
    ad5ebc3
  2. Add Sweet XML schemas for air shop

    ALRIGHT this should let us benchmark performance of querying!
    
    What we should have done is use runtime schemas and then have one struct
    used for both, so we could compare the two to_struct results for
    equality. In fact we still could, especially if we wrote a function that
    turned the compile time schema into a runtime one. Well, I guess we
    nearly have that already with __data_schema_fields.
    Adzz committed May 9, 2022
    ad5ebd1
  3. ad5328c

Commits on May 10, 2022

  1. Adds v4 test

    Adzz committed May 10, 2022
    ad5b5f3
  2. try v4 schema

    Adzz committed May 10, 2022
    ad50c8f
  3. ad5183a

Commits on May 11, 2022

  1. confused stash to not lose work

    Adzz committed May 11, 2022
    ad5e464

Commits on May 13, 2022

  1. ad5dc4e
  2. ad57fc6
  3. ad5aeb3
  4. add some notes

    Adzz committed May 13, 2022
    ad536bb

Commits on May 17, 2022

  1. Adds wip bunch of stuff. I think it's clear the complexity is too large here.

    But we did get a working version of straight to struct
    Adzz committed May 17, 2022
    ad5a6d0

Commits on May 18, 2022

  1. fix some warnings

    Adzz committed May 18, 2022
    ad56b1c

Commits on May 20, 2022

  1. more benches

    Adzz committed May 20, 2022
    ad5b5a0

Commits on May 27, 2022

  1. messing with bench more

    Adzz committed May 27, 2022
    ad50a34

Commits on Jun 15, 2022

  1. slimmed simple form

    Adzz committed Jun 15, 2022
    ad566ae