Xpath experiment #1
base: master
Commits on Apr 30, 2022
Commit ad5db2c
Commits on May 2, 2022
Experiment with alternatives to Xmerl
There is a lot of stuff in here. A good place to start is the findings.md doc, where we attempt to capture everything we've been up to. Some high-level TL;DRs:
* High memory seems to come from how xmerl represents the XML, namely parents, position, and the inclusion of smaller fields that we probably just don't need (nsinfo, expanded name, etc.).
We attempt an integration with DataSchema. There are a few ways that can work:
1. Define a Saxy data accessor. This would result in Saxy.parse_string being called once per field in the schema, but it ignores everything except the one path it is looking for. It also returns as soon as we know we got what we needed. Preliminary results suggest it might be a bit slower but uses like half the memory. What makes this very tricky is figuring out what to do when we hit a has_many. This really feels solvable, but the best I have ATM is a hacky solution, and it is still incomplete too.
2. Define our own "reducer", i.e. a to_struct fn that takes the schema and the XML and perhaps handles has_many differently. This is as yet unexplored. It certainly feels less clean, but if it works, who cares.
3. Alter the representation of the schema, possibly to be keyed by the xpath. Then, as we progress through the XML, we detect when we have reached a field we care about (based on the schema) and save it if we have. This feels promising because we parse through the doc once, but (a) the representation of schemas is different and (b) it's a bit tricky to implement.
4. Think about it from scratch a bit, rather than trying to fit it into established paradigms: what's the simplest way to get what we want? (Might be one of the solutions proposed, but we should think about it.)
We also attempt to keep the current system but, instead of creating Erlang records with xmerl, create a map of the XML, removing the unnecessary things like "parents" etc. Preliminary results show that this clearly reduces memory a lot.
We now need to figure out what the data we serialise to should look like AND we need to figure out an xpath replacement / integration. What's nice about this approach is that it still works with DataSchema.
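To make option 3 (a schema keyed by path, filled in during a single pass over the document) a bit more concrete, here is a minimal sketch. It is not the code from this PR: it uses Python's standard-library `xml.sax` as a stand-in for Saxy, and the names `PathCollector`, `extract`, `SCHEMA` and `DOC` are all hypothetical, as is the sample document.

```python
import xml.sax


class PathCollector(xml.sax.ContentHandler):
    """Single pass over the document, keeping only text found at schema paths."""

    def __init__(self, schema):
        super().__init__()
        self.schema = schema  # hypothetical shape: {"/Order/Id": "id", ...}
        self.stack = []       # names of the currently open elements
        self.result = {}

    def _path(self):
        return "/" + "/".join(self.stack)

    def startElement(self, name, attrs):
        self.stack.append(name)

    def characters(self, content):
        field = self.schema.get(self._path())
        if field is not None:
            # Text can arrive in several chunks, so append rather than assign.
            self.result[field] = self.result.get(field, "") + content

    def endElement(self, name):
        self.stack.pop()


def extract(xml_text, schema):
    handler = PathCollector(schema)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.result


SCHEMA = {"/Order/Id": "id", "/Order/Customer/Name": "customer_name"}
DOC = "<Order><Id>42</Id><Customer><Name>Seymour</Name><Age>50</Age></Customer></Order>"

print(extract(DOC, SCHEMA))  # {'id': '42', 'customer_name': 'Seymour'}
```

Nothing outside the schema (here `Age`) is ever stored, which is the property that makes this approach attractive for memory. It does not handle has_many, which is the tricky part called out above.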
Commit ad5bca4
This uses the Saxy handlers to create a map where the keys are dynamic. The theory is that this would make the querying faster. We can see that this approach still uses significantly less memory than the current xmerl approach, but it does fare worse than our other "slimmed down map" approach. And it still feels like far too much mems. The next approaches are to:
* Use a tuple instead of a dynamic map.
* Try the "slimmed down map" with a struct.
* ...
We also need to bench the querying, which is easy to do for the simple case, but do we want to support `//` etc.?
Commit ad56e20
Adds findings for dynamic tuple, tries with atoms as keys.
Also ensures we return any children in the correct order.
Commit ad53846
This adds a working SweetXML alternative for querying into the dynamic map tuple thing. There are still some bits to implement, like list of, and probably loads of edge cases and improvements. But this should let us benchmark the steamed ham examples vs SweetXML.
Commit ad546c4
Commit ad559e7
Adds findings for the query approach.
This is banging if it holds up!! 5 times less mems and quicker. Really need to try with a larger input now. That's gonna require porting large schema to the new one though....
Commit ad521ca
Commit ad5a1b8
Commits on May 3, 2022
Commit ad5c252
Commit ad53dbe
Commits on May 4, 2022
Commit ad527ff
Commit ad59af3
Commit ad592f9
Commits on May 5, 2022
Commit ad5ca71
Day 474, captain's log. We are close; there are noises outside. We are about to change where we pop off the stack, from inside the characters handler to inside the end element handler. This is because if a tag doesn't have characters in it, we would never pop off the stack!
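A small sketch of why the pop belongs in the end element handler. This is an illustration only, using Python's standard-library `xml.sax` rather than Saxy; `PathTracker` and `paths` are hypothetical names, and the sample XML is made up.

```python
import xml.sax


class PathTracker(xml.sax.ContentHandler):
    """Tracks the path of every element, popping the stack on end element."""

    def __init__(self):
        super().__init__()
        self.stack = []       # names of the currently open elements
        self.seen_paths = []  # path recorded each time an element opens

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.seen_paths.append("/" + "/".join(self.stack))

    def characters(self, content):
        # Never called for <Empty/>. If the pop lived here, "Empty" would
        # stay on the stack and every later path would be wrong.
        pass

    def endElement(self, name):
        # Fires for every element, with or without text, so the stack
        # always unwinds.
        self.stack.pop()


def paths(xml_text):
    handler = PathTracker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    assert handler.stack == []  # fully unwound at end of document
    return handler.seen_paths


print(paths("<Menu><Empty/><Salad>Caesar</Salad></Menu>"))
# ['/Menu', '/Menu/Empty', '/Menu/Salad']
```

With the pop in `characters` instead, the last path would come out as `/Menu/Empty/Salad`, because nothing ever removed `Empty`.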
Commit ad52488
Checkpoint. It's so almost working, some weirdness with has many dupes and stuff which smells like not putting it in the parent correctly
Commit ad555d9
One correct has many, one to go. We are about to experiment with changing how a path should be structured: we are moving to making the last node in the has_many xpath NOT Salads but Salad. I think this might even match xpath...
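As a rough illustration of a has_many path that ends in the repeated element (`Salad`) rather than its container (`Salads`), here is a sketch in Python's standard-library `xml.sax`. It is an assumption about the idea, not the PR's implementation; `HasManyCollector`, `collect_many`, the `fields` shape and the sample document are all hypothetical.

```python
import xml.sax


class HasManyCollector(xml.sax.ContentHandler):
    """Collects one dict per element found at many_path."""

    def __init__(self, many_path, fields):
        super().__init__()
        self.many_path = many_path  # e.g. "/Menu/Salads/Salad"
        self.fields = fields        # path relative to each item -> field name
        self.stack = []
        self.items = []
        self.current = None         # the item being built, if any

    def _path(self):
        return "/" + "/".join(self.stack)

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self._path() == self.many_path:
            self.current = {}       # a new item starts at each repeated element

    def characters(self, content):
        if self.current is None:
            return
        path = self._path()
        prefix = self.many_path + "/"
        if path.startswith(prefix):
            field = self.fields.get(path[len(prefix):])
            if field is not None:
                self.current[field] = self.current.get(field, "") + content

    def endElement(self, name):
        if self._path() == self.many_path:
            self.items.append(self.current)  # item is complete, in document order
            self.current = None
        self.stack.pop()


def collect_many(xml_text, many_path, fields):
    handler = HasManyCollector(many_path, fields)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.items


DOC = (
    "<Menu><Salads>"
    "<Salad><Name>Caesar</Name></Salad>"
    "<Salad><Name>Greek</Name></Salad>"
    "</Salads></Menu>"
)

print(collect_many(DOC, "/Menu/Salads/Salad", {"Name": "name"}))
# [{'name': 'Caesar'}, {'name': 'Greek'}]
```

Ending the path at `Salad` gives a natural start and end event for each item, which is also how an XPath like `/Menu/Salads/Salad` selects one node per repeated element.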
Commit ad5554c
Commit ad5dc89
Commit ad59613
Commit ad5a86e
Commit ad5db9c
Commits on May 6, 2022
These are some iterations on the "straight to struct" approach - mainly experimenting with changing the accumulator in some way.
Commit ad58d63
Commit ad5568f
Commit ad5178c
Commits on May 8, 2022
Commit ad51733
Commit ad55659
Commit ad520a4
Commit ad56bfe
Commit ad5ebd1
Commit ad5d134
Commit ad598b6
Commits on May 9, 2022
Commit ad5ebc3
Add Sweet XML schemas for air shop
ALRIGHT this should let us benchmark performance of querying! What we should have done is runtime schemas, then had one struct used for both; then we could compare the two to_struct results for equality. In fact we still could, especially if we wrote a function that turned the compile-time schema into a runtime one. Well, guess we nearly have that already with __data_schema_fields.
Commit ad5ebd1
Commit ad5328c
Commits on May 10, 2022
Commit ad5b5f3
Commit ad50c8f
Commit ad5183a
Commits on May 11, 2022
Commit ad5e464
Commits on May 13, 2022
Commit ad5dc4e
Commit ad57fc6
Commit ad5aeb3
Commit ad536bb
Commits on May 17, 2022
Adds wip bunch of stuff. I think it's clear the complexity is too large here. But we did get a working version of straight to struct
Commit ad5a6d0
Commits on May 18, 2022
Commit ad56b1c
Commits on May 20, 2022
Commit ad5b5a0
Commits on May 27, 2022
Commit ad50a34
Commits on Jun 15, 2022
Commit ad566ae