Xeno.DOM: Heap exhausted on a 5.6M file #65

Open
unhammer opened this issue May 22, 2023 · 4 comments

@unhammer
Contributor

unhammer commented May 22, 2023

longlines.xml.zip
↑ Running this file through xeno-dom exhausts heap memory. I just added the file to the list in SpeedBigFiles.hs as
[ benchFile ["xeno-dom"] "6MB" "longlines.xml.bz2"
and got

benchmarking 6M/xeno-dom
xeno-speed-big-files-bench: Heap exhausted;
xeno-speed-big-files-bench: Current maximum heap size is 26843545600 bytes (25600 MB).
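To separate the parser's behaviour from the benchmark harness, a standalone reproducer might help. The following is only a sketch: it assumes the attachment has been unpacked to a plain XML file, that the program is built against the xeno and bytestring packages, and that it is compiled with -rtsopts so the heap can be capped (for example `./repro longlines.xml +RTS -s -M2g`). The program name `repro` and the 2 GB cap are illustrative choices, not something from the xeno repository.

```haskell
-- Standalone reproducer sketch (hypothetical; not part of the xeno repository).
-- It parses one file with Xeno.DOM and prints a small summary, so that
-- memory use can be observed with +RTS -s or /usr/bin/time.
module Main (main) where

import qualified Data.ByteString as BS
import System.Environment (getArgs)
import qualified Xeno.DOM as DOM

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      -- Read the whole (already decompressed) file strictly.
      bytes <- BS.readFile path
      case DOM.parse bytes of
        Left err -> putStrLn ("parse failed: " ++ show err)
        Right root -> do
          -- Touch the result so the parse is actually forced.
          putStrLn ("root element: " ++ show (DOM.name root))
          putStrLn ("direct children: " ++ show (length (DOM.children root)))
    _ -> putStrLn "usage: repro FILE.xml"
```

If this small program also runs out of heap on the original file but not on the modified variants, the benchmark setup can probably be ruled out as a factor.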

Strangely, only minor changes to the file (e.g. sed 's/x/xx/g', which increases the file size) will let it through with about 800M maxresident (as reported by /usr/bin/time). Inserting a newline after each > also gives 800M maxresident, but the problem doesn't seem to be related to the long lines, as almost any change to the file helps.
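For anyone who wants to regenerate the two variants described above without sed, here is a minimal sketch using only the bytestring package. The function names and the output file suffixes are made up for illustration; the transformations are meant to mirror the sed substitution and the newline insertion, working bytewise on the input.

```haskell
import qualified Data.ByteString.Char8 as BC

-- | Double every 'x' byte, mirroring sed 's/x/xx/g'.
doubleX :: BC.ByteString -> BC.ByteString
doubleX = BC.concatMap step
  where
    step c
      | c == 'x'  = BC.pack "xx"
      | otherwise = BC.singleton c

-- | Insert a newline after every '>' byte.
newlineAfterGt :: BC.ByteString -> BC.ByteString
newlineAfterGt = BC.concatMap step
  where
    step c
      | c == '>'  = BC.pack ">\n"
      | otherwise = BC.singleton c

-- | Hypothetical helper: write both variants next to the input file.
writeVariants :: FilePath -> IO ()
writeVariants path = do
  original <- BC.readFile path
  BC.writeFile (path ++ ".doublex") (doubleX original)
  BC.writeFile (path ++ ".newlines") (newlineAfterGt original)
```

Calling `writeVariants "longlines.xml"` from GHCi would produce `longlines.xml.doublex` and `longlines.xml.newlines`, which can then be fed to the parser one at a time to compare peak memory.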

(Yes I should be using Xeno.SAX, but why does e.g. https://dumps.wikimedia.org/nowiki/20230520/nowiki-20230520-pages-articles-multistream-index.txt.bz2 at 11M go through fine with <400M maxresident and this one not? Even removing newlines, the wiki works fine. This feels like leakage.)

@ocramz
Owner

ocramz commented Jun 20, 2023

@unhammer perhaps you could try this test with the latest master? See #63.

unhammer added a commit to unhammer/xeno that referenced this issue Jun 21, 2023
ocramz#65

stack --stack-yaml stack-lts-18.yaml build && stack --stack-yaml stack-lts-18.yaml bench
@unhammer
Contributor Author

The issue remains :(

@ocramz
Owner

ocramz commented Jun 21, 2023

"fy fan" (a Scandinavian expletive, roughly "damn"). OK, this requires some deeper thinking.

@ocramz
Owner

ocramz commented Jun 21, 2023

@unhammer anyway, it's at least reassuring that the latest patch doesn't change the memory behavior of the library (kudos @mitchellwrosen)
