File Size Limit for .graphml import? #447
Comments
Hi! I tried this out and did indeed run into issues with a larger file, although in my case I could see an OOM in my logs (have you checked the logs? Perhaps you have this too?). The fix for me was to adjust the batchSize in the config.
I am unsure of the optimum number here, but the default is 20,000, so I imagine 100 was a bit extreme on the low side 😅 I'll also ticket this to see if we can either make performance improvements or at least throw an exception instead of crashing the query! Let me know if this helps :)
Thank you for the 'batch size' tip, it will be useful b/c the next batch of files will be larger. Note that my situation is slightly different: there is no error, it's just that half the edges are ignored/not imported. For example, the graphml contains 4M nodes and 4M edges, but after an error-free upload the neo4j db shows 4M nodes and 2M edges. I checked whether the graphml had duplicate edges, but that is not the case.
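For anyone wanting to double-check the node, edge, and duplicate counts in a large file before importing, here is a minimal sketch in Python that streams the GraphML instead of loading it whole. The function name `count_graphml` and the inline sample are made up for illustration, and it assumes two edges are duplicates when they share the same source and target attributes, which may be too strict if parallel edges are legitimate in your data.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny stand-in for a real file (hypothetical data): 3 nodes, 3 edges, 1 repeated edge.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="n0"/><node id="n1"/><node id="n2"/>
    <edge source="n0" target="n1"/>
    <edge source="n1" target="n2"/>
    <edge source="n0" target="n1"/>
  </graph>
</graphml>"""


def count_graphml(source):
    """Stream a GraphML document; return (nodes, edges, duplicate_edges)."""
    nodes = 0
    edges = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if tag == "node":
            nodes += 1
        elif tag == "edge":
            # Assumption: an edge is identified by its (source, target) pair.
            edges[(elem.get("source"), elem.get("target"))] += 1
        if tag in ("node", "edge"):
            elem.clear()  # drop the element's contents to reduce memory use
    total_edges = sum(edges.values())
    return nodes, total_edges, total_edges - len(edges)


print(count_graphml(io.BytesIO(SAMPLE)))
```

`ET.iterparse` also accepts a file path, so `count_graphml("my_file.graphml")` (path hypothetical) would work the same way on a file on disk.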
The logs are in debug.log :) The file is imported in batches of transactions, so if all the edges come last in the file, the import potentially crashes before it reaches them, while the transactions that created the nodes have already committed. That might explain the discrepancy.
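To illustrate the batching explanation, here is a toy model in Python. It is not how APOC is implemented internally; `import_in_batches` and `fail_at` are hypothetical names, and the only assumption is that each batch commits independently and that a failure discards the batch in progress and everything after it.

```python
def import_in_batches(items, batch_size, fail_at=None):
    """Commit items batch by batch; stop at the batch containing index fail_at."""
    committed = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        if fail_at is not None and start <= fail_at < start + len(batch):
            break  # the failing batch is never committed
        committed.extend(batch)  # earlier batches stay committed
    return committed


# Nodes come first in the file, edges last, as in a typical GraphML document.
items = [("node", i) for i in range(4)] + [("edge", i) for i in range(4)]
result = import_in_batches(items, batch_size=2, fail_at=6)

node_count = sum(1 for kind, _ in result if kind == "node")
edge_count = sum(1 for kind, _ in result if kind == "edge")
print(node_count, edge_count)
```

In this model a failure partway through the edges leaves every node in place but only some of the edges, which is the kind of discrepancy described above.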
Yeah, found them =D This is the log from executing the import, with no batch, into an empty db. Nothing indicates an error to me, or am I missing something?
Hmm okay, how does the query log look for it? Also, did it work when you tried the batchSize? I can't reproduce a case where it just misses the relationships 🙈
Thank you for the replies! Yes, the process works when using batchSize, but I get the same result, i.e. half the edges but no error. Note that I have confirmed the
The "RELATED" type is added because every relationship must have exactly one type, and if none is specified APOC adds that generic one. The reason those 2 queries return double the amount is that they return every relationship twice. Matching on a pattern with no direction matches each relationship once as (a)-->(b) and once as (a)<--(b). If you only want one of each, you need to add a direction :)
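The double counting can be modelled outside the database with a few lines of Python. This is only a sketch of the matching semantics, with made-up data and function names; in Cypher the two cases correspond roughly to `MATCH (a)-[r]-(b)` versus `MATCH (a)-[r]->(b)`.

```python
# Two stored relationships, each with a fixed direction (hypothetical data).
relationships = [("a", "b"), ("b", "c")]


def match_directed(rels):
    """Model of a directed pattern: one row per stored relationship."""
    return [(start, end) for start, end in rels]


def match_undirected(rels):
    """Model of an undirected pattern: each relationship matches both ways round."""
    rows = []
    for start, end in rels:
        rows.append((start, end))
        rows.append((end, start))
    return rows


print(len(match_directed(relationships)), len(match_undirected(relationships)))
```

So a count taken with an undirected pattern should be halved, or the pattern given a direction, before comparing it with the number of edges in the file.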
Expected Behavior
I have been using the following command to import .graphml files into Neo4j

This has worked in the past with .graphml files up to 1 GB in size.

Actual Behavior
I've recently had to work with larger .graphml files. An import of a file that was 3 GB in size proceeded without error, except that while all the nodes were imported, only half the edges were. No error or warning was thrown.

Note that the .graphml is an XML document that begins with meta info, followed by node info, followed by edge info. Since the import stops midway through the edge info, I'm wondering if there is a setting/limit on the number of lines or the size of the .graphml file?

How to Reproduce the Problem
.graphml file
.graphml file into Neo4j using command above.

Versions