Problems on 15TB ZFS flatfs repo: mutating MFS dir with 25k entries, ipfs add hang #10588
Comments
GitHub said "File size too big: 25 MB are allowed, 50 MB were attempted to upload." So here's "ipfs-profile-2024-11-17-utc.zip":

Update edit: This really is worse than expected. The add-files command hanging on a small file (and sometimes OOM-killing things) is one thing, but missing files is way worse. I see that the ipfs datastore index (or whatever tracks blocks) cannot find the following, meaning the data is either completely gone or the index got messed up:
I had 100% of that folder in the past. Timeline:

Point is that no filesystem or data corruption or data screw-up happened other than ipfs's. That bafybeif...qm2m folder = 321 blocks (32,722,333 bytes). It corresponds to these .data files:

I didn't run any gc command, and as far as I know, no gc was run in the background. No point in running mirror HDDs if the data gets deleted by some software (RAID is not a backup). This is really annoying and frustrating.
Note: It is not that the 244 out of 322 blocks were deleted. Adding files to MFS does not necessarily download all the data; rather, it creates references to the data. Did you explicitly preload the data after …?

It'd be helpful to get profiles during the bad events, such as: …
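(A hedged illustration of the "references, not data" point above; the CID is a placeholder and these are standard Kubo commands, shown only as a sketch:)

```sh
cid="<your CID here>"                            # placeholder
ipfs files cp "/ipfs/$cid" /some/dir/file.jpg    # only records a reference inside MFS; may not fetch blocks
ipfs refs -r "$cid"                              # walks the DAG and fetches any missing blocks (when online)
ipfs pin add "$cid"                              # or: fetch and pin the whole DAG
```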
I think your leveldb datastore blew up. leveldb is used to store the pinset and the MFS root, so it is probably hanging there. That doesn't explain your "data missing" part, as the blocks are stored in flatfs (the blocks folder).
You are referring to the 32 MB folder I wrote about: I probably didn't use MFS at all with that one (if I did, I only copied its root to MFS after …
It's basically hanging again. Command
The "Sl" status reported by
Roughly 24 or 48 hours after …

About pinning being broken: in the past I was able to do this: run …

That go.dev tool looks helpful. I wondered about a thing to do that in the past! Anyways, I already did that: converting the CID's data into the corresponding CIQ*.data files. I said that in an above post, the one which contains the text "It corresponds to these .data files:". How I did it:

1. Saw that I didn't have all of a CID in "repo A", where I did in the past.
2. Downloaded a .car file of that CID from an HTTP-only website.
3. Deleted everything in "repo B" = empty repo.
4. Imported said CAR file into repo B.
5. Got a list of all .data files in repo B.
6. Checked repo A to see which ones were missing = only had 78 out of 322 of them.

(Repo A is on one computer and repo B is on a different computer.)
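(A hedged sketch of that repo-A/repo-B comparison; filenames and output paths are placeholders, the actual commands used aren't shown in the thread, and it assumes GNU find:)

```sh
# In empty "repo B": import the CAR, then list the block files it created.
ipfs dag import downloaded.car
find "$IPFS_PATH/blocks" -name '*.data' -printf '%f\n' | sort > repoB-blocks.txt

# In "repo A" (copy repoB-blocks.txt over first): list its block files the same
# way, then show which of repo B's blocks are absent from repo A.
find "$IPFS_PATH/blocks" -name '*.data' -printf '%f\n' | sort > repoA-blocks.txt
comm -23 repoB-blocks.txt repoA-blocks.txt    # blocks present in B but missing from A
```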
Timeline:

What that looks like - Terminal tab 2:
Terminal tab 1:
I then ran
A quick view of your stack:
Try disabling the reprovider by setting the Interval to 0 (https://github.com/ipfs/kubo/blob/master/docs/config.md#reproviderinterval) and enabling StrategicProviding (https://github.com/ipfs/kubo/blob/master/docs/experimental-features.md#strategic-providing), which additionally disables bitswap-providing, and see if anything improves. Otherwise, backup the leveldb altogether and/or backup the list of pins. Then delete it and then try again and tell me if it keeps hanging. I'm still fairly convinced leveldb is the bottleneck.
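(A hedged example of those two config changes, assuming current Kubo config key names - double-check against the linked docs:)

```sh
ipfs config Reprovider.Interval 0                         # stored as the string "0" (a duration)
ipfs config --json Experimental.StrategicProviding true   # JSON boolean, hence --json
# restart the daemon afterwards so the new settings take effect
```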
Above, I said "Like a week ago I saw some error about a dirty flag not being cleared." This refers to this error, which I also had today:

Also above, I described that …

Other problem: pinning appears to be broken. A less-than-ideal solution: roll back to a previous ZFS snapshot (I made none, so I can't do this). If I could roll back, then that snapshot might also contain the .data files that were mysteriously deleted.
By "pinset of certain size" I think you mean the total data / file size. I have 13,643 pins as reported by
Will do, thanks for suggesting things that may help.
I'm kinda out of storage space. I need ~$400 for another 18 TB HDD (price+tax): maybe a month or more from now. Edit: read the next post.
You posted "Otherwise, backup the leveldb altogether and/or backup the list of pins. Then delete it and then try again and tell me if it keeps hanging. I'm still fairly convinced leveldb is the bottleneck." I may have misinterpreted this as meaning "delete everything (after copying it elsewhere) so you can redo it". I think you actually meant that I should back up and then delete "$IPFS_PATH/datastore/*", then see if it works better. (The datastore folder = basically the same thing as leveldb, as far as I know.) For the above changes ( #10588 (comment) ):
Yes sorry, I meant that folder only. It does not contain the blocks, it only contains metadata (pinset, mfs root, reprovider queues). I'm trying to establish if the hanging behavior you see is related to leveldb.
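(A hedged sketch of that backup-then-delete step, assuming the default layout described in this thread, with leveldb in `$IPFS_PATH/datastore` and flatfs in `$IPFS_PATH/blocks`:)

```sh
ipfs pin ls --type=recursive > ~/pins-backup.txt   # the pin list lives in leveldb, so save it first
# stop the daemon (if running) so nothing holds the leveldb lock, then:
cp -a "$IPFS_PATH/datastore" ~/datastore-backup    # metadata only: pinset, MFS root, reprovide queue
rm -rf "$IPFS_PATH/datastore"
# an empty leveldb should be recreated there on the next start; blocks/ is untouched
```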
Days ago I did some tests with
but I think I'm going to have to ignore the results I got with that because it wrote the following to the config:
which is fine, but
may be wrong. The string "0" may not be the same as the JSON number type for 0, which is …

Here are the results anyways, which should maybe be ignored: those same days ago, I tried adding many files to MFS. Both attempts resulted in OOM. The out-of-memory condition either resulted in the entire computer freezing or the daemon getting killed. Also around that time, I did try to get a profile by running …
This is fine because this gets parsed as a duration (e.g. "0s" or "10m"), so it should be a string. Did you try with an empty leveldb datastore? And does it still hang when adding something small?
OK, and I assume non-string 0 also works.
Not currently. But it will hang again if I add many files to MFS (with that command) and re-trigger that bug. Happened twice so far as documented by this thread.
No. What I was doing was running a command which automatically ran …

So that would take some amount of effort and time to do. There are now 384,226 lines in the log file - tail shows that adding files to MFS slowed down at the end of the most recent attempt:
Roughly 300,000 out of 1 million done: if I had logged it better, I could say more confidently whether or not "Reprovider.Interval 0" and "Experimental.StrategicProviding true" helped at all. It seems like they did, but I could try again with better logging this time. (I can always test the thing I'm trying to do but with an empty leveldb datastore; I just have to put in the effort to do that.)
So leveldb does not work well at a large scale. What works better, then? I've heard of badgerdb, IIRC - maybe that works better. badgerdb is used in some IPFS implementations/setups.
I think we already established that deleting the leveldb folder does not delete the data (which is in the blocks folder), only the list of pins, the MFS root and the pending-reprovide entries. That folder should be much smaller in size, and you can back it up easily.
Probably not. I would like you to start with an empty leveldb datastore and the reprovider settings I mentioned, and see if simple things work. If they work, then you can try your major operations, and when they degrade, please get profiles so that we know where the memory is going.
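(For reference, a hedged example of capturing such a profile; option names as in recent `ipfs diag profile --help`, adjust if your version differs:)

```sh
# run while the slow/hanging operation is in progress; repeat a few times
ipfs diag profile --profile-time 30s -o "profile-$(date -u +%Y%m%dT%H%M%SZ).zip"
```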
Yes, I know. I essentially knew that multiple posts ago and also months ago. The point of the steps posted in #10588 (comment) was to clarify the process I would need to go through in order to test that.
Alright, so string zero it is.
Ok, I am going to assume it is leveldb. Kubo now supports pebble, so you should be able to replace the leveldb part of the configuration with pebble. You can play with that and let us know if things get out of control again. |
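(A hedged way to experiment with pebble without touching the 15 TB repo. This assumes a `pebbleds` init profile exists in your Kubo version - check the Kubo datastore/profile docs; converting the existing repo in place would instead mean editing Datastore.Spec and migrating the metadata, which isn't covered here:)

```sh
# throwaway test repo using pebble for the metadata store
export IPFS_PATH=/tmp/ipfs-pebble-test
ipfs init --profile pebbleds
# re-run a sample of the MFS-heavy workload against this repo and compare behaviour
```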
About "I think leveldb is compacting": it seems I know what you mean by that. I ran another command where many files are added to MFS. It OOM'd the daemon and kept working in offline mode:

I let that run for >12 hours. Here is what the datastore folder looks like after I recently canceled that command:
That's 32,003 files totaling 171 MB. Normally it's roughly 60 files at 50 MB. This should explain why the ipfs daemon takes so long to start up and …
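(For reference, a hedged way to reproduce file-count and size numbers like these - the exact command used above isn't shown:)

```sh
find "$IPFS_PATH/datastore" -type f | wc -l   # number of leveldb files
du -sh "$IPFS_PATH/datastore"                 # total size of the metadata store
```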
Hello @ProximaNova , can you try to start kubo with …

I think this will fix the "copying to mfs gets slow" problem.
Triage notes:
This is a mitigation to increased MFS memory usage in the course of many write operations. The underlying issue is the unbounded growth of the mfs directory cache in boxo. In the latest boxo version, this cache can be cleared by calling Flush() on the folder. In order to trigger that, we call Flush() on the parent folder of the file/folder where the write operations are happening. Not flushing the parent folder allows it to grow unbounded. Then, any read operation on that folder or its parents (i.e. stat) will trigger a sync operation to match the cache to the underlying unixfs structure (and obtain the correct node CID). This sync operation must visit every item in the cache. When the cache has grown too much, and the underlying unixfs folder has switched into a HAMT, the operation can take minutes. Thus, we should clear the cache often, and the Flush flag is a good indicator that we can let it go. Users can always run with --flush=false and flush at regular intervals during their MFS writes if they want to extract some performance. Fixes #8694, #10588.
See summary: #8694 (comment)
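(A hedged sketch of the manual-flush pattern from the description above, applied to the kind of bulk copy in this issue; the /dup/organize path and t5.txt input follow the earlier command, and whether `--flush=false` is accepted on `ipfs files cp` in your version should be checked against `ipfs files --help`:)

```bash
#!/usr/bin/env bash
# Copy many files into MFS without flushing on every write, flushing the
# parent directory at regular intervals instead. Input: "CID filename" lines.
i=0
while read -r cid fn; do
  ipfs files cp --flush=false -p "/ipfs/$cid" "/dup/organize/$fn"
  i=$((i + 1))
  if [ $((i % 1000)) -eq 0 ]; then
    ipfs files flush /dup/organize   # sync the cached directory back to unixfs
  fi
done < t5.txt
ipfs files flush /                   # final flush of everything
```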
* cmd/files: flush parent folders
* cmd/files: docs and changelog for --flush changes
Checklist
Installation method
dist.ipfs.tech or ipfs-update
Version
Config
Description
Things were working fairly fast and OK (not great, but OK); then, after a certain event a day ago, things got way slower or stopped working. Setup: a ZFS mirror pool of two 18 TB HDDs which mostly contains IPFS data, like 15 TB of it. Things were only "OK" rather than great because pinning stopped working about a month ago; I occasionally saw some error like "cannot fix 1800 pins". A day ago I was doing this with a list of 1,105,578 IPFS CIDs (totaling 1.2 TB):
$ cat /home/u/Downloads/t5.txt | xargs -d "\n" sh -c 'for args do cid="$(echo "$args" | sed "s/ .*//g")"; fn="$(echo "$args" | sed "s/.* //g")"; date -u; ipfs files cp -p "/ipfs/$cid" "/dup/organize/4chan/mlp/$(echo "$fn" | perl -pE "s/^(....).*/\1/g")/$(echo "$fn" | perl -pE "s/^....(..).*/\1/g")/$fn"; date -u; done' _ >> /home/u/Downloads/t6.txt
What that command does: the input is many lines where each line is "[raw blocks CID] [Unix timestamp filename]" and each file is 1KB to 4MB in size. That command was running in offline mode yesterday; no ipfs daemon was running. It then puts those files in paths like this: "ipfs files cp -p /ipfs/[cid] /mfs/1419/00/1419000480902.jpg". It logs the timestamp of each "ipfs files cp" command to file "t6.txt".
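(For readers, a hedged, more readable equivalent of that one-liner, assuming the same input format and bash; this is not the exact command that was run:)

```bash
#!/usr/bin/env bash
# input lines: "<raw-blocks CID> <unix-timestamp filename>"
while read -r cid fn; do
  sub1="${fn:0:4}"   # first four characters of the filename, e.g. "1419"
  sub2="${fn:4:2}"   # next two characters, e.g. "00"
  date -u
  ipfs files cp -p "/ipfs/$cid" "/dup/organize/4chan/mlp/$sub1/$sub2/$fn"
  date -u
done < /home/u/Downloads/t5.txt >> /home/u/Downloads/t6.txt
```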
That was the event which I think messed things up. It did 25,859 operations of copying files to MFS. Since I canceled that command 24 or 48 hours ago, I have had persistent problems with my IPFS stuff, such as the daemon not starting or hanging (ipfs/ipfs-docs#1956 - not a problem anymore). I do still have the following problem: adding a small file to IPFS never finishes - this ran for like 30 minutes and didn't exit:
And as said above, pinning doesn't work, so
ipfs --offline pin add --progress bafybeibfcytwdefk2hmatub3ab4wvfyei34xkwqz5ubzrqwslxi3d5ehau
is always stuck at "Fetched/Processed 0 nodes". About the 25,859 operations before it became bad: at the start of the text file you can see that files were copied to MFS quickly, and at the end it went way slower:

Like a week ago I saw some error about a dirty flag not being cleared. I have attached the output file of "$ ipfs diag profile" for more details. If there's something to be learned from this, I guess it's to not copy many files to MFS without the IPFS daemon running. I was trying to copy more than one million but only copied like 25,000. Also, I've seen some weirdness with the "ipfs files" set of commands (copy/move) in the past.
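(A hedged way to separate "blocks are missing locally" from "the pinning machinery is hanging", using the CID from the stuck pin command above:)

```sh
# If the root block is present locally this prints its size; --offline makes it
# fail fast instead of trying to fetch from the network.
ipfs --offline block stat bafybeibfcytwdefk2hmatub3ab4wvfyei34xkwqz5ubzrqwslxi3d5ehau
# Walking the DAG offline shows which child blocks, if any, are missing locally.
ipfs --offline refs -r bafybeibfcytwdefk2hmatub3ab4wvfyei34xkwqz5ubzrqwslxi3d5ehau
```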
Related issue: #10383, titled "Improve data onboarding speed: ipfs add and ipfs dag import|export" (I recommend using raw blocks instead).