Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration 11-12: migrate CIDsv1 to raw multihashes #95

Merged
merged 13 commits into from
Nov 17, 2021

Conversation

hsanjuan
Copy link
Contributor

@hsanjuan hsanjuan commented Jan 31, 2020

Part of epic: ipfs/kubo#6815

This adds the migration to convert CIDsV1 in the datastore "/blocks" namespace to raw multihashes.

Testing is missing, but initial testing on my local repo (~5000 of such CIDs) seems to work on both flatfs (slow) and badger (really fast).

To be rebased on #94 . More comments inlined.

@hsanjuan hsanjuan requested a review from Stebalien January 31, 2020 10:34
@hsanjuan hsanjuan self-assigned this Jan 31, 2020
@hsanjuan hsanjuan force-pushed the feat/repo-migration-raw-multihashes branch from 823c99f to 4658cb0 Compare January 31, 2020 10:35
Copy link
Contributor Author

@hsanjuan hsanjuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments and open questions

ipfs-8-to-9/migration/go.mod Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/go.mod Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/swapper.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/swapper.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/swapper.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
@hsanjuan hsanjuan force-pushed the feat/repo-migration-raw-multihashes branch from 4658cb0 to b79320a Compare January 31, 2020 19:36
@Stebalien
Copy link
Member

(you checked in a binary)

ipfs-8-to-9/migration/go.mod Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/swapper.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/swapper.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/swapper.go Outdated Show resolved Hide resolved
@hsanjuan hsanjuan force-pushed the feat/repo-migration-raw-multihashes branch from 95cbf1b to d330dcf Compare February 6, 2020 18:59
Copy link
Contributor Author

@hsanjuan hsanjuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(removed binary file that was checked in)

ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
github.com/ipfs/go-cid v0.0.5
github.com/ipfs/go-datastore v0.3.1
github.com/ipfs/go-filestore v0.0.4-0.20200203163551-8cc45055f624 // indirect
github.com/ipfs/go-ipfs v0.4.22-0.20200130064341-6750ee973e2a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: link to newer after ipfs/kubo#6816 has the right dep-set and is green.

ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
ipfs-8-to-9/migration/migration.go Outdated Show resolved Hide resolved
@Stebalien
Copy link
Member

So, I have some bad news, reverts going to get more complicated.

If we already have a block as CIDv1 and CIDv0, the revert will delete the CIDv0. We can easily fix this by either not deleting on revert or by checking to see if we already have the block before marking it as something we've swapped.

Unfortunately, the second issue is much worse: If we upgrade, add some files with CIDv1, then downgrade, we'll GC these files. It'll still be possible to read the files given that we check both V0 and V1 CIDs on read. The only solution I can think of is to revert by walking the pinset/mfs. If we do that, with https://github.com/ipfs/go-ipfs/blob/master/thirdparty/cidv0v1/blockstore.go, we'll avoid GCing anything we want to keep.

Thoughts? I'd rather not say "thou shalt not revert" given all the other features and potential issues with this release, but this is getting to be a very complicated process.

@hsanjuan
Copy link
Contributor Author

hsanjuan commented Feb 8, 2020

If we already have a block as CIDv1 and CIDv0, the revert will delete the CIDv0. We can easily fix this by either not deleting on revert or by checking to see if we already have the block before marking it as something we've swapped.

ok, doable somehow. We can tag certain migrations as "keep" and don't delete the blocks during revert.

Unfortunately, the second issue is much worse...

This is getting hell-ish. The GC code (I assume) has the necessary code to list all the CIDs under pinset and mfs, so maybe it is not overly complicated to add the revert of those blocks as an extra step for the revert process?

@Stebalien
Copy link
Member

ok, doable somehow. We can tag certain migrations as "keep" and don't delete the blocks during revert.

SGTM.

This is getting hell-ish.

I agree. I'm starting to think we should put this in its own release instead of squeezing it into 0.5.0. On the other hand, I know the browsers team has been looking forward to this.

The GC code (I assume) has the necessary code to list all the CIDs under pinset and mfs, so maybe it is not overly complicated to add the revert of those blocks as an extra step for the revert process?

Yeah, we'd need to use the GC coloring code to get all "live" blocks.

@ShadowJonathan
Copy link

@Stebalien Any progress on this? I came to look up progress on this change from ipfs/kubo#6815

@Stebalien
Copy link
Member

js-ipfs moved forward with their migration: ipfs/js-ipfs#3124. The go team doesn't have anyone to work on this at the moment.

@hsanjuan hsanjuan force-pushed the feat/repo-migration-raw-multihashes branch from b5db12d to 704c741 Compare April 20, 2021 14:46
@hsanjuan
Copy link
Contributor Author

I've brought the branch up to date and will continue where I left off in the good old days.

@hsanjuan hsanjuan changed the title Migration 8-9: migrate CIDsv1 to raw multihashes Migration 11-12: migrate CIDsv1 to raw multihashes Apr 20, 2021
@hsanjuan hsanjuan force-pushed the feat/repo-migration-raw-multihashes branch from 704c741 to 89c5b50 Compare April 20, 2021 15:02
In this case, we need to keep them around during the revert.

We do this by pre-pending + to the list.

When doing reverts, we do not delete raw multihashes that already existed.

This happens only if the user had both the CidV1 and  the CidV0 for something.
@hsanjuan
Copy link
Contributor Author

If we upgrade, add some files with CIDv1, then downgrade, we'll GC these files. It'll still be possible to read the files given that we check both V0 and V1 CIDs on read. The only solution I can think of is to revert by walking the pinset/mfs. If we do that, with https://github.com/ipfs/go-ipfs/blob/master/thirdparty/cidv0v1/blockstore.go, we'll avoid GCing anything we want to keep.

This is not addressed yet, but I don't remember if we wanted to address it because the migration was meant to happen on a big go-ipfs release (with high chances for regressions and downgrades). This happens when the user downgrades after having added CIDv1-addressed content, in which case, those "new" blocks would stay referenced by multihash in the datastore (as CIDv0). They could still be read, because the blockstore is wrapped insomething that tries reading CIDv1 but fails over to CIDv0, but if a GC runs at that point, they could be GC'ed.

@Stebalien @aschmahmann is this something we still want to address?

The proposed solution is to:

  • Walk all pins and mfs root recursively and collect all those CIDs before revert.
  • Make sure that all those are stored as CID when the revert happens.

I'm pretty sure there are corner cases still: i.e. if the user, post migration, adds something with CidV0 to MFS, then adds the same with CidV1 and pins. It is likely that we will be removing the CIDv0 block unless we have an additional check to see if the equivalent to a CIDv1 is in the pin/mfs trees as CIDv0. This will add more memory pressure or we have to opt for a bloom filter check.

@hsanjuan
Copy link
Contributor Author

I'm pretty sure there are corner cases still: i.e. if the user, post migration, adds something with CidV0 to MFS, then adds the same with CidV1 and pins. It is likely that we will be removing the CIDv0 block unless we have an additional check to see if the equivalent to a CIDv1 is in the pin/mfs trees as CIDv0. This will add more memory pressure or we have to opt for a bloom filter check.

Ok, what we can do is to revert everything from the backup file and all the CIDv1 pins from the pinset and mfs and leave the raw-multihash blocks in place (do not delete anything on revert). That would cover the corner cases of the migration deleting something is should not have deleted on revert, it is simple and the user can choose to GC after revert. The downside is the repo size will potentially grow during revert. Note that we are always talking about CIDv1 stuff (not the default).

This commit ensure that we traverse all the CIDs in the pinset and MFS trees
and revert them to cid-based addressing in the datastore when the migration is
reversed. This addresses the problem when a user has added new CIDv1 pins or
modified MFS with CIDv1 content before reverting the migration. In that case,
those new multihash-addressed blocks would stay as such and while they could
potentially still be read, they would be swiped out by GC in the older version
of the code.

Additionally, there are corner cases when content is added with both CIDv1 and
CIDv0 (but the same multihash). These were partially addressed already by
keeping track of CIDv0 blocks that existed already for CIDv1 equivalents, and
by some logic during reverts, to handle blocks referenced from several
CIDv1s. However, things like adding CIDv1 and CIDv0 equivalent blocks to the
pinset after migration would not be handled properly.

To address all these issues, and simplify, we have opted to not delete any
raw-multihash-addressed block during revert. We convert as much as we can to
CIDv1 (using the backup file and the pinset/mfs trees), but otherwise leave
all the blocks. This simplifies the code and makes reverts more
memory-friendly at the expense of extra disk space.

As a final reminder, CIDv1s are not the default. We can expect that in the
general case, people have content that is CIDv0 addressed and requires no
action whatsoever in both migrating forward and backwards.
@hsanjuan
Copy link
Contributor Author

The last commit walks the MFS and pinset trees and ensures all those blocks are reverted to CIDv1 addressing.

Also, we do not delete any blocks on revert anymore. There are many corner cases about deleting blocks that were indeed expected to be there after migration because they had both CIDv1 and CIDv0.

The last commit mostly reverts the previous one as it would not be necessary to track which blocks had both CIDv1 and CIDv0 addressing if we do not delete them on revert.

@aschmahmann
Copy link
Contributor

@hsanjuan does this now mean that reversions are safe (even in the CIDv1 cases), you'll just end up with more data you'll have to GC?

@hsanjuan
Copy link
Contributor Author

I have added tests.

@gammazero it seems this wants me to publish the new migration to dist.ipfs.io before it can successfully check that fs-repo-migration -v is 12. Either tests are bound to break on branches or bound to break when the new migration is published. What is the right approach?

@hsanjuan hsanjuan requested a review from aschmahmann April 26, 2021 13:40
@hsanjuan
Copy link
Contributor Author

It may be impossible to review when the vendor repository is added, which breaks tests when not there.

@hsanjuan
Copy link
Contributor Author

One thing to double-check is whether it is ok to use ColoredSet() from go-ipfs v0.8.0 for the reverts (probably yes because it does not convert CIDs).

@hsanjuan
Copy link
Contributor Author

Afaik tests fail because there were broken in master after #130

@gammazero
Copy link
Contributor

The tests appear to be passing now. Maybe network issue in CI?

@hsanjuan
Copy link
Contributor Author

hm thanks for checking @gammazero !

@hsanjuan hsanjuan force-pushed the feat/repo-migration-raw-multihashes branch from 3c2d9b1 to 2ea97fd Compare April 30, 2021 18:20
@hsanjuan
Copy link
Contributor Author

Dependency vendoring is in #131

@BigLep BigLep added P2 Medium: Good to have, but can wait until someone steps up status/blocked Unable to be worked further until needs are met labels May 17, 2021
@BigLep
Copy link
Contributor

BigLep commented May 17, 2021

Blocked until the release (ipfs/kubo#8058)

@hsanjuan
Copy link
Contributor Author

I have been testing the migration using a 2TB badger repository on a AWS EC2 machine, with CPU/16GB RAM and 15TB EBS storage (IO optimized).

This repository is additionally full of CIDv1s (2M+ blocks need migration). This takes about a day or two.

  • A badger this big needs to bumped FD limits. That is normal and should be ok on a production setup.
  • On a first attempt, everything was going well but the processed suffered an OOM event at 200k . I added pprof debugging and started extracting memory heaps for a second attempt. This is currently at block 1.200M and going fine. My suspect for OOM is badger though, not anything in the migration, given the profiles that it is producing.
  • After that, need to verify that a new migration detects no blocks to be migrated, and that a revert also works without issues.

@gammazero
Copy link
Contributor

Looks like CI failure due to needing to re-vendor.

@BigLep
Copy link
Contributor

BigLep commented Jul 1, 2021

2021-07-01 conversation notes:

  • nft.storage is the most problematic edge cases
  • We could look at our gateways to see how many root CIDs on our gateways or v0 vs. v1
  • Potential "cheat" for FlatFS (move files instead of add/delete)

…-vendored

Migration 11-12: vendor dependencies
@hsanjuan hsanjuan merged commit 536f5d5 into master Nov 17, 2021
@hsanjuan hsanjuan deleted the feat/repo-migration-raw-multihashes branch November 17, 2021 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 Medium: Good to have, but can wait until someone steps up status/blocked Unable to be worked further until needs are met
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants