Resplit car files based on subsets #107

Open
1 of 2 tasks
anjor opened this issue Jun 19, 2024 · 2 comments

anjor commented Jun 19, 2024

Today we generate one large CAR file per epoch, roughly 600GB in size, and then split it using carlet. Carlet splits naively: it takes one block at a time (block in the IPLD sense of the word) until the output reaches the desired size. As a result, blocks that belong to the same DAG and are linked to each other can end up in separate CAR files, and therefore in different Filecoin deals.
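To make the problem concrete, here is a minimal sketch of size-based splitting that ignores DAG boundaries. It is not carlet's actual code: the `Block` type, the `dag` label, and both function names are hypothetical, and the sizes are toy numbers.

```python
from collections import defaultdict
from typing import NamedTuple


class Block(NamedTuple):
    cid: str   # hypothetical identifier standing in for a real CID
    dag: str   # hypothetical label for the DAG this block belongs to
    size: int  # payload size in arbitrary units


def naive_split(blocks, max_size):
    """Fill each output file one block at a time, ignoring DAG boundaries."""
    files, current, used = [], [], 0
    for block in blocks:
        # Start a new file as soon as the next block would not fit.
        if current and used + block.size > max_size:
            files.append(current)
            current, used = [], 0
        current.append(block)
        used += block.size
    if current:
        files.append(current)
    return files


def dags_spanning_files(files):
    """Return the DAG labels whose blocks landed in more than one file."""
    seen = defaultdict(set)
    for index, file_blocks in enumerate(files):
        for block in file_blocks:
            seen[block.dag].add(index)
    return sorted(dag for dag, indexes in seen.items() if len(indexes) > 1)


blocks = [
    Block("a1", "A", 40), Block("a2", "A", 40), Block("a3", "A", 40),
    Block("b1", "B", 40), Block("b2", "B", 40),
]
files = naive_split(blocks, max_size=100)
straddling = dags_spanning_files(files)
```

With these toy inputs both DAGs end up spread over two files each, which is the situation described above: no single file (and so no single deal) holds a complete DAG.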

This means that retrieval only works over a protocol that fetches one block at a time, i.e. bitswap. All of the split CAR files therefore need to be stored with SPs that serve data over bitswap.

However, we have already introduced the concept of a subset, which groups a number of Blocks (in the Solana sense of the word) together. Since we control which Blocks go into a subset, we could instead size the subsets so that each one is <32GB and fits in a Filecoin sector. That way all the data for a subset lives in a single deal and is retrievable via graphsync as well as bitswap.
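A rough sketch of the subset sizing idea, under stated assumptions: each Solana Block is treated as an indivisible unit with a known total DAG size, and consecutive Blocks are grouped until the next one would push the subset to the limit. `SolanaBlock`, `dag_size`, and `plan_subsets` are hypothetical names, not faithful's API, and the sizes are toy units rather than real byte counts.

```python
from typing import NamedTuple


class SolanaBlock(NamedTuple):
    slot: int      # slot number of the Solana Block
    dag_size: int  # hypothetical: total size of all IPLD blocks under it


def plan_subsets(solana_blocks, max_subset_size):
    """Group consecutive Solana Blocks into subsets strictly under max_subset_size."""
    subsets, current, used = [], [], 0
    for block in solana_blocks:
        # A Solana Block is never split, so one that cannot fit is an error.
        if block.dag_size >= max_subset_size:
            raise ValueError(f"slot {block.slot} alone does not fit in a subset")
        # Close the current subset before it would reach the limit.
        if current and used + block.dag_size >= max_subset_size:
            subsets.append(current)
            current, used = [], 0
        current.append(block)
        used += block.dag_size
    if current:
        subsets.append(current)
    return subsets


# Toy units: a limit of 32 stands in for the 32GB sector size.
epoch = [SolanaBlock(slot, 10) for slot in range(7)]
subsets = plan_subsets(epoch, max_subset_size=32)
slots_per_subset = [[b.slot for b in subset] for subset in subsets]
```

Because the split points fall only between Solana Blocks, every DAG stays whole inside one subset, and each subset can then be written out as its own CAR file.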

This would require the following work:

  • Rewrite the CAR writing code in faithful to generate subsets of the "correct" size
  • Write a standalone tool that takes an existing epoch CAR file and splits it into "correct"-sized smaller CAR files, where each smaller CAR file is a subset.
@anjor anjor self-assigned this Jun 19, 2024

anjor commented Jul 5, 2024

#116 closes this

anjor commented Jul 8, 2024

We decided that this is not required:

Rewrite the CAR writing code in faithful to generate subsets of the "correct" size
