Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: atrium-repo #272

Open
wants to merge 39 commits into
base: main
Choose a base branch
from
Open

feat: atrium-repo #272

wants to merge 39 commits into from

Conversation

DrChat
Copy link

@DrChat DrChat commented Dec 23, 2024

This builds on str4d's work here: #168.

Big notes:

  • Complete merkle search tree implementation (but still lacking fuzz tests)
  • Added a new pair of traits for block storage: AsyncBlockStoreRead and AsyncBlockStoreWrite.
    • In-memory storage backend (primarily intended for testing)
    • CAR storage (supporting both reading and writing)
    • Users can implement other storage sources, like Azure-backed block storage.

  • Merkle Search Tree
    • Parse Node
      • Read Node from bytes
      • Verify depth and sort order
      • Limit the number of TreeEntrys per Node to a statistically unlikely maximum length.
      • Consider limiting the overall depth of the repo, or other parameters, to prevent more sophisticated key mining attacks.
    • Locate key within node
    • Locate keys within node with a given prefix
    • Add entry
    • Edit entry
    • Delete entry
  • Repository
    • Load from a CAR file
    • Load from a firehose record
    • Enumerate all keys
    • Fetch all records in a given collection
    • Get a specific record
  • Storage
    • CAR files
      • Reading CAR files
      • Verify completeness of the repository structure.
      • Robustness to both duplication and de-duplication of blocks.
      • Ignore any unnecessary or unlinked blocks.
    • Commits from firehose records

@DrChat DrChat marked this pull request as draft December 23, 2024 20:14
@DrChat DrChat force-pushed the repo branch 5 times, most recently from 4b54d5e to 3b84377 Compare December 26, 2024 20:54
loop {
let node = Node::read_from(&mut bs, node_cid).await?;
if !seen.insert(node_cid) {
// This CID was already seen. There is a cycle in the graph.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it even possible to serialize a graph with a cycle?
The two nodes would point at each other, and by definition it should be impossible to compute their CIDs if there is a cycle.

DrChat added 6 commits January 3, 2025 16:08
Why: This function is a footgun that will likely be misused.
Deleting a block is not typically safe because it could have multiple incoming references (and thus deletion converts those into dangling references).
The only way to safely delete a block would be to wholistically enumerate an archive file and collect a list of all referenced blocks.
@DrChat
Copy link
Author

DrChat commented Jan 18, 2025

I think the PR is in a pretty good state for a MVP.
My time is limited so I won't be able to quickly iterate on the functionality for creating new commits for a repository. Still planning on it, but the work here can be pushed up in the meantime :)

@DrChat DrChat marked this pull request as ready for review January 18, 2025 22:24
@erlend-sh
Copy link
Contributor

erlend-sh commented Jan 19, 2025

Pinging @str4d since this builds on their prior work.

@sugyan sugyan self-requested a review January 19, 2025 13:48
Copy link
Owner

@sugyan sugyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have written several comments on areas unrelated to the internal logic. There are also a couple of points from clippy that I hope you can correct as well.

@@ -12,5 +12,8 @@ target/
# These are backup files generated by rustfmt
**/*.rs.bk

# VSCode settings
/.vscode
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not want to include anything related to a specific editor.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this line because VSCode annoyingly creates this folder automatically (likely due to the Copilot extension being broken and populating a settings.json).
If it isn't present, it's almost certain that I or someone else may accidentally commit this folder.

If you don't want to have a line that explicitly mentions VSCode, another alternative you could consider is changing this .gitignore file to an allowlist.
Here's an example from one of my other projects.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each individual's settings related to the editor should have their own global ignore settings, and I don't want to include changes on the repository side for that.

According to man gitignore:

Patterns which a user wants Git to ignore in all situations (e.g., backup or temporary files generated by the user’s editor of choice) generally go into a file specified by core.excludesFile in the user’s ~/.gitconfig. Its default value is $XDG_CONFIG_HOME/git/ignore. If $XDG_CONFIG_HOME is either not set or empty, $HOME/.config/git/ignore is used instead.

I believe that editor related ignore settings should be written there.

"examples/concurrent",
"examples/firehose",
"examples/video",
# "examples/concurrent",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why these are no longer excluded?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - I had forgotten about this.
I changed this because the examples were written against the last released version of atrium rather than the version present in the repository.
I needed to update the firehose example to incorporate the changes I have made here, and chose to change its dependencies to point at the crates in this repository.

The decision to have examples demonstrate the last public release seems to be an unusual one. I haven't ever seen this be done for any other Rust library.
Any thoughts about changing this?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. The current policy of using the latest release version does not allow us to use newly added packages.

We've discussed this before and made it the way it is now,
#100 (comment)

But maybe we should reconsider.

[package]
name = "atrium-repo"
version = "0.0.0"
authors = ["Jack Grigg <[email protected]>"]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to include both of your names.


# async in traits
# Can be removed once MSRV is at least 1.75.0.
async-trait = "0.1.68"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has already been removed by raising the MSRV.

[![](https://img.shields.io/crates/v/atrium-repo)](https://crates.io/crates/atrium-repo)
[![](https://img.shields.io/docsrs/atrium-repo)](https://docs.rs/atrium-repo)
[![](https://img.shields.io/crates/l/atrium-repo)](https://github.com/sugyan/atrium/blob/main/LICENSE)
[![Rust](https://github.com/sugyan/atrium/actions/workflows/api.yml/badge.svg?branch=main)](https://github.com/sugyan/atrium/actions/workflows/api.yml)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this package also contains tests, you should also add workflows/repo.yml and this should be a link to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants