Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add StorageID to Repository entity #8502

Merged
merged 16 commits into from
Jan 18, 2025
Merged

Conversation

itaigilo
Copy link
Contributor

Closes #8495.

@itaigilo itaigilo added the exclude-changelog PR description should not be included in next release changelog label Jan 15, 2025
Copy link

github-actions bot commented Jan 15, 2025

♻️ PR Preview 6d2d2e7 has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

Copy link

E2E Test Results - Quickstart

11 passed

Copy link

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed

@itaigilo itaigilo closed this Jan 15, 2025
@itaigilo itaigilo reopened this Jan 15, 2025
@itaigilo itaigilo marked this pull request as ready for review January 15, 2025 20:39
Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid work.
I have some comments that I think require our attention before approval.
As well as missing unit tests for graveler/catalog level

@@ -60,7 +60,8 @@ func GetBasicHandler(t *testing.T, authService *FakeAuthService, repoName string
storageNamespace = "replay"
}

_, err = c.CreateRepository(ctx, repoName, storageNamespace, "main", false)
// TODO (gilo): test storageID here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain? This is a test utility - what would you like to test here?

panic(ErrInvalidType)
}

// TODO (gilo): Any other validations?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I don't think this validation is required at all at this point. The storage ids already exist in the lakefs config so the parameter either matches one of them or it doesn't

We should however definitely define the requirements for a valid storage_id string and enforce during configuration load.
Lets just decide on min+max length + allowed chars and coordinate with @treeverse/product on this.
We should open an appropriate issue to enforce this validation.

storageNS := graveler.StorageNamespace(storageNamespace)
branchID := graveler.BranchID(branch)
if err := validator.Validate([]validator.ValidateArg{
{Name: "name", Value: repositoryID, Fn: graveler.ValidateRepositoryID},
{Name: "storageID", Value: storageIdentifier, Fn: graveler.ValidateStorageID},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for a validation on storage_id it's a given in the configuration at this point

@@ -2010,7 +2010,7 @@ func (c *Controller) CreateRepository(w http.ResponseWriter, r *http.Request, bo
if swag.BoolValue(params.Bare) {
// create a bare repository. This is useful in conjunction with refs-restore to create a copy
// of another repository by e.g. copying the _lakefs/ directory and restoring its refs
repo, err := c.Catalog.CreateBareRepository(ctx, body.Name, body.StorageNamespace, defaultBranch, swag.BoolValue(body.ReadOnly))
repo, err := c.Catalog.CreateBareRepository(ctx, body.Name, "", body.StorageNamespace, defaultBranch, swag.BoolValue(body.ReadOnly))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since only the blockstore is aware of the current configuration (using either blockstore or blockstores), we should have a validation at the adapter level which checks whether this value is "Valid". For OSS the validation should allow empty value (or enforce it - I don't really know if we should allow providing any storage_id in that case)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.
Removed this validation.

Copy link
Contributor Author

@itaigilo itaigilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-o-Z validation removed.
PTAL again.

Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Some more comments

@@ -456,12 +457,13 @@ func (c *Catalog) CreateRepository(ctx context.Context, repository string, stora
}); err != nil {
return nil, err
}
repo, err := c.Store.CreateRepository(ctx, repositoryID, storageNS, branchID, readOnly)
repo, err := c.Store.CreateRepository(ctx, repositoryID, storageIdentifier, storageNS, branchID, readOnly)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the previous comment and also discussed with @guy-har.
We should probably have someone validate the storage_id here.
Either by a dedicated method in the adapter or by getting the list of block configurations.
In the OSS this should lead to an error whenever create repository is provided with a storageID which is not empty.
We cannot merge the changes in the current state since this could lead to potential bugs in case we release a lakeFS version during development.
If we want to defer the decision of how we want to handle this, the minimum we need to do is to hardcode the current behavior so that any value other than empty will return error, and not to allow creating repositories with explicit storage IDs ATM

testutil.Must(t, err)
_, err = deps.catalog.CreateRepository(ctx, "foo2", onBlock(deps, "foo1"), "main", false)
_, err = deps.catalog.CreateRepository(ctx, "foo2", "storage", onBlock(deps, "foo1"), "main", false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use empty storageID for controller tests ATM

@@ -72,12 +72,12 @@ func TestCatalog_ListRepositories(t *testing.T) {
// prepare data tests
now := time.Now()
gravelerData := []*graveler.RepositoryRecord{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure to have one record with no StorageID (empty) as part of our tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Added.

@@ -871,6 +871,7 @@ func createPrepareUncommittedTestScenario(t *testing.T, repositoryID string, num
repository := &graveler.RepositoryRecord{
RepositoryID: graveler.RepositoryID(repositoryID),
Repository: &graveler.Repository{
StorageID: "storage",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change relevant at the moment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this row and it will still work -
But I rather keep this as an extra test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove that - and address these tests in the right context (GC).
I prefer we don't modify the test without understanding what we'd want to test here

Copy link
Contributor Author

@itaigilo itaigilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comments.

@N-o-Z can you PTAL again?

@@ -72,12 +72,12 @@ func TestCatalog_ListRepositories(t *testing.T) {
// prepare data tests
now := time.Now()
gravelerData := []*graveler.RepositoryRecord{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Added.

@@ -871,6 +871,7 @@ func createPrepareUncommittedTestScenario(t *testing.T, repositoryID string, num
repository := &graveler.RepositoryRecord{
RepositoryID: graveler.RepositoryID(repositoryID),
Repository: &graveler.Repository{
StorageID: "storage",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this row and it will still work -
But I rather keep this as an extra test.

Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!
Thanks for the hard work!
I'm approving but I guess we still have some decisions to make here

* Add StorageID to Repo endpoints

* Add param to Creation

* Add basic unit-tests

* Fix param

* Update tests
@itaigilo itaigilo merged commit bca4801 into master Jan 18, 2025
40 checks passed
@itaigilo itaigilo deleted the feature/add-storage-id-to-repo branch January 18, 2025 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude-changelog PR description should not be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add StorageID to Repository entity
2 participants