roachtest: update backup fixtures #140369

Open
wants to merge 1 commit into base: master

Conversation

jeffswenson
Collaborator

Previously, backup fixture creation was a manual process. Now, fixture
creation will run as a roachtest and use the fixture registry utility
added by #140368.

Fixture generation was reworked to use import instead of old backup
fixtures. This makes it trivial to bootstrap fixtures in new clouds and
in new fixture buckets.

Release note: none
Part of: #139159

@jeffswenson jeffswenson requested review from a team as code owners February 3, 2025 13:26
@jeffswenson jeffswenson requested review from herkolategan, DarrylWong and msbutler and removed request for a team February 3, 2025 13:26

blathers-crl bot commented Feb 3, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.

@jeffswenson
Collaborator Author

jeffswenson commented Feb 3, 2025

This change depends on #140368

@cockroach-teamcity
Member

This change is Reviewable

@jeffswenson jeffswenson force-pushed the jeffswenson-update-backup-fixtures branch 5 times, most recently from feb6183 to 119e642 Compare February 4, 2025 20:23
@jeffswenson
Collaborator Author

Here's a weekly roachtest run:
https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestWeeklyBazel/18727628

I'm still testing out auth inside teamcity. I think GCE is working now, but I still need to test AWS.

@jeffswenson jeffswenson force-pushed the jeffswenson-update-backup-fixtures branch 2 times, most recently from 728eeeb to e08386c Compare February 4, 2025 22:10
@jeffswenson jeffswenson force-pushed the jeffswenson-update-backup-fixtures branch 3 times, most recently from a007b38 to 9874e40 Compare February 5, 2025 14:31
@jeffswenson jeffswenson requested a review from a team as a code owner February 5, 2025 14:31
@jeffswenson jeffswenson force-pushed the jeffswenson-update-backup-fixtures branch 6 times, most recently from 2f1c0e3 to 5645808 Compare February 5, 2025 19:50
@jeffswenson
Collaborator Author

There is a bug in the fixtures GC that I knew about, but thought would be mostly theoretical. If two copies of the GC run concurrently, one of them may fail because it will attempt to delete a file that was already deleted.

It turns out this is actually pretty likely, because the tests run GC as the first action. So when roachtests start up in parallel, one of them will often fail during the GC. I'll fix the GC in its own PR and rebase this PR on top of it.

Collaborator

@msbutler msbutler left a comment


Looks great! Left mostly nits.

timeout: 1 * time.Hour,
suites: registry.Suites(registry.Nightly),
skip: "only for fixture generation",
timeout: 24 * time.Hour,
Collaborator


How long do the large and mid-size fixtures actually take? I usually set the timeout to 1.5x their runtime.

Collaborator Author


I updated the times to be more in line with test execution. The large test takes 16 hours and medium takes 8 hours.

// of an older fixture. The fields specified here will override any fields
// specified in the scheduledBackupSpecs field above.
initWorkloadViaRestore *restoreSpecs
fixture TpccFixture

timeout time.Duration
// A no-op, used only to set larger timeouts due roachtests limiting timeouts based on the suite
Collaborator


I don't quite follow this comment. is it referring to the timeout field?

Collaborator Author


I removed this comment. It was left over from an old version of the code. It previously made sense because fixture generation wasn't hooked up to either of the roachtest suites, so the field only affected how long the timeout could be.

t.Fatalf("fixtures not supported on %s", c.Cloud())
}

uri.Path = path.Join(uri.Path, "roachtest/v25.1")
Collaborator


Perhaps instead the subfolder we merge should be "master". Then we can add a check that fails the test if it runs on a release branch while the subfolder is still "master". This would trigger a failure every time a new branch is cut, which would remind us to trivially backport a dir change.

Collaborator Author


Good call! This was a miss on my part. I intended to have this auto-magically set based on the actual roachprod version, but forgot.
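Deriving the subfolder from the version makes the hardcoded `roachtest/v25.1` path track each release branch automatically. A hypothetical sketch of that derivation (`fixturePath` is an illustrative helper; the real code would pull the major/minor from `clusterupgrade.CurrentVersion()`):

```go
package main

import "fmt"

// fixturePath derives the fixture subdirectory from the cluster's
// major.minor version, so a newly cut release branch reads and writes
// its own directory without a manual backport of the path constant.
func fixturePath(major, minor int) string {
	return fmt.Sprintf("roachtest/v%d.%d", major, minor)
}

func main() {
	// Matches the directory hardcoded in the diff above.
	fmt.Println(fixturePath(25, 1))
}
```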

t: t,
c: c,
sp: bf,
version: clusterupgrade.CurrentVersion(),
Collaborator


I think your change could allow us to remove the manual binary version upload logic and remove the version field from the backupDriver. Previously, I needed to create a 24.1 fixture on a roachtest running on master. I don't think we need this with your change.

Collaborator Author


Good idea! Done.

require.NoError(bd.t, err)

cmd := roachtestutil.NewCommand("./cockroach workload fixtures import tpcc").
Arg("%q", urls[0]).
Collaborator


could tpcc init be faster if it connects across several nodes? like the c2c tests?

Collaborator Author


Nope. The workload fixtures import command works by creating an import job for each TPCC table. So the workload -> db connections don't carry any of the import data.
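In other words, the workload node only issues one statement per table and the cluster moves the data itself. A rough sketch of that shape (hypothetical; the real `workload fixtures import` command builds its statements internally, and the table list and source URI here are illustrative):

```go
package main

import "fmt"

// importStatements builds one IMPORT statement per TPCC table. Each runs as
// a server-side job, so the SQL connection that issues it never carries the
// table data; adding more workload->db connections would not speed this up.
func importStatements(tables []string, baseURI string) []string {
	stmts := make([]string, 0, len(tables))
	for _, tbl := range tables {
		stmts = append(stmts,
			fmt.Sprintf("IMPORT INTO %s CSV DATA ('%s/%s')", tbl, baseURI, tbl))
	}
	return stmts
}

func main() {
	stmts := importStatements(
		[]string{"warehouse", "district", "customer"},
		"nodelocal://1/tpcc",
	)
	for _, s := range stmts {
		fmt.Println(s)
	}
}
```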

INTO '%s'
RECURRING '* * * * *'
FULL BACKUP '@weekly'
WITH SCHEDULE OPTIONS first_run = 'now', ignore_existing_backups;
Collaborator


why is the ignore_existing_backups option necessary?

Collaborator Author


Fixed. An early version of this PR didn't have the fixture registry and tried to use SQL to inspect and manage backups in a single shared directory. The pain of trying to clean up leaked fixtures in a single shared directory is what led me to the current fixture registry implementation.

sql.QueryRow(bd.t, backupCountQuery).Scan(&backupCount)
bd.t.L().Printf(`%d scheduled backups taken`, backupCount)
if backupCount >= bd.sp.scheduledBackupSpecs.numBackupsInChain {
if backupCount >= bd.sp.fixture.IncrementalChainLength {
Collaborator


nit: use sql.Exec for the pauseSchedulesQuery below that I wrote :)

Collaborator Author


Fixed

c cluster.Cluster
version *clusterupgrade.Version
fixture blobfixture.FixtureMetadata
registry *blobfixture.Registry
Collaborator


it seems the backupDriver only needs registry.URI(). Perhaps bind the uri field instead of the whole registry object?

Collaborator Author

@jeffswenson jeffswenson Feb 10, 2025


I like binding the whole object. It makes find-all-references and go-to-definition work correctly.

@jeffswenson jeffswenson force-pushed the jeffswenson-update-backup-fixtures branch 9 times, most recently from c0fd4fb to 432d77a Compare February 12, 2025 19:08
@jeffswenson
Collaborator Author

I think I'm happy with the state of this now. My plan is to tune down the workload on the 20 TiB fixture until it's no longer overloading the cluster or OOMing the workload node.

@jeffswenson jeffswenson force-pushed the jeffswenson-update-backup-fixtures branch 3 times, most recently from 0fafe32 to bbb3b15 Compare February 14, 2025 16:58