roachtest: update backup fixtures #140369
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
This change depends on #140368
Force-pushed from feb6183 to 119e642
Here's a weekly roachtest run: I'm still testing out auth inside TeamCity. I think GCE is working now, but I still need to test AWS.
Force-pushed from 728eeeb to e08386c
Force-pushed from a007b38 to 9874e40
Force-pushed from 2f1c0e3 to 5645808
There is a bug in the fixtures GC that I knew about but thought would be mostly theoretical: if two copies of the GC run concurrently, one of them may fail because it attempts to delete a file that was already deleted. It turns out this is actually pretty likely, because the tests run GC as the first action, so when roachtests start up in parallel one of them will often fail during the GC. I'll fix the GC in its own PR and rebase this PR on top of it.
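To illustrate the direction of that fix, here is a minimal sketch of an idempotent delete loop that treats a file another GC run already removed as success. It uses local files via `os.Remove` as a stand-in for the cloud storage the fixture registry actually talks to; the `gcFixtures` name is made up for illustration.

```go
package fixtures

import (
	"errors"
	"fmt"
	"os"
)

// gcFixtures deletes the given fixture files, treating "already deleted"
// as success so that two GC runs racing each other do not fail.
func gcFixtures(paths []string) error {
	for _, p := range paths {
		if err := os.Remove(p); err != nil {
			if errors.Is(err, os.ErrNotExist) {
				// A concurrent GC already removed this file; not an error.
				continue
			}
			return fmt.Errorf("gc %s: %w", p, err)
		}
	}
	return nil
}
```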
Looks great! Left mostly nits.
timeout: 1 * time.Hour,
suites: registry.Suites(registry.Nightly),
skip: "only for fixture generation",
timeout: 24 * time.Hour,
How long do the large and mid-size fixtures actually take? I usually set the timeout to 1.5x their runtime.
I updated the timeouts to be more in line with actual test execution. The large test takes 16 hours and the medium one takes 8 hours.
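As a point of reference, the 1.5x rule of thumb applied to those runtimes gives 24h for the large fixture and 12h for the medium one; a tiny illustrative helper (not part of the roachtest registry):

```go
package fixtures

import "time"

// fixtureTimeout applies the 1.5x-of-runtime rule of thumb from the review.
func fixtureTimeout(runtime time.Duration) time.Duration {
	return time.Duration(float64(runtime) * 1.5)
}

// fixtureTimeout(16*time.Hour) == 24*time.Hour (large fixture)
// fixtureTimeout(8*time.Hour)  == 12*time.Hour (medium fixture)
```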
// of an older fixture. The fields specified here will override any fields
// specified in the scheduledBackupSpecs field above.
initWorkloadViaRestore *restoreSpecs
fixture TpccFixture

timeout time.Duration
// A no-op, used only to set larger timeouts due roachtests limiting timeouts based on the suite
I don't quite follow this comment. Is it referring to the timeout field?
I removed this comment; it was left over from the old version of the code. It previously made sense because fixture generation wasn't hooked up to either of the roachtest suites, so the field only affected how long the timeout could be.
t.Fatalf("fixtures not supported on %s", c.Cloud())
}

uri.Path = path.Join(uri.Path, "roachtest/v25.1")
Perhaps the subfolder we merge should instead be "master". Then we can add a check that fails the test if it runs on a release branch while the subfolder is still "master". This would trigger a failure every time a new branch is cut, which would remind us to backport a trivial directory change.
Good call! This was a miss on my part. I intended to have this auto-magically set based on the actual roachprod version, but forgot.
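For illustration, a hedged sketch of what deriving the subfolder could look like; the `releaseSeries` argument (e.g. "v25.1" or "master") would come from the running binary's version, and the helper name is made up:

```go
package fixtures

import (
	"net/url"
	"path"
)

// fixtureSubdir builds the fixture path from a release series instead of
// hardcoding "roachtest/v25.1".
func fixtureSubdir(base *url.URL, releaseSeries string) *url.URL {
	u := *base // copy so the caller's URL is not mutated
	u.Path = path.Join(u.Path, "roachtest", releaseSeries)
	return &u
}
```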
t: t,
c: c,
sp: bf,
version: clusterupgrade.CurrentVersion(),
I think your change could allow us to remove the manual binary version upload logic and drop the `version` field from the backupDriver. Previously, I needed to create a 24.1 fixture from a roachtest running on master. I don't think we need this with your change.
Good idea! Done.
require.NoError(bd.t, err)

cmd := roachtestutil.NewCommand("./cockroach workload fixtures import tpcc").
	Arg("%q", urls[0]).
Could tpcc init be faster if it connected across several nodes, like the c2c tests?
Nope. The `workload fixtures import` command works by creating an import job for each TPCC table, so the workload -> db connections don't carry any of the import data.
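Roughly, the rendered command looks like the string below (the `--warehouses` flag is the usual tpcc knob; the exact flags the driver passes may differ). Since each table is loaded by its own IMPORT job, the cluster pulls the data itself rather than receiving it over this single connection.

```go
package fixtures

import "fmt"

// importCommand sketches the shell command built above: one
// `workload fixtures import` invocation pointed at a single node's pgurl.
func importCommand(pgurl string, warehouses int) string {
	return fmt.Sprintf(
		"./cockroach workload fixtures import tpcc --warehouses=%d %q",
		warehouses, pgurl,
	)
}
```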
INTO '%s'
RECURRING '* * * * *'
FULL BACKUP '@weekly'
WITH SCHEDULE OPTIONS first_run = 'now', ignore_existing_backups;
Why is the `ignore_existing_backups` option necessary?
Fixed. An early version of this PR didn't have the fixture registry and tried to use SQL to inspect and manage backups in a single shared directory. The pain of trying to clean up leaked fixtures in a single shared directory is what led me to the current fixture registry implementation.
sql.QueryRow(bd.t, backupCountQuery).Scan(&backupCount)
bd.t.L().Printf(`%d scheduled backups taken`, backupCount)
if backupCount >= bd.sp.scheduledBackupSpecs.numBackupsInChain {
if backupCount >= bd.sp.fixture.IncrementalChainLength {
nit: use sql.Exec for the pauseSchedulesQuery below that I wrote :)
Fixed
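For completeness, a minimal sketch of what that nit amounts to; it uses plain database/sql to stay self-contained (the driver would call sql.Exec(bd.t, ...) on its SQLRunner), and the schedule label is a placeholder rather than the driver's actual pauseSchedulesQuery:

```go
package fixtures

import (
	gosql "database/sql"
	"fmt"
)

// pauseWhenChainComplete pauses the tpcc backup schedules once enough
// scheduled backups have run to complete the incremental chain.
func pauseWhenChainComplete(db *gosql.DB, backupCount, chainLength int) (bool, error) {
	if backupCount < chainLength {
		return false, nil
	}
	if _, err := db.Exec(
		`PAUSE SCHEDULES SELECT id FROM [SHOW SCHEDULES] WHERE label LIKE 'tpcc%'`,
	); err != nil {
		return false, fmt.Errorf("pausing backup schedules: %w", err)
	}
	return true, nil
}
```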
c cluster.Cluster
version *clusterupgrade.Version
fixture blobfixture.FixtureMetadata
registry *blobfixture.Registry
It seems the backupDriver only needs registry.URI(). Perhaps bind the uri field instead of the whole registry object?
I like binding the whole object. It makes "find all references" and "go to definition" work correctly.
Force-pushed from c0fd4fb to 432d77a
I think I'm happy with the state of this now. My plan is to tune down the workload on the 20 TiB fixture until it's no longer overloading the cluster or OOMing the workload node.
Force-pushed from 0fafe32 to bbb3b15
Force-pushed from bbb3b15 to 41b2df8
Previously, backup fixture creation was a manual process. Now, fixture
creation will run as a roachtest and use the fixture registry utility
added by #140368.
Fixture generation was reworked to use import instead of old backup
fixtures. This makes it trivial to bootstrap fixtures in new clouds and
in new fixture buckets.
Release note: none
Part of: #139159