tests: failures in test_timeline_archival_chaos #10389

Open
4 of 5 tasks
jcsp opened this issue Jan 14, 2025 · 4 comments
Assignees
Labels: a/test (Area: related to testing), c/storage/pageserver (Component: storage: pageserver), t/bug (Issue Type: Bug), triaged (bugs that were already triaged)

Comments

Collaborator

jcsp commented Jan 14, 2025

Multiple failure modes:

jcsp added the a/test, c/storage/pageserver, and t/bug labels Jan 14, 2025
jcsp self-assigned this Jan 14, 2025
erikgrinaker added the triaged label Jan 21, 2025
Collaborator Author

jcsp commented Jan 27, 2025

Failure mode C: #10524

Collaborator Author

jcsp commented Jan 28, 2025

Failure mode D: #10532

Collaborator Author

jcsp commented Jan 30, 2025

Failure mode E: #10594

Collaborator Author

jcsp commented Jan 30, 2025

Failure mode B: #10595

github-merge-queue bot pushed a commit that referenced this issue Jan 30, 2025
## Problem

The test asserts that it completes at least 10 full timeline lifecycles,
but the noisy CI environment sometimes doesn't meet that goal.

Related: #10389

## Summary of changes

- Sleep longer between pageserver restarts, so that the timeline
workers have more chance to make progress
- Sleep for less time between retries in the timeline workers, so that
they have a better chance of getting in while a pageserver is up between
restarts
- Relax the success condition to require at least 5 completed iterations
instead of 10
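
The retry and success-condition changes above can be sketched roughly as follows. This is a minimal illustration, not the actual test code: `timeline_worker`, `check_progress`, `PageserverUnavailable`, and the `RETRY_SLEEP_S` value are all hypothetical names and numbers; only the relaxed threshold of 5 comes from the summary.

```python
import time
from typing import Callable

# Hypothetical value for illustration; the commit message does not state the real one.
RETRY_SLEEP_S = 0.1
# Relaxed success threshold from the summary (previously 10).
MIN_LIFECYCLES = 5


class PageserverUnavailable(Exception):
    """Stand-in for whatever error a worker sees while the pageserver is down."""


def timeline_worker(
    run_lifecycle: Callable[[], None],
    should_stop: Callable[[], bool],
    retry_sleep_s: float = RETRY_SLEEP_S,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run timeline lifecycles until asked to stop; return how many completed."""
    completed = 0
    while not should_stop():
        try:
            run_lifecycle()
        except PageserverUnavailable:
            # A short retry sleep gives the worker more attempts inside each
            # window in which the pageserver is up between restarts.
            sleep(retry_sleep_s)
            continue
        completed += 1
    return completed


def check_progress(completed: int, minimum: int = MIN_LIFECYCLES) -> None:
    """Fail if the worker completed fewer lifecycles than the threshold."""
    assert completed >= minimum, (
        f"only {completed} lifecycles completed, expected at least {minimum}"
    )
```

Passing `sleep` as a parameter is only a convenience for exercising the loop without real delays; a real test would presumably use `time.sleep` directly.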
github-merge-queue bot pushed a commit that referenced this issue Jan 31, 2025
…10594)

## Problem

If offloading races with normal shutdown, we get a "failed to freeze and
flush: cannot flush frozen layers when flush_loop is not running, state
is Exited" error. This is harmless, but it shows how strange it is to try
to freeze and flush such a timeline: flushing on shutdown for an
archived timeline isn't useful.

Related: #10389

## Summary of changes

- During Timeline::shutdown, ignore ShutdownMode::FreezeAndFlush if the
timeline is archived
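
The rule described above can be modelled as a small decision function. This is a Python sketch of the logic only, not the Rust code in `Timeline::shutdown`: `should_freeze_and_flush` is a hypothetical helper, and the `OTHER` variant is a placeholder for any non-flushing shutdown mode.

```python
from enum import Enum, auto


class ShutdownMode(Enum):
    # Mirrors the ShutdownMode::FreezeAndFlush variant named in the summary.
    FREEZE_AND_FLUSH = auto()
    # Hypothetical placeholder for any other shutdown mode.
    OTHER = auto()


def should_freeze_and_flush(mode: ShutdownMode, is_archived: bool) -> bool:
    """Return True only when a freeze-and-flush should actually be attempted.

    An archived timeline skips the flush even if FREEZE_AND_FLUSH was
    requested, since flushing it on shutdown is not useful.
    """
    return mode is ShutdownMode.FREEZE_AND_FLUSH and not is_archived
```

With this shape, a shutdown that races with offloading never reaches the flush path for an archived timeline, which is the path that produced the "flush_loop is not running" error.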

2 participants