-
Notifications
You must be signed in to change notification settings - Fork 487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fast import: allow restore to provided connection string #10407
base: main
Are you sure you want to change the base?
Conversation
…rce_connection_string param)
70d43af
to
4a8ff6d
Compare
7436 tests run: 7085 passed, 0 failed, 351 skipped (full report)Flaky tests (3)Postgres 17
Postgres 14
Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
cd757ff at 2025-02-04T10:18:59.193Z :recycle: |
…unning postgres if specified
4a8ff6d
to
0e40544
Compare
32ed28a
to
01604b3
Compare
why not put the restore connection string into the spec? Shouldn't it be encoded the same way as the source string? |
01604b3
to
6cd4b9c
Compare
I have done it both ways as with source connection strings, there are tests on both paths as well |
@@ -347,6 +350,104 @@ def test_fast_import_binary( | |||
assert res[0][0] == 10 | |||
|
|||
|
|||
def test_fast_import_restore_to_connstring( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically the same test as test_fast_import
above. Can we parametrise the previous test instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are similar only in first few lines, then we either run postgres ourselves and restore into it (test_fast_import_restore_to_connstring
) or fast import runs its own postgres locally (test_fast_import_binary
). These scenarious are different, I am not sure it is worth combining them into one test, I am not even sure than combined test will take less lines than both of these 😄
{ | ||
nix::sys::signal::kill( | ||
Pid::from_raw(i32::try_from(proc.id().unwrap()).expect("convert child pid to i32")), | ||
nix::sys::signal::SIGTERM, | ||
) | ||
.context("signal postgres to shut down")?; | ||
proc.wait() | ||
.await | ||
.context("wait for postgres to shut down")?; | ||
} | ||
Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the extra scope? You could instead do. Also, is the return code not interesting?
199 async fn shutdown(&mut self) -> Result<(), anyhow::Error> {
200 let proc: &mut tokio::process::Child = self.postgres_proc.as_mut().unwrap();
201 info!("shutdown postgres");
202 nix::sys::signal::kill(
203 Pid::from_raw(i32::try_from(proc.id().unwrap()).expect("convert child pid to i32")),
204 nix::sys::signal::SIGTERM,
205 )
206 .context("signal postgres to shut down")?;
207 proc.wait()
208 .await
209 .context("wait for postgres to shut down")
210 .map(|_| ())
211 }
@@ -68,6 +70,8 @@ struct Spec { | |||
encryption_secret: EncryptionSecret, | |||
#[serde_as(as = "serde_with::base64::Base64")] | |||
source_connstring_ciphertext_base64: Vec<u8>, | |||
#[serde_as(as = "Option<serde_with::base64::Base64>")] | |||
restore_connstring_ciphertext_base64: Option<Vec<u8>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Why use restore
in the name. It's leaking an implenetation detail. I'd choose destination
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree here, but let's solve this with a comment? Since it's a part of the spec, that is already matched in the control plane :(
let pg_port = || { | ||
args.pg_port.unwrap_or_else(|| { | ||
info!("pg_port not specified, using default 5432"); | ||
5432 | ||
}) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You can specify default values in the Args
struct definition:
63 #[clap(long, default_value_t = 5432)]
64 pg_port: u16, // port to run postgres on, 5432 is default
}; | ||
|
||
// unused if run_postgres is false, but needed for shutdown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. You still dump into dumpdir
in that case.
@@ -52,6 +52,8 @@ struct Args { | |||
s3_prefix: Option<s3_uri::S3Uri>, | |||
#[clap(long)] | |||
source_connection_string: Option<String>, | |||
#[clap(long)] | |||
restore_connection_string: Option<String>, // will not run postgres if specified, will do pg_restore to this connection string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's time to refactor things a bit to reflect the fact that there are two operation modes:
- Restore to local postgres and upload layers to S3.
- Restore to a running postgres instance.
It's getting hard to read this since it's just one code path where each block is enabled/disabled based on the initial argument set.
This can be done with separate subcommands (see control_plane/storcon_cli/src/main.rs
):
- Each subcommand has its own set of arguments.
- Each subcommand has it's separate handler. You can share utility functions between the handlers.
String::from_utf8(plaintext.into_inner()).context("parse connection string as utf8") | ||
} | ||
|
||
struct PostgresProcess { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This can be in a separate module(file).
555f4c8
to
12a061c
Compare
@@ -326,7 +434,7 @@ pub(crate) async fn main() -> anyhow::Result<()> { | |||
.arg("--no-sync") | |||
// POSITIONAL args | |||
// source db (db name included in connection string) | |||
.arg(&source_connection_string) | |||
.arg(&source_connstring) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed that failed pgdump doesn't end the process and restore happens later
14:27 $ kubectl logs job/importpgdata-br-billowing-darkness-14681524
Defaulted container "importer" out of: importer, setup-volume (init)
{"timestamp":"2025-02-03T12:24:18.289576Z","level":"INFO","fields":{"message":"starting"}}
{"timestamp":"2025-02-03T12:24:18.316861Z","level":"INFO","fields":{"message":"restore_connection_string specified, not runnin
g postgres process"}}
{"timestamp":"2025-02-03T12:24:18.317110Z","level":"INFO","fields":{"message":"dump into the working directory"}}
{"timestamp":"2025-02-03T12:24:18.317569Z","level":"INFO","fields":{"message":"spawned pg_dump","pid":"20"}}
{"timestamp":"2025-02-03T12:24:35.607724Z","level":"INFO","fields":{"message":"pg_dump: error: connection to server at \"ep-we
athered-rain-334581-pooler.us-east-2.aws.neon.tech\" (3.143.47.40), port 5432 failed: ERROR: database \"neondb1\" does not ex
ist","fd":"stderr"},"span":{"name":"pg_dump"},"spans":[{"name":"pg_dump"}]}
{"timestamp":"2025-02-03T12:24:35.610660Z","level":"INFO","fields":{"message":"pg_dump exited","status":"ExitStatus(unix_wait_
status(256))"}}
{"timestamp":"2025-02-03T12:24:35.610796Z","level":"WARN","fields":{"message":"pg_dump failed, restore will likely fail as wel
l","status":"exit status: 1"}}
{"timestamp":"2025-02-03T12:24:35.612199Z","level":"INFO","fields":{"message":"spawned pg_restore","pid":"21"}}
{"timestamp":"2025-02-03T12:24:35.620809Z","level":"INFO","fields":{"message":"pg_restore: error: could not open input file \"
/instance_store/dumpdir/toc.dat\": No such file or directory","fd":"stderr"},"span":{"name":"pg_restore"},"spans":[{"name":"pg
_restore"}]}
{"timestamp":"2025-02-03T12:24:35.621680Z","level":"INFO","fields":{"message":"pg_restore exited","status":"ExitStatus(unix_wa
it_status(256))"}}
{"timestamp":"2025-02-03T12:24:35.621720Z","level":"WARN","fields":{"message":"pg_restore failed, restore will likely fail as
well","status":"exit status: 1"}}
## Problem ## Summary of changes
Within https://github.com/neondatabase/cloud/issues/22089 we decided that would be nice to start with import that runs dump-restore into a running compute (more on this here) We could do it by writing another tool or by extending existing
fast_import.rs
, we chose the latter.In this PR, I have added optional
restore_connection_string
as a cli arg and as a part of the json spec. If specified, the script will not run postgres and will just perform restore into provided connection string.TODO:
fn main
a little, take out too verbose stuff to some functionsallow streaming from dump stdout to restore stdinwill do in a separate PRaddress fast import: restore to neondb (not postgres) database #10251 (review)will do in a separate PRtest with custom dbnamewill do in a separate PRtest with s3 + pageserver + fast import binaryfast import: full e2e with s3 and pageserver #10487fast import: basic python test #10271 (comment)will do in a separate PRneondatabase/cloud#22775