Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fast import: allow restore to provided connection string #10407

Open
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

NanoBjorn
Copy link
Contributor

@NanoBjorn NanoBjorn commented Jan 15, 2025

Within https://github.com/neondatabase/cloud/issues/22089 we decided that would be nice to start with import that runs dump-restore into a running compute (more on this here) We could do it by writing another tool or by extending existing fast_import.rs, we chose the latter.

In this PR, I have added optional restore_connection_string as a cli arg and as a part of the json spec. If specified, the script will not run postgres and will just perform restore into provided connection string.

TODO:

neondatabase/cloud#22775

@NanoBjorn NanoBjorn force-pushed the cloud-22775-restore-to-connstring branch from 70d43af to 4a8ff6d Compare January 15, 2025 14:08
Copy link

github-actions bot commented Jan 15, 2025

7436 tests run: 7085 passed, 0 failed, 351 skipped (full report)


Flaky tests (3)

Postgres 17

Postgres 14

Code coverage* (full report)

  • functions: 33.2% (8525 of 25660 functions)
  • lines: 48.9% (71581 of 146310 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
cd757ff at 2025-02-04T10:18:59.193Z :recycle:

@NanoBjorn NanoBjorn force-pushed the cloud-22775-restore-to-connstring branch from 4a8ff6d to 0e40544 Compare January 16, 2025 19:08
@NanoBjorn NanoBjorn force-pushed the cloud-22775-restore-to-connstring branch 2 times, most recently from 32ed28a to 01604b3 Compare January 20, 2025 19:21
@duskpoet
Copy link
Member

why not put the restore connection string into the spec? Shouldn't it be encoded the same way as the source string?

@NanoBjorn NanoBjorn force-pushed the cloud-22775-restore-to-connstring branch from 01604b3 to 6cd4b9c Compare January 21, 2025 12:53
@VladLazar VladLazar self-requested a review January 21, 2025 13:25
@NanoBjorn NanoBjorn marked this pull request as ready for review January 21, 2025 13:25
@NanoBjorn NanoBjorn requested a review from a team as a code owner January 21, 2025 13:26
@NanoBjorn NanoBjorn requested review from knizhnik and lubennikovaav and removed request for a team January 21, 2025 13:26
@NanoBjorn
Copy link
Contributor Author

NanoBjorn commented Jan 21, 2025

why not put the restore connection string into the spec? Shouldn't it be encoded the same way as the source string?

I have done it both ways as with source connection strings, there are tests on both paths as well

Base automatically changed from 22037-basic-fast-import-e2e to main January 21, 2025 16:57
@@ -347,6 +350,104 @@ def test_fast_import_binary(
assert res[0][0] == 10


def test_fast_import_restore_to_connstring(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically the same test as test_fast_import above. Can we parametrise the previous test instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are similar only in first few lines, then we either run postgres ourselves and restore into it (test_fast_import_restore_to_connstring) or fast import runs its own postgres locally (test_fast_import_binary). These scenarious are different, I am not sure it is worth combining them into one test, I am not even sure than combined test will take less lines than both of these 😄

Comment on lines +202 to +212
{
nix::sys::signal::kill(
Pid::from_raw(i32::try_from(proc.id().unwrap()).expect("convert child pid to i32")),
nix::sys::signal::SIGTERM,
)
.context("signal postgres to shut down")?;
proc.wait()
.await
.context("wait for postgres to shut down")?;
}
Ok(())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the extra scope? You could instead do. Also, is the return code not interesting?

  199     async fn shutdown(&mut self) -> Result<(), anyhow::Error> {
  200         let proc: &mut tokio::process::Child = self.postgres_proc.as_mut().unwrap();
  201         info!("shutdown postgres");
  202         nix::sys::signal::kill(
  203             Pid::from_raw(i32::try_from(proc.id().unwrap()).expect("convert child pid to i32")),
  204             nix::sys::signal::SIGTERM,
  205         )
  206         .context("signal postgres to shut down")?;
  207         proc.wait()
  208             .await
  209             .context("wait for postgres to shut down")
  210             .map(|_| ())
  211     }

@@ -68,6 +70,8 @@ struct Spec {
encryption_secret: EncryptionSecret,
#[serde_as(as = "serde_with::base64::Base64")]
source_connstring_ciphertext_base64: Vec<u8>,
#[serde_as(as = "Option<serde_with::base64::Base64>")]
restore_connstring_ciphertext_base64: Option<Vec<u8>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Why use restore in the name. It's leaking an implenetation detail. I'd choose destination instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree here, but let's solve this with a comment? Since it's a part of the spec, that is already matched in the control plane :(

Comment on lines +307 to +312
let pg_port = || {
args.pg_port.unwrap_or_else(|| {
info!("pg_port not specified, using default 5432");
5432
})
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You can specify default values in the Args struct definition:

63     #[clap(long, default_value_t = 5432)]
64     pg_port: u16, // port to run postgres on, 5432 is default

};

// unused if run_postgres is false, but needed for shutdown
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. You still dump into dumpdir in that case.

@@ -52,6 +52,8 @@ struct Args {
s3_prefix: Option<s3_uri::S3Uri>,
#[clap(long)]
source_connection_string: Option<String>,
#[clap(long)]
restore_connection_string: Option<String>, // will not run postgres if specified, will do pg_restore to this connection string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's time to refactor things a bit to reflect the fact that there are two operation modes:

  1. Restore to local postgres and upload layers to S3.
  2. Restore to a running postgres instance.

It's getting hard to read this since it's just one code path where each block is enabled/disabled based on the initial argument set.

This can be done with separate subcommands (see control_plane/storcon_cli/src/main.rs):

  • Each subcommand has its own set of arguments.
  • Each subcommand has it's separate handler. You can share utility functions between the handlers.

String::from_utf8(plaintext.into_inner()).context("parse connection string as utf8")
}

struct PostgresProcess {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This can be in a separate module(file).

@NanoBjorn NanoBjorn force-pushed the cloud-22775-restore-to-connstring branch from 555f4c8 to 12a061c Compare January 31, 2025 18:43
@@ -326,7 +434,7 @@ pub(crate) async fn main() -> anyhow::Result<()> {
.arg("--no-sync")
// POSITIONAL args
// source db (db name included in connection string)
.arg(&source_connection_string)
.arg(&source_connstring)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that failed pgdump doesn't end the process and restore happens later

14:27 $ kubectl logs job/importpgdata-br-billowing-darkness-14681524
Defaulted container "importer" out of: importer, setup-volume (init)
{"timestamp":"2025-02-03T12:24:18.289576Z","level":"INFO","fields":{"message":"starting"}}
{"timestamp":"2025-02-03T12:24:18.316861Z","level":"INFO","fields":{"message":"restore_connection_string specified, not runnin
g postgres process"}}
{"timestamp":"2025-02-03T12:24:18.317110Z","level":"INFO","fields":{"message":"dump into the working directory"}}
{"timestamp":"2025-02-03T12:24:18.317569Z","level":"INFO","fields":{"message":"spawned pg_dump","pid":"20"}}
{"timestamp":"2025-02-03T12:24:35.607724Z","level":"INFO","fields":{"message":"pg_dump: error: connection to server at \"ep-we
athered-rain-334581-pooler.us-east-2.aws.neon.tech\" (3.143.47.40), port 5432 failed: ERROR:  database \"neondb1\" does not ex
ist","fd":"stderr"},"span":{"name":"pg_dump"},"spans":[{"name":"pg_dump"}]}
{"timestamp":"2025-02-03T12:24:35.610660Z","level":"INFO","fields":{"message":"pg_dump exited","status":"ExitStatus(unix_wait_
status(256))"}}
{"timestamp":"2025-02-03T12:24:35.610796Z","level":"WARN","fields":{"message":"pg_dump failed, restore will likely fail as wel
l","status":"exit status: 1"}}
{"timestamp":"2025-02-03T12:24:35.612199Z","level":"INFO","fields":{"message":"spawned pg_restore","pid":"21"}}
{"timestamp":"2025-02-03T12:24:35.620809Z","level":"INFO","fields":{"message":"pg_restore: error: could not open input file \"
/instance_store/dumpdir/toc.dat\": No such file or directory","fd":"stderr"},"span":{"name":"pg_restore"},"spans":[{"name":"pg
_restore"}]}
{"timestamp":"2025-02-03T12:24:35.621680Z","level":"INFO","fields":{"message":"pg_restore exited","status":"ExitStatus(unix_wa
it_status(256))"}}
{"timestamp":"2025-02-03T12:24:35.621720Z","level":"WARN","fields":{"message":"pg_restore failed, restore will likely fail as 
well","status":"exit status: 1"}}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants