Skip to content

Commit 8efaaaf

Browse files
authored
Adding tools to manage ClickHouse during development (#126)
- Updates tool for downloading ClickHouse, now using prebuilt binaries from S3. - Adds the `omicron_common::dev::clickhouse::ClickHouseInstance` object. This is modeled after the existing `CockroachInstance` object, and is used to start and manage a ClickHouse server sub-process. It currently is quite constrained, only allowing users to specify the HTTP port the server listens on. This is for simplicity, and further options may be needed in the future. - Updates the Oximeter client tests. These now spawn their own `ClickHouseInstance` in each test, so they can be run in parallel. This also removes the feature flag used to hide the tests on non-linux platforms. - Updates the GitHub workflow to use the new tool to download ClickHouse - Adds the `ch-run` subcommand to the `omicron-dev` tool, and updates README.adoc with details about how to run
1 parent df99c1a commit 8efaaaf

File tree

9 files changed

+394
-108
lines changed

9 files changed

+394
-108
lines changed

.github/workflows/rust.yml

Lines changed: 12 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -95,16 +95,21 @@ jobs:
9595
with:
9696
key: ${{ runner.os }}-cockroach-binary-v2
9797
path: "cockroachdb"
98-
- name: Install ClickHouse
99-
# Workflow environment variables are not propagated to the script by default
100-
run: sudo CI=$CI ./tools/ci_download_clickhouse
98+
- name: Configure GitHub cache for ClickHouse binaries
99+
id: cache-clickhouse
100+
101+
uses: actions/cache@26968a09c0ea4f3e233fdddbafd1166051a095f6
102+
with:
103+
key: ${{ runner.os }}-clickhouse-binary
104+
path: "clickhouse"
101105
- name: Build
102106
run: cargo +${{ matrix.toolchain }} build --locked --all-targets --verbose
107+
- name: Download ClickHouse
108+
if: steps.cache-clickhouse.outputs.cache-hit != 'true'
109+
run: ./tools/ci_download_clickhouse
103110
- name: Download CockroachDB binary
104111
if: steps.cache-cockroachdb.outputs.cache-hit != 'true'
105112
run: bash ./tools/ci_download_cockroachdb
106113
- name: Run tests
107-
# Put "./cockroachdb/bin" on the PATH for the test suite.
108-
run: PATH="$PATH:$PWD/cockroachdb/bin" cargo +${{ matrix.toolchain }} test --locked --verbose
109-
- name: Run ClickHouse client tests
110-
run: cargo test --locked --verbose --features clickhouse-tests --package oximeter -- --test-threads 1 db::client
114+
# Put "./cockroachdb/bin" and "./clickhouse" on the PATH for the test suite.
115+
run: PATH="$PATH:$PWD/cockroachdb/bin:$PWD/clickhouse" cargo +${{ matrix.toolchain }} test --locked --verbose

README.adoc

Lines changed: 23 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -56,23 +56,26 @@ See TODO.adoc for more info.
5656
1. CockroachDB v20.2.5. The test suite expects to be able to start a single-node CockroachDB cluster using the `cockroach` executable on your PATH. On illumos, MacOS, and Linux, you should be able to use the `tools/ci_download_cockroachdb` script to fetch the official CockroachDB binary. It will be put into `./cockroachdb/bin/cockroach`. Alternatively, you can follow the https://www.cockroachlabs.com/docs/stable/install-cockroachdb.html[official CockroachDB installation instructions for your platform].
5757
With `cockroach` on your PATH, you can **build and run the whole test suite** with `cargo test`. The test suite runs cleanly and should remain clean.
5858

59-
2. ClickHouse >= v21.5.5. The test suite expects a running instance of the ClickHouse database server.
59+
2. ClickHouse >= v21.7.1. The test suite expects a running instance of the ClickHouse database server.
6060
(It currently _does not_ start the server automatically.) The script `./tools/ci_download_clickhouse`
61-
can be used to download and install the server on Linux platforms only. It's not currently possible
62-
to validate the downloaded binaries automatically on other platforms, so it must be done manually.
63-
Instructions can be found https://clickhouse.tech/docs/en/getting-started/install/#from-binaries-non-linux[here].
61+
can be used to download a pre-built binary for illumos, Linux, or macOS platforms. You may also install
62+
ClickHouse manually; instructions can be found https://clickhouse.tech/docs/en/getting-started/install[here].
6463

6564
You can **format the code** using `cargo fmt`. Make sure to run this before pushing changes. The CI checks that the code is correctly formatted.
6665

6766
You can **run the https://github.com/rust-lang/rust-clippy[Clippy linter]** using `cargo clippy \-- -D warnings -A clippy::style`. Make sure to run this before pushing changes. The CI checks that the code is clippy-clean.
6867

69-
To **run Omicron** you need to run three programs:
68+
To **run Omicron** you need to run four programs:
7069

7170
* a CockroachDB cluster. For development, you can use the `omicron-dev` tool in this repository to start a single-node CockroachDB cluster **that will delete the database when you shut it down.**
71+
* a ClickHouse server. You may use the `omicron-dev` tool for this as well, see below, and as with CockroachDB,
72+
the database files will be deleted when you stop the program.
7273
* `nexus`: the guts of the control plane
7374
* `sled-agent-sim`: a simulator for the component that manages a single sled
7475

75-
The easiest way to start a CockroachDB cluster is to use the built in `omicron-dev` tool. This tool assumes that the `cockroach` executable on your PATH comes from CockroachDB v20.2.5.
76+
The easiest way to start the required databases is to use the built in `omicron-dev` tool.
77+
This tool assumes that the `cockroach` and `clickhouse` executables are on your PATH,
78+
and match the versions above.
7679

7780
1. Start CockroachDB using `omicron-dev db-run`:
7881
+
@@ -129,7 +132,18 @@ omicron-dev: populated database
129132
+
130133
Note that as the output indicates, this cluster will be available to anybody that can reach 127.0.0.1.
131134

132-
2. `nexus` requires a configuration file to run. You can use `omicron-nexus/examples/config.toml` to start with. Build and run it like this:
135+
2. Start the ClickHouse database server:
136+
+
137+
[source,text]
138+
----
139+
$ cargo run --bin omicron-dev -- ch-run
140+
Finished dev [unoptimized + debuginfo] target(s) in 0.47s
141+
Running `target/debug/omicron-dev ch-run`
142+
omicron-dev: running ClickHouse (PID: 2463), full command is "clickhouse server --log-file /var/folders/67/2tlym22x1r3d2kwbh84j298w0000gn/T/.tmpJ5nhot/clickhouse-server.log --errorlog-file /var/folders/67/2tlym22x1r3d2kwbh84j298w0000gn/T/.tmpJ5nhot/clickhouse-server.errlog -- --http_port 8123 --path /var/folders/67/2tlym22x1r3d2kwbh84j298w0000gn/T/.tmpJ5nhot"
143+
omicron-dev: using /var/folders/67/2tlym22x1r3d2kwbh84j298w0000gn/T/.tmpJ5nhot for ClickHouse data storage
144+
----
145+
146+
3. `nexus` requires a configuration file to run. You can use `omicron-nexus/examples/config.toml` to start with. Build and run it like this:
133147
+
134148
[source,text]
135149
----
@@ -138,11 +152,11 @@ $ cargo run --bin=nexus -- omicron-nexus/examples/config.toml
138152
listening: http://127.0.0.1:12220
139153
----
140154

141-
3. `sled-agent-sim` only accepts configuration on the command line. Run it with a uuid identifying itself (this would be a uuid for the sled it's managing), an IP:port for itself, and the IP:port of `nexus`'s _internal_ interface. Using default config, this might look like this:
155+
4. `sled-agent` only accepts configuration on the command line. Run it with a uuid identifying itself (this would be a uuid for the sled it's managing), an IP:port for itself, and the IP:port of `nexus`'s _internal_ interface. Using default config, this might look like this:
142156
+
143157
[source,text]
144158
----
145-
$ cargo run --bin=sled-agent-sim -- $(uuidgen) 127.0.0.1:12345 127.0.0.1:12221
159+
$ cargo run --bin=sled-agent -- $(uuidgen) 127.0.0.1:12345 127.0.0.1:12221
146160
...
147161
Jun 02 12:21:50.989 INFO listening, local_addr: 127.0.0.1:12345, component: dropshot
148162
----

omicron-common/src/bin/omicron-dev.rs

Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@ async fn main() -> Result<(), anyhow::Error> {
2222
OmicronDb::DbRun { ref args } => cmd_db_run(args).await,
2323
OmicronDb::DbPopulate { ref args } => cmd_db_populate(args).await,
2424
OmicronDb::DbWipe { ref args } => cmd_db_wipe(args).await,
25+
OmicronDb::ChRun { ref args } => cmd_clickhouse_run(args).await,
2526
};
2627
if let Err(error) = result {
2728
fatal(CmdError::Failure(format!("{:#}", error)));
@@ -50,6 +51,12 @@ enum OmicronDb {
5051
#[structopt(flatten)]
5152
args: DbWipeArgs,
5253
},
54+
55+
/// Run a ClickHouse database server for development
56+
ChRun {
57+
#[structopt(flatten)]
58+
args: ChRunArgs,
59+
},
5360
}
5461

5562
#[derive(Debug, StructOpt)]
@@ -227,3 +234,54 @@ async fn cmd_db_wipe(args: &DbWipeArgs) -> Result<(), anyhow::Error> {
227234
client.cleanup().await.expect("connection failed");
228235
Ok(())
229236
}
237+
238+
#[derive(Debug, StructOpt)]
239+
struct ChRunArgs {
240+
/// The HTTP port on which the server will listen
241+
#[structopt(short, long, default_value = "8123")]
242+
port: u16,
243+
}
244+
245+
async fn cmd_clickhouse_run(args: &ChRunArgs) -> Result<(), anyhow::Error> {
246+
// Start a stream listening for SIGINT
247+
let signals = Signals::new(&[SIGINT]).expect("failed to wait for SIGINT");
248+
let mut signal_stream = signals.fuse();
249+
250+
// Start the database server process, possibly on a specific port
251+
let mut db_instance =
252+
dev::clickhouse::ClickHouseInstance::new(args.port).await?;
253+
println!(
254+
"omicron-dev: running ClickHouse (PID: {}), full command is \"clickhouse {}\"",
255+
db_instance.pid().expect("Failed to get process PID, it may not have started"),
256+
db_instance.cmdline().join(" ")
257+
);
258+
println!(
259+
"omicron-dev: using {} for ClickHouse data storage",
260+
db_instance.data_path().display()
261+
);
262+
263+
// Wait for the DB to exit itself (an error), or for SIGINT
264+
tokio::select! {
265+
_ = db_instance.wait_for_shutdown() => {
266+
db_instance.cleanup().await.context("clean up after shutdown")?;
267+
bail!("omicron-dev: ClickHouse shutdown unexpectedly");
268+
}
269+
caught_signal = signal_stream.next() => {
270+
assert_eq!(caught_signal.unwrap(), SIGINT);
271+
272+
// As above, we don't need to explicitly kill the DB process, since
273+
// the shell will have delivered the signal to the whole process group.
274+
eprintln!(
275+
"omicron-dev: caught signal, shutting down and removing \
276+
temporary directory"
277+
);
278+
279+
// Remove the data directory.
280+
db_instance
281+
.wait_for_shutdown()
282+
.await
283+
.context("clean up after SIGINT shutdown")?;
284+
}
285+
}
286+
Ok(())
287+
}

omicron-common/src/dev/clickhouse.rs

Lines changed: 154 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,154 @@
1+
//! Tools for managing ClickHouse during development
2+
3+
use std::path::{Path, PathBuf};
4+
use std::process::Stdio;
5+
use std::time::Duration;
6+
7+
use anyhow::Context;
8+
use tempfile::TempDir;
9+
10+
use crate::dev::poll;
11+
12+
/// A `ClickHouseInstance` is used to start and manage a ClickHouse server process.
13+
#[derive(Debug)]
14+
pub struct ClickHouseInstance {
15+
// Directory in which all data, logs, etc are stored.
16+
data_dir: Option<TempDir>,
17+
data_path: PathBuf,
18+
// The HTTP port the server is listening on
19+
port: u16,
20+
// Full list of command-line arguments
21+
args: Vec<String>,
22+
// Subprocess handle
23+
child: Option<tokio::process::Child>,
24+
}
25+
26+
impl ClickHouseInstance {
27+
/// Start a new ClickHouse server
28+
pub async fn new(port: u16) -> Result<Self, anyhow::Error> {
29+
let data_dir = TempDir::new()?;
30+
let log_path = data_dir.path().join("clickhouse-server.log");
31+
let err_log_path = data_dir.path().join("clickhouse-server.errlog");
32+
let args = vec![
33+
"server".to_string(),
34+
"--log-file".to_string(),
35+
log_path.display().to_string(),
36+
"--errorlog-file".to_string(),
37+
err_log_path.display().to_string(),
38+
"--".to_string(),
39+
"--http_port".to_string(),
40+
format!("{}", port),
41+
"--path".to_string(),
42+
data_dir.path().display().to_string(),
43+
];
44+
45+
let child = tokio::process::Command::new("clickhouse")
46+
.args(&args)
47+
// ClickHouse internall tees its logs to a file, so we throw away std{in,out,err}
48+
.stdin(Stdio::null())
49+
.stdout(Stdio::null())
50+
.stderr(Stdio::null())
51+
// By default ClickHouse forks a child if it's been explicitly requested via the
52+
// following environment variable, _or_ if it's not attached to a TTY. Avoid this
53+
// behavior, so that we can correctly deliver SIGINT. The "watchdog" masks SIGINT,
54+
// meaning we'd have to deliver that to the _child_, which is more complicated.
55+
.env("CLICKHOUSE_WATCHDOG_ENABLE", "0")
56+
.spawn()?;
57+
58+
// Wait for the "status" file to become available, at least with a newline.
59+
let data_path = data_dir.path().to_path_buf();
60+
let status_file = data_path.join("status");
61+
poll::wait_for_condition(
62+
|| async {
63+
let contents =
64+
match tokio::fs::read_to_string(&status_file).await {
65+
Ok(contents) => contents,
66+
Err(e) => match e.kind() {
67+
std::io::ErrorKind::NotFound => {
68+
return Err(poll::CondCheckError::NotYet)
69+
}
70+
_ => return Err(poll::CondCheckError::from(e)),
71+
},
72+
};
73+
if !contents.is_empty() && contents.contains('\n') {
74+
Ok(())
75+
} else {
76+
Err(poll::CondCheckError::NotYet)
77+
}
78+
},
79+
&Duration::from_millis(500),
80+
&Duration::from_secs(30),
81+
)
82+
.await?;
83+
84+
Ok(Self {
85+
data_dir: Some(data_dir),
86+
data_path,
87+
port,
88+
args,
89+
child: Some(child),
90+
})
91+
}
92+
93+
/// Wait for the ClickHouse server process to shutdown, after it's been killed.
94+
pub async fn wait_for_shutdown(&mut self) -> Result<(), anyhow::Error> {
95+
if let Some(mut child) = self.child.take() {
96+
child.wait().await?;
97+
}
98+
self.cleanup().await
99+
}
100+
101+
/// Kill the ClickHouse server process and cleanup the data directory.
102+
pub async fn cleanup(&mut self) -> Result<(), anyhow::Error> {
103+
if let Some(mut child) = self.child.take() {
104+
child.start_kill().context("Sending SIGKILL to child")?;
105+
child.wait().await.context("waiting for child")?;
106+
}
107+
if let Some(dir) = self.data_dir.take() {
108+
dir.close().context("Cleaning up temporary directory")?;
109+
110+
// ClickHouse doesn't fully respect the `--path` flag, and still seems
111+
// to put the `preprocessed_configs` directory in $CWD.
112+
let _ = std::fs::remove_dir_all("./preprocessed_configs");
113+
}
114+
Ok(())
115+
}
116+
117+
/// Return the full path to the directory used for the server's data.
118+
pub fn data_path(&self) -> &Path {
119+
&self.data_path
120+
}
121+
122+
/// Return the command-line used to start the ClickHouse server process
123+
pub fn cmdline(&self) -> &Vec<String> {
124+
&self.args
125+
}
126+
127+
/// Return the child PID, if any
128+
pub fn pid(&self) -> Option<u32> {
129+
self.child.as_ref().and_then(|child| child.id())
130+
}
131+
132+
/// Return the HTTP port the server is listening on.
133+
pub fn port(&self) -> u16 {
134+
self.port
135+
}
136+
}
137+
138+
impl Drop for ClickHouseInstance {
139+
fn drop(&mut self) {
140+
if self.child.is_some() || self.data_dir.is_some() {
141+
eprintln!(
142+
"WARN: dropped CockroachInstance without cleaning it up first \
143+
(there may still be a child process running and a \
144+
temporary directory leaked)"
145+
);
146+
if let Some(child) = self.child.as_mut() {
147+
let _ = child.start_kill();
148+
}
149+
if let Some(dir) = self.data_dir.take() {
150+
let _ = dir.close();
151+
}
152+
}
153+
}
154+
}

omicron-common/src/dev/mod.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@
88
*/
99
#![cfg_attr(not(test), allow(dead_code))]
1010

11+
pub mod clickhouse;
1112
pub mod db;
1213
pub mod poll;
1314
pub mod test_cmds;

omicron-common/tests/output/cmd-omicron-dev-noargs-stderr

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ FLAGS:
99
-V, --version Prints version information
1010

1111
SUBCOMMANDS:
12+
ch-run Run a ClickHouse database server for development
1213
db-populate Populate an existing CockroachDB cluster with the Omicron schema
1314
db-run Start a CockroachDB cluster for development
1415
db-wipe Wipe the Omicron schema (and all data) from an existing CockroachDB cluster

oximeter/oximeter/Cargo.toml

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,3 @@ uuid = { version = "0.8.2", features = [ "v4", "serde" ] }
2525

2626
[dev-dependencies]
2727
trybuild = "1.0.42"
28-
29-
[features]
30-
clickhouse-tests = []

0 commit comments

Comments
 (0)