Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: implement ccm integration #1203

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Conversation

dkropachev
Copy link
Collaborator

CCM is cli tool to ease scylla/cassandra/dse cluster management.
Using it in integration suit will help to expand test coverage.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@dkropachev dkropachev requested a review from wprzytula February 6, 2025 04:52
@dkropachev dkropachev force-pushed the dk/tests-add-ccm-support branch 2 times, most recently from 386ad62 to a254460 Compare February 6, 2025 04:56
Copy link

github-actions bot commented Feb 6, 2025

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳
Checked commit: f858370

@dkropachev dkropachev force-pushed the dk/tests-add-ccm-support branch 5 times, most recently from 281258a to 2c4c00b Compare February 6, 2025 15:09
Copy link
Collaborator

@wprzytula wprzytula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first round of the review.
There are for sure more issues, but I'm unable to grasp them until I gain the high-level of the approach, which I failed to achieve so far due to no comments nor commit messages nor the cover letter.

scylla/Cargo.toml Show resolved Hide resolved
scylla/Cargo.toml Show resolved Hide resolved
scylla/Cargo.toml Show resolved Hide resolved
scylla/tests/ccm-integration/logged_cli.rs Outdated Show resolved Hide resolved
scylla/tests/ccm-integration/logged_cli.rs Outdated Show resolved Hide resolved
scylla/tests/ccm-integration/test_example.rs Outdated Show resolved Hide resolved
scylla/tests/ccm-integration/logged_cli.rs Outdated Show resolved Hide resolved
stream: T,
writer: Arc<Mutex<File>>,
prefix: String,
mut buffer: Option<&mut Vec<u8>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mut buffer: Option<&mut Vec<u8>>,
buffer: Option<&mut Vec<u8>>,

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not work:

error[E0596]: cannot borrow `buffer.0` as mutable, as `buffer` is not declared as mutable
   --> scylla/tests/ccm-integration/logged_cli.rs:249:25
    |
249 |             if let Some(ref mut buffer) = buffer {
    |                         ^^^^^^^^^^^^^^ cannot borrow as mutable
    |
help: consider changing this to be mutable
    |
237 |         mut buffer: Option<&mut Vec<u8>>,
    |         +++

If i change ref mut to ref it fails with:

error[E0596]: cannot borrow `**buffer` as mutable, as it is behind a `&` reference
   --> scylla/tests/ccm-integration/logged_cli.rs:250:17
    |
250 |                 buffer.extend_from_slice(line.as_bytes());
    |                 ^^^^^^ `buffer` is a `&` reference, so the data it refers to cannot be borrowed as mutable
    |
help: consider changing this to be a mutable reference
    |
249 |             if let Some(ref mut buffer) = buffer {
    |                             +++

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try experimenting with matching on buffer.as_ref() or buffer.as_mut() instead of matching on plain buffer. Might solve the issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as_mut requires buffer to be mut, as_ref goes wrong way, but match buffer worked well, take a look

scylla/tests/ccm-integration/logged_cli.rs Outdated Show resolved Hide resolved
scylla/tests/ccm-integration/logged_cli.rs Outdated Show resolved Hide resolved
scylla/tests/ccm-integration/cluster.rs Outdated Show resolved Hide resolved
scylla/tests/ccm-integration/cluster.rs Outdated Show resolved Hide resolved
scylla/tests/ccm-integration/cluster.rs Show resolved Hide resolved
scylla/tests/ccm-integration/cluster.rs Outdated Show resolved Hide resolved
scylla/tests/ccm-integration/cluster.rs Outdated Show resolved Hide resolved
scylla/tests/ccm-integration/cluster.rs Outdated Show resolved Hide resolved
@github-actions github-actions bot added the semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes label Feb 7, 2025
Cleanup unused dependancies
@dkropachev dkropachev force-pushed the dk/tests-add-ccm-support branch from f14fa11 to b900f55 Compare February 8, 2025 03:02
@github-actions github-actions bot removed the semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes label Feb 8, 2025
@dkropachev dkropachev force-pushed the dk/tests-add-ccm-support branch from b900f55 to 0c9ff2a Compare February 8, 2025 03:10
@dkropachev dkropachev requested a review from wprzytula February 8, 2025 03:10
@dkropachev dkropachev force-pushed the dk/tests-add-ccm-support branch 3 times, most recently from 13d7c5e to d97469d Compare February 8, 2025 21:27
Copy link
Collaborator

@Lorak-mmk Lorak-mmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By no means a thorough review, just posting a few things that I noticed during a brief read.

Few other things:

  • The utils for tests should be put in some folder inside ccm-integration, instead of being on the same level as the test files.
  • Before merging, we will need to update Makefile and CONTRIBUTING.md.

Comment on lines +137 to +149
pub(crate) enum NodeStartOption {
// Don't wait node to start
NoWait,
// Wait till other nodes recognize started node
WaitOtherNotice,
// Wait till started node report that client port is opened and operational
WaitForBinaryProto,
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was that it is nice that you translated CCM options to enum, because Rust Driver contributor shouldn't have to know anything about CCM, but now I see that it is not that obvious. Each variant corresponds to a command line option, and Node::start accepts Option<&[NodeStartOption]>.
So this requires the user of those utilities to understand how CCM handles repeated / conflicting command line arguments. What happends if I pass Some(&[NodeStartOption::NoWait, NodeStartOption::WaitOtherNotice])? It is not obvious to me after reading ccm's help and briefly even source code.
I think this should be a struct, with fields wait_other_notice: bool and wait_for_binary_proto: bool. This way the user specifices what they want to wait for, and only the ccm wrapper needs to know how to translate this to command line options.

Comment on lines +146 to +156
pub(crate) enum NodeStopOption {
// Don't wait node to properly stop
NoWait,
// Force-terminate node with `kill -9`
Kill,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this would be better expressed as a struct with boolean fields.

Comment on lines +11 to +18
lazy_static! {
static ref CLUSTER_VERSION: String =
std::env::var("SCYLLA_TEST_CLUSTER").unwrap_or("release:6.2.2".to_string());
static ref TEST_KEEP_CLUSTER_ON_FAILURE: bool = !std::env::var("TEST_KEEP_CLUSTER_ON_FAILURE")
.unwrap_or("".to_string())
.parse::<bool>()
.unwrap_or(false);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The files with tests should not interact with environment at all. Cluster version selection (and thus reading the env var) should be done by the ccm utils.

Preferably, all interaction with env variables should be done in one place in the code.

Comment on lines +44 to +59
async fn run_ccm_test<C, T, CFut, TFut>(cb: C, test_body: T)
where
C: FnOnce() -> CFut,
CFut: std::future::Future<Output = Arc<RwLock<Cluster>>>,
T: FnOnce(Arc<RwLock<Cluster>>) -> TFut,
TFut: std::future::Future<Output = Result<(), Error>>,
{
let cluster_arc = cb().await;
{
let res = test_body(cluster_arc.clone()).await;
if res.is_err() && *TEST_KEEP_CLUSTER_ON_FAILURE {
println!("Test failed, keep cluster alive, TEST_KEEP_CLUSTER_ON_FAILURE=true");
cluster_arc.write().await.set_keep_on_drop(true);
}
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also something that should not be present in the test code, but rather in a centralized location.

Comment on lines +61 to +67
async fn get_session(cluster: &Cluster) -> Session {
let endpoints = cluster.nodes.get_contact_endpoints().await;
SessionBuilder::new()
.known_nodes(endpoints)
.build()
.await
.unwrap()
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues with this:

  • This should not be in test file, but rather a method on Cluster
  • This should return a SessionBuilder so that a test can customize the session further. See
    pub(crate) fn create_new_session_builder() -> GenericSessionBuilder<impl SessionBuilderKind> {
    let session_builder = {
    #[cfg(not(scylla_cloud_tests))]
    {
    use scylla::client::session_builder::SessionBuilder;
    let uri = std::env::var("SCYLLA_URI").unwrap_or_else(|_| "127.0.0.1:9042".to_string());
    SessionBuilder::new().known_node(uri)
    }
    #[cfg(scylla_cloud_tests)]
    {
    use scylla::client::session_builder::{CloudMode, CloudSessionBuilder};
    use std::path::Path;
    std::env::var("CLOUD_CONFIG_PATH")
    .map(|config_path| CloudSessionBuilder::new(Path::new(&config_path)))
    .expect("Failed to initialize CloudSessionBuilder")
    .expect("CLOUD_CONFIG_PATH environment variable is missing")
    }
    };
    // The reason why we enable so long waiting for TracingInfo is... Cassandra. (Yes, again.)
    // In Cassandra Java Driver, the wait time for tracing info is 10 seconds, so here we do the same.
    // However, as Scylla usually gets TracingInfo ready really fast (our default interval is hence 3ms),
    // we stick to a not-so-much-terribly-long interval here.
    session_builder
    .tracing_info_fetch_attempts(NonZeroU32::new(200).unwrap())
    .tracing_info_fetch_interval(Duration::from_millis(50))
    }

Comment on lines +77 to +82
version: "".to_string(),
ip_prefix: NetPrefix::empty(),
root_dir: "/tmp/ccm".to_string(),
nodes: Vec::new(),
smp: DEFAULT_SMP,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting it in /tmp/ccm will prevent running multiple instances of tests simultaneously.
Either put in in some folder in the repo (probably the better option), or (my less preferred option) use some lib like https://docs.rs/mktemp/latest/mktemp/ to make a unique folder in tmp.

Comment on lines +19 to +21
// A simple abstraction to run commands, log command, exit code its stderr, stdout to a file
// and optionally to own stderr/stdout
// It should allow to run multiple commands in parallel
pub(crate) struct LoggedCmd {
file: Arc<Mutex<File>>,
run_id: AtomicI32,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment explains what does this struct do, but not why.
Why would we need to log ccm output to a file? In other words why is it not enough to log it to normal output, like other stuff in tests do? Note that if you write something to stdoud/stderr it will be properly caught, and printed in the test fails.
To demonstrate, I wrote a tiny integration test:

#[tokio::test]
async fn test_output() {
    setup_tracing();
    debug!("Some debug output");
    warn!("Some warn output");
    println!("Some stdout output");
    eprintln!("Some stderr output");
    panic!("To see if the test fails");
}

If I run it with cargo test test_output I get:
image

And if I enable tracing with RUST_LOG=trace cargo test test_output I get the tracing output too:
image

Is it for some reason not enough to just normally log ccm commands?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When test is complex it is very handy to be able to see a flow of ccm commands it executed.
Even if we go with dumping everything into stderr, stdout, we will still need this wrapper.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the complex test fails there are many things that are handy to see, I don't see a reason to single-out CCM to be dumped to the file.

The wrapper can still exists to dump CCM commands to the output.
I'd suggest using trace/debug for the actual output, and info for the command, env, and return code.

Comment on lines 388 to 427
async fn sniff_ipv4_prefix() -> Result<NetPrefix, Error> {
let mut used_ips: HashSet<IpAddr> = HashSet::new();
let file = File::open("/proc/net/tcp").await?;
let mut lines = BufReader::new(file).lines();
while let Some(line) = lines.next_line().await? {
let parts: Vec<&str> = line.split_whitespace().collect();
if let Some(ip_hex) = parts.get(1) {
let ip_port: Vec<&str> = ip_hex.split(':').collect();
if let Some(ip_hex) = ip_port.get(0) {
if let Some(ip) = u32::from_str_radix(ip_hex, 16).ok() {
used_ips.insert(
Ipv4Addr::new(ip as u8, (ip >> 8) as u8, (ip >> 16) as u8, 0).into(),
);
}
}
}
}

for a in 0..=255 {
for b in 0..=255 {
if a == 0 && b == 0 {
continue;
}
let ip_prefix: IpAddr = Ipv4Addr::new(127, a, b, 0).into();
if !used_ips.contains(&ip_prefix) {
return Ok(ip_prefix.into());
}
}
}
Err(Error::msg("All ip ranges are busy"))
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its absurd that we have to do this, it should be ccm's job to find a free ips for the nodes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, it does not do that and we have to dial with it.

Comment on lines +34 to +40
// CCM does not allow to have one active cluster within one config directory
// To have more than two active CCM cluster at the same time we isolate each cluster into separate
// config director, each config directory is created in `root_dir`.
// Example: root_dir = `/tmp/ccm`, config directory for cluster `test_cluster_1` is going be `/tmp/ccm/test_cluster_1`
// and cluster directory (that is created and managed by CCM) for this cluster is going to be `/tmp/ccm/test_cluster_1/test_cluster_1`
pub(crate) root_dir: String,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, ccm should be changed so that the commands can specify which cluster to operate on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, but for now we have to dial with current behavior.

Comment on lines +587 to +603
fn get_ccm_env(&self) -> HashMap<String, String> {
let mut env: HashMap<String, String> = HashMap::new();
env.insert(
"SCYLLA_EXT_OPTS".to_string(),
format!("--smp={} --memory={}M", self.opts.smp, self.opts.memory),
);
env
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unable to imagine why would someone think it is a good idea to pass some options throught env instead normal arguments.....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

)))

Copy link
Contributor

@muzarski muzarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should firstly focus on having the CI pass. Currently, the code does not even compile. Once we fix this, it will be much easier to test the code locally (which will be helpful in understanding the logic). I checked out on this PR locally, to play around with new API/run some tests but I was not able to do so, because it does not compile.

I suggest fixing the issues listed by: cargo clippy --verbose --all-features --all-targets -- -D warnings. Or if you want to not only do the checks, but run the tests as well, I suggest using make ci.

scylla/tests/ccm-integration/logged_cli.rs Outdated Show resolved Hide resolved
scylla/tests/ccm-integration/test_example.rs Outdated Show resolved Hide resolved
Implement simple abstraction that allows to run commands logging
command, return status, stderr and stdout to a file and optionally to
stderr/stdout
In order to programatically manipulate by scylla.yaml and cassandra.yaml
We need to have some struct that provides API for that.
It also needs to represent it in the way that ccm updateconf understand
CCM is cli tool to ease scylla/cassandra/dse cluster management.
Using it in integration suit will help to expand test coverage.
@dkropachev dkropachev force-pushed the dk/tests-add-ccm-support branch from d97469d to f858370 Compare February 10, 2025 01:17
@dkropachev
Copy link
Collaborator Author

I think that we should firstly focus on having the CI pass. Currently, the code does not even compile. Once we fix this, it will be much easier to test the code locally (which will be helpful in understanding the logic). I checked out on this PR locally, to play around with new API/run some tests but I was not able to do so, because it does not compile.

I suggest fixing the issues listed by: cargo clippy --verbose --all-features --all-targets -- -D warnings. Or if you want to not only do the checks, but run the tests as well, I suggest using make ci.

Thanks, done

Comment on lines 46 to 53
fn get_run_options(opts: RunOptions) -> (HashMap<String, String>, bool) {
match opts {
RunOptions::None => (HashMap::new(), false),
RunOptions::Env(new_env) => (new_env, false),
RunOptions::AllowFailure => (HashMap::new(), true),
RunOptions::AllowFailureWithEnv(new_env) => (new_env, true),
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⛏️ I case all four combinations are valid, I'd just represent the type as:

pub(crate) struct RunOptions {
    env: HashMap<String, String>,
    allow_failure: bool,
}

Comment on lines 55 to 62
async fn process_child_result(
status: io::Result<ExitStatus>,
allow_failure: bool,
run_id: i32,
stderr: Vec<u8>,
command_with_args: String,
writer: Arc<Mutex<File>>,
) -> Result<ExitStatus, Error> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 As a rule of thumb, only pass ownership to a function if it requires it; default should to to pass heap-allocated objects by reference (mutable reference if needed).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    async fn process_child_result(
        status: io::Result<ExitStatus>,
        allow_failure: bool,
        run_id: i32,
        stderr: &[u8],
        command_with_args: &str,
        writer: &Mutex<File>,
    ) -> Result<ExitStatus, Error> {

Comment on lines 97 to 103
let tmp = stderr.deref();
return Err(Error::msg(format!(
"Command `{}` failed: {}, stderr: \n{}",
command_with_args,
status,
std::str::from_utf8(tmp)?
)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tmp should be needed; deref is often called automagically. Try putting &stderr into from_utf8().

Comment on lines 145 to 152
let command_with_args = std::iter::once(command.as_ref().to_string_lossy().into_owned())
.chain(
args.clone()
.into_iter()
.map(|s| s.as_ref().to_string_lossy().into_owned()),
)
.collect::<Vec<String>>()
.join(" ");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 You allocate this String unconditionally, and its only use is in process_child_result upon error. I believe it should be built lazily, only in the error case, inside process_child_result. Is this possible?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this would also make this function yet shorter, which is good considering its importance.

Comment on lines 224 to 243
match buffer {
Some(buffer) => {
while let Some(line) = lines.next_line().await.ok().flatten() {
let _ = writer
.lock()
.await
.write_all(format!("{} {}\n", prefix, line).as_bytes())
.await;
buffer.extend_from_slice(line.as_bytes());
}
}
None => {
while let Some(line) = lines.next_line().await.ok().flatten() {
let _ = writer
.lock()
.await
.write_all(format!("{} {}\n", prefix, line).as_bytes())
.await;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 The match over buffer should be inside the while let, not outside. This would deduplicate the while let.

Comment on lines +17 to +28
impl Default for NodeConfig {
fn default() -> NodeConfig {
let mut val = HashMap::new();
val.insert(
"endpoint_snitch".to_string(),
NodeConfig::String(
"org.apache.cassandra.locator.GossipingPropertyFileSnitch".to_owned(),
),
);
NodeConfig::Map(val)
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw is there a reason for making this a dynamic type instead of a struct with fields corresponding to config values, besides the fact that it would probably be more work?

It's arguably less work, as we let serde_yaml do the struct populating and error handling job for us.

Comment on lines +36 to +40
let keys: Vec<&str> = path.split('.').collect();
let mut current = self;

for key in keys {
match current {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 You don't need to collect an iterator in order to iterate over it. Let's simply remove collect().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants