Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: black-box integration tests #5087

Closed

Conversation

0x009922
Copy link
Contributor

@0x009922 0x009922 commented Sep 19, 2024

Context

Very early draft, work in progress!

iroha_test_network has a few problems:

  • It doesn't test Iroha as a black box, but runs directly using internal implementation details.
  • Execution flow of peers is messy, hard to debug & control. (e.g. for cases of restarts). Logs from all peers are all over the place, very noisy.
  • Messy API: hard to fine tune the network & peers for specific cases; high coupling with defaults; thread::sleeps with pipeline time instead of providing precise lifecycle hooks
  • Non-optimal defaults: most of the tests don't need 4 peers with 4s of pipeline time. A single peer with close-to-zero block time works significantly faster.

Black boxing would mean to run Iroha through its CLI as a dedicated process, just as users would do.

Solution

Re-implement iroha_test_network:

  • Black-boxing: run irohad as a direct child process.
    • Create temp dir for every randomly generated peer and store there its state, configs, and logs. It could be observed (and maybe even replayed??) after tests run.
    • Configuration through raw TOML - feel your users!
    • Can work with any irohad target (e.g. debug, release)
  • Allocate ports automatically (fslock-based solution) - no more manual ports setting. Works intra-process (cargo test) and inter-process (cargo nextest Use Nextest #4987).
  • Higher-level, dynamic, flexible, async-first API, helpful for writing expressive tests.
  • Optimal defaults: a single peer with very short timings.
  • Faster!
  • Bonus: test execution can now be terminated immediately with SIGINT (Ctrl + C). It used to be suspended.

Other changes:

  • Turns out some integration tests were broken, but passing for other reasons. It was primarily a cause of a messy test network API.
  • Make irohad a closed binary; don't expose Iroha; remove samples from it. Closes [suggestion] Refactor Iroha CLI #4136
  • Minor changes in iroha_core and iroha_torii.
  • remove unstable_network tests: they rely on a direct violation of black boxing - the use of FreezeStatus to make peers faulty. To be re-implemented in some other way.

Further steps

Flaky tests

Migration Guide (optional)

TODO


Review notes (optional)

Due to Client still being blocking, there is some ugly code with spawn_blocking.

Checklist

  • I've read CONTRIBUTING.md.
  • (optional) I've written unit tests for the code changes.
  • All review comments have been resolved.
  • All CI checks pass.

Undraft:

  • Update all integration tests
  • Update remove benches and examples
  • Support running from CI (set irohad binary path via ENV?)
  • Stabilise flaky tests

@0x009922 0x009922 added Enhancement New feature or request Tests labels Sep 19, 2024
@0x009922 0x009922 self-assigned this Sep 19, 2024
@github-actions github-actions bot added the api-changes Changes in the API for client libraries label Sep 19, 2024
@Erigara Erigara assigned Erigara and unassigned Erigara Sep 19, 2024
crates/iroha/src/lib.rs Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
crates/iroha/tests/integration/asset.rs Show resolved Hide resolved
crates/iroha/tests/integration/asset.rs Show resolved Hide resolved
Comment on lines 30 to 39
tokio::spawn(async move {
while let Ok(e) = events.recv().await {
match e {
PeerLifecycleEvent::LogBlockCommitted { height } => {
println!("Last peer block committed: {height}")
}
_ => {}
}
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This task is not synchronized with the main one.
It's purpose is to only print messages?

Also logger is no longer available in integration tests, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is only for debugging. It sometimes fails due to timeout being too short (which is convenient for local development). Need to adjust further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also logger is no longer available in integration tests, right?

test_logger() setup function is still available and will work as before, but I want to discourage its usage. Mainly because it requires initialisation before logs could be seen, which adds a layer of complexity to already complex tests.

If some advanced logging is needed (more advanced than println!), we can use log crate, for example. tracing is a bit too heavy imo.

crates/irohad/src/main.rs Outdated Show resolved Hide resolved
crates/iroha_test_network/src/lib.rs Outdated Show resolved Hide resolved
crates/iroha_test_network/src/lib.rs Outdated Show resolved Hide resolved
crates/iroha_test_network/src/lib.rs Outdated Show resolved Hide resolved
const PEER_KILL_TIMEOUT: Duration = Duration::from_secs(2);

// TODO: read from ENV?
const IROHA_BIN: &'static str = "/Users/qua/dev/iroha/target/release/irohad";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do smt like?

const IROHA_BIN: &str = concat!(env!("CARGO_MANIFEST_DIR"), "/targer/release/irohad");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that would be better, though still temporary. I plan to:

  • Add ability to switch between debug and release
  • Add a check before executing Iroha that the binary exists and give a recommendation to run cargo build --bin irohad.
    • Or maybe even run cargo build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with usage of which, i.e. irohad must be available in $PATH before running test network.

The easiest way is to run cargo install --path creates/irohad.

Comment on lines +234 to +235
let (network, _rt) = NetworkBuilder::new().start_blocking().unwrap();
let test_client = network.client();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose a network with 1 peer is created by default? I think that's ok. No need to test consensus every time

However, I hope that network.client() returns a random client when a multi-node network is started

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose a network with 1 peer is created by default? I think that's ok. No need to test consensus every time

Yes, it is.

However, I hope that network.client() returns a random client when a multi-node network is started

Currently not, but in light of your other comment of returning random peers by default, it will.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made network.client() and network.peer() both return random ones.

@0x009922
Copy link
Contributor Author

With hot-start (when all WASMs are pre-built) and using release Iroha, all integration tests now take less than 30 seconds to complete. The majority of them completes within 10 seconds. Time triggers take a lot and need a more thorough rewrite (which I am not going to do here).

Some tests are still flaky though (e.g. restarted_peer_should_have_the_same_asset_amount, network_stable_after_add_and_after_remove_peer, and longish multiple_blocks_created).

@@ -76,8 +76,7 @@ jobs:
path: ${{ env.DOCKER_COMPOSE_PATH }}
- name: Run tests, with coverage
run: |
cargo build --bin irohad
export IROHAD_EXEC=$(realpath ./target/debug/irohad)
mold --run cargo install --path ./crates/irohad
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we end up fixing the cache, this will likely break without which irohad || because cargo install returns an error if the binary has been installed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, but it doesn't currently work anyway =(

I have yet to investigate (logs take 200mb) why

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added which irohad check

cargo install works, by the way. But I got a vast amount of "multiple items" errors in rust std while it compiles wasms/ui tests, haven't figured it out yet.

@0x009922
Copy link
Contributor Author

0x009922 commented Oct 1, 2024

Investigation: CI seems to fail due to rust-lang/wg-cargo-std-aware#68. iroha_wasm_builder enables build-std feature, and also tries to remove instrument-coverage silently. I guess it stopped working (don't know why yet)

https://github.com/0x009922/iroha/blob/6926c61f021f514f29ad6f547274726733ff9128/crates/iroha_wasm_builder/src/lib.rs#L195-L196

https://github.com/0x009922/iroha/blob/6926c61f021f514f29ad6f547274726733ff9128/crates/iroha_wasm_builder/src/lib.rs#L370-L374

@0x009922 0x009922 changed the title refactor!: re-implement iroha_test_network refactor!: black-box integration tests Oct 2, 2024
@0x009922
Copy link
Contributor Author

0x009922 commented Oct 2, 2024

This PR becomes more and more God-like, too many things.

After fixing CI, I am considering splitting it into smaller PRs as much as possible.

And remove extra `iroha_wasm_builder` dependency

Signed-off-by: 0x009922 <[email protected]>
@0x009922
Copy link
Contributor Author

0x009922 commented Oct 3, 2024

Finally! All checks are truly green!

CI has successfully run all workspace tests with no default features in a single run within 53 seconds!

     Summary [  52.830s] 614 tests run: 614 passed (1 flaky), 9 skipped
   FLAKY 2/3 [   3.059s] iroha::mod integration::extra_functional::connected_peers::connected_peers_with_f_2_1_2

Same with all features enabled:

     Summary [  53.448s] 615 tests run: 615 passed (2 flaky), 9 skipped
   FLAKY 3/3 [   2.463s] iroha::mod integration::extra_functional::connected_peers::connected_peers_with_f_1_0_1
   FLAKY 2/3 [   2.899s] iroha::mod integration::extra_functional::connected_peers::connected_peers_with_f_2_1_2

Note: these tests are expectedly flaky due to #5104

@0x009922
Copy link
Contributor Author

0x009922 commented Oct 3, 2024

Closing this PR in order to split it into approximately these smaller chunks:

  • refactor!: black-box integration tests (still huge, with all tests changes and irohad rewamp)
  • build(wasm_samples): don't run iroha_wasm_builder from tests, pre-compile WASM samples
  • refactor(iroha_torii): single TCP listener
  • feat(iroha_core): graceful shutdown without network packets
  • fix(irohad, iroha_core): compile without default features
  • fix(iroha_config): broken trusted peers check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-changes Changes in the API for client libraries Enhancement New feature or request Tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[suggestion] Refactor Iroha CLI
4 participants