
benchmarks for Wasm invocations for key value lookup module are too slow #4671

Open
tiziano88 opened this issue Jan 18, 2024 · 5 comments

@tiziano88 (Collaborator):

```rust
#[bench]
fn bench_wasm_handler(bencher: &mut Bencher) {
    if xtask::testing::skip_test() {
        log::info!("skipping test");
        return;
    }
    let runtime = tokio::runtime::Builder::new_current_thread()
        .enable_io()
        .enable_time()
        .build()
        .unwrap();
    let wasm_path = oak_functions_test_utils::build_rust_crate_wasm("key_value_lookup").unwrap();
    let lookup_data_file = oak_functions_test_utils::write_to_temp_file(
        &oak_functions_test_utils::serialize_entries(hashmap! {
            b"key_0".to_vec() => b"value_0".to_vec(),
            b"key_1".to_vec() => b"value_1".to_vec(),
            b"key_2".to_vec() => b"value_2".to_vec(),
            b"empty".to_vec() => vec![],
        }),
    );
    let (_server_background, server_port) =
        runtime.block_on(xtask::launcher::run_oak_functions_example_in_background(
            &wasm_path,
            lookup_data_file.path().to_str().unwrap(),
        ));
    // Wait for the server to start up.
    std::thread::sleep(Duration::from_secs(20));
    let summary = bencher.bench(|bencher| {
        bencher.iter(|| {
            let response = runtime.block_on(make_request(server_port, b"key_1"));
            assert_eq!(b"value_1", &response.as_ref());
        });
        Ok(())
    });
    // When running `cargo test` this benchmark test gets executed too, but `summary` will be
    // `None` in that case. So, here we first check that `summary` is not empty.
    if let Ok(Some(summary)) = summary {
        // `summary.mean` is in nanoseconds, even though it is not explicitly documented in
        // https://doc.rust-lang.org/test/stats/struct.Summary.html.
        let elapsed = Duration::from_nanos(summary.mean as u64);
        // We expect the `mean` time for loading the test Wasm module and running its main
        // function to be less than a fixed threshold.
        assert!(
            elapsed < Duration::from_millis(5),
            "elapsed time: {:.0?}",
            elapsed
        );
    }
}
```

Currently this takes around 10 ms, which is roughly double the 5 ms threshold the benchmark asserts.

From my investigations, it seems that around 7 ms of overhead is caused by the creation of a server encryptor:

```rust
pub fn create(
    serialized_encapsulated_public_key: &[u8],
    recipient_context_generator: Arc<dyn RecipientContextGenerator>,
) -> anyhow::Result<Self> {
    let recipient_context = recipient_context_generator
        .generate_recipient_context(serialized_encapsulated_public_key)
        .context("couldn't generate recipient crypto context")?;
    Ok(Self::new(recipient_context))
}
```

I also realised that we may be building the application in dev mode (instead of release).

And @andrisaar suggested:

> can you try compiling and running with `RUSTFLAGS=-C target-cpu=native`?
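That suggestion corresponds to an invocation along these lines (a sketch only; this repository builds through its own xtask tooling, so the exact command is an assumption):

```shell
# Rebuild in release mode with codegen tuned to the build machine's CPU.
# Note: exporting RUSTFLAGS this way applies the flag to *every* crate
# in the workspace, not just the binary being benchmarked.
RUSTFLAGS="-C target-cpu=native" cargo build --release
```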

cc @conradgrobler @ernoc @ipetr0v @pmcgrath17

@tiziano88 tiziano88 self-assigned this Jan 18, 2024
@tiziano88 (Collaborator, Author):

> can you try compiling and running with `RUSTFLAGS=-C target-cpu=native`?

Unfortunately this didn't work at all; when compiling, I got a long list of warnings like these:

```
'+sse2' is not a recognized feature for this target (ignoring feature)
'+rdseed' is not a recognized feature for this target (ignoring feature)
'+avx512vbmi2' is not a recognized feature for this target (ignoring feature)
'-prefetchi' is not a recognized feature for this target (ignoring feature)
'+rdpid' is not a recognized feature for this target (ignoring feature)
'-fma4' is not a recognized feature for this target (ignoring feature)
'+avx512vbmi' is not a recognized feature for this target (ignoring feature)
'+shstk' is not a recognized feature for this target (ignoring feature)
'+vaes' is not a recognized feature for this target (ignoring feature)
'-waitpkg' is not a recognized feature for this target (ignoring feature)
'-sgx' is not a recognized feature for this target (ignoring feature)
'+fxsr' is not a recognized feature for this target (ignoring feature)
'+avx512dq' is not a recognized feature for this target (ignoring feature)
'-sse4a' is not a recognized feature for this target (ignoring feature)
'tigerlake' is not a recognized processor for this target (ignoring processor)
'tigerlake' is not a recognized processor for this target (ignoring processor)
'tigerlake' is not a recognized processor for this target (ignoring processor)
'tigerlake' is not a recognized processor for this target (ignoring processor)
'tigerlake' is not a recognized processor for this target (ignoring processor)
```

and also

```
LLVM ERROR: Do not know how to split the result of this operator!

error: could not compile `oak_stage0_bin` (bin "oak_stage0_bin")
```

@tiziano88 (Collaborator, Author):

Though maybe I should try doing that only for the Oak Functions binary.

@andrisaar (Collaborator):

Yeah, do it only for the Oak Functions binary. stage0 is special.
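One way to scope the flag to just that binary (a sketch; the package name below is an assumption for illustration, not the actual crate name) is to pass it to a single `cargo build --package` invocation instead of exporting it for the whole workspace:

```shell
# Build only the Oak Functions binary with native codegen, leaving
# stage0 (which targets a custom bare-metal environment) untouched.
# The package name is hypothetical; substitute the real crate name.
RUSTFLAGS="-C target-cpu=native" \
  cargo build --release --package oak_functions_launcher
```

Alternatively, a `[target.<triple>]` rustflags entry in that crate's own `.cargo/config.toml` keeps the setting out of the environment entirely.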

@conradgrobler (Collaborator):

If the performance issue is related to the crypto implementation, I assume this would be because of x25519, so perhaps also look at https://github.com/dalek-cryptography/curve25519-dalek/tree/main/curve25519-dalek#backends and https://github.com/dalek-cryptography/curve25519-dalek/tree/main/curve25519-dalek#simd-backend
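Per the linked README, `curve25519-dalek` selects its backend via a `--cfg` flag in `RUSTFLAGS`, roughly like this (a sketch; supported values and toolchain requirements vary by crate version, so check the README before relying on it):

```shell
# Ask curve25519-dalek for its SIMD backend; combined with
# target-cpu=native so the required vector features are enabled.
RUSTFLAGS='--cfg curve25519_dalek_backend="simd" -C target-cpu=native' \
  cargo build --release
```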

@tiziano88 (Collaborator, Author):

It turns out that when I switched to building in release mode, I didn't update the paths in the tests, so they were still using the debug binaries anyway 🙄
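For context, Cargo puts release artifacts in a different directory than dev builds, so any hard-coded artifact path in a test has to change along with the build mode (paths below are the standard Cargo defaults):

```shell
cargo build            # artifacts land under target/debug/
cargo build --release  # artifacts land under target/release/
```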
