
benchmarks for Wasm invocations for key value lookup module are too slow #4671

Open
tiziano88 opened this issue Jan 18, 2024 · 5 comments

@tiziano88 (Collaborator):

```rust
#[bench]
fn bench_wasm_handler(bencher: &mut Bencher) {
    if xtask::testing::skip_test() {
        log::info!("skipping test");
        return;
    }
    let runtime = tokio::runtime::Builder::new_current_thread()
        .enable_io()
        .enable_time()
        .build()
        .unwrap();
    let wasm_path = oak_functions_test_utils::build_rust_crate_wasm("key_value_lookup").unwrap();
    let lookup_data_file = oak_functions_test_utils::write_to_temp_file(
        &oak_functions_test_utils::serialize_entries(hashmap! {
            b"key_0".to_vec() => b"value_0".to_vec(),
            b"key_1".to_vec() => b"value_1".to_vec(),
            b"key_2".to_vec() => b"value_2".to_vec(),
            b"empty".to_vec() => vec![],
        }),
    );
    let (_server_background, server_port) =
        runtime.block_on(xtask::launcher::run_oak_functions_example_in_background(
            &wasm_path,
            lookup_data_file.path().to_str().unwrap(),
        ));
    // Wait for the server to start up.
    std::thread::sleep(Duration::from_secs(20));
    let summary = bencher.bench(|bencher| {
        bencher.iter(|| {
            let response = runtime.block_on(make_request(server_port, b"key_1"));
            assert_eq!(b"value_1", &response.as_ref());
        });
        Ok(())
    });
    // When running `cargo test` this benchmark test gets executed too, but `summary` will be
    // `None` in that case. So, here we first check that `summary` is not empty.
    if let Ok(Some(summary)) = summary {
        // `summary.mean` is in nanoseconds, even though it is not explicitly documented in
        // https://doc.rust-lang.org/test/stats/struct.Summary.html.
        let elapsed = Duration::from_nanos(summary.mean as u64);
        // We expect the `mean` time for loading the test Wasm module and running its main
        // function to be less than a fixed threshold.
        assert!(
            elapsed < Duration::from_millis(5),
            "elapsed time: {:.0?}",
            elapsed
        );
    }
}
```

Currently this takes around 10 ms, which is roughly double the 5 ms threshold the benchmark asserts.

From my investigations, it seems that around 7 ms of overhead is caused by the creation of a server encryptor:

```rust
pub fn create(
    serialized_encapsulated_public_key: &[u8],
    recipient_context_generator: Arc<dyn RecipientContextGenerator>,
) -> anyhow::Result<Self> {
    let recipient_context = recipient_context_generator
        .generate_recipient_context(serialized_encapsulated_public_key)
        .context("couldn't generate recipient crypto context")?;
    Ok(Self::new(recipient_context))
}
```

I also realised that we may be building the application in dev mode (instead of release).

And @andrisaar suggested:

> can you try compiling and running with `RUSTFLAGS=-C target-cpu=native`?
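That suggestion corresponds to an invocation along these lines (a sketch only; this repository builds through its own xtask tooling, so the exact command is an assumption):

```shell
# Rebuild in release mode with codegen tuned to the build machine's CPU.
# Note: exporting RUSTFLAGS this way applies the flag to *every* crate
# in the workspace, not just the binary being benchmarked.
RUSTFLAGS="-C target-cpu=native" cargo build --release
```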

cc @conradgrobler @ernoc @ipetr0v @pmcgrath17

@tiziano88 tiziano88 self-assigned this Jan 18, 2024
@tiziano88 (Collaborator, Author):

> can you try compiling and running with `RUSTFLAGS=-C target-cpu=native`?

Unfortunately this didn't work at all; when compiling, I got a long list of warnings like these:

```
'+sse2' is not a recognized feature for this target (ignoring feature)
'+rdseed' is not a recognized feature for this target (ignoring feature)
'+avx512vbmi2' is not a recognized feature for this target (ignoring feature)
'-prefetchi' is not a recognized feature for this target (ignoring feature)
'+rdpid' is not a recognized feature for this target (ignoring feature)
'-fma4' is not a recognized feature for this target (ignoring feature)
'+avx512vbmi' is not a recognized feature for this target (ignoring feature)
'+shstk' is not a recognized feature for this target (ignoring feature)
'+vaes' is not a recognized feature for this target (ignoring feature)
'-waitpkg' is not a recognized feature for this target (ignoring feature)
'-sgx' is not a recognized feature for this target (ignoring feature)
'+fxsr' is not a recognized feature for this target (ignoring feature)
'+avx512dq' is not a recognized feature for this target (ignoring feature)
'-sse4a' is not a recognized feature for this target (ignoring feature)
'tigerlake' is not a recognized processor for this target (ignoring processor)
'tigerlake' is not a recognized processor for this target (ignoring processor)
'tigerlake' is not a recognized processor for this target (ignoring processor)
'tigerlake' is not a recognized processor for this target (ignoring processor)
'tigerlake' is not a recognized processor for this target (ignoring processor)
```

and also

```
LLVM ERROR: Do not know how to split the result of this operator!

error: could not compile `oak_stage0_bin` (bin "oak_stage0_bin")
```

@tiziano88 (Collaborator, Author):

Though maybe I should try doing that only for the Oak Functions binary.

@andrisaar (Collaborator):

Yeah, do it only for the Oak Functions binary. stage0 is special.
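One way to scope the flag to just that binary (a sketch; the package name below is an assumption for illustration, not the actual crate name) is to pass it to a single `cargo build --package` invocation instead of exporting it for the whole workspace:

```shell
# Build only the Oak Functions binary with native codegen, leaving
# stage0 (which targets a custom bare-metal environment) untouched.
# The package name is hypothetical; substitute the real crate name.
RUSTFLAGS="-C target-cpu=native" \
  cargo build --release --package oak_functions_launcher
```

Alternatively, a `[target.<triple>]` rustflags entry in that crate's own `.cargo/config.toml` keeps the setting out of the environment entirely.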

@conradgrobler (Collaborator):

If the performance issue is related to the crypto implementation, I assume this would be because of x25519, so perhaps also look at https://github.com/dalek-cryptography/curve25519-dalek/tree/main/curve25519-dalek#backends and https://github.com/dalek-cryptography/curve25519-dalek/tree/main/curve25519-dalek#simd-backend
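Per the linked README, `curve25519-dalek` selects its backend via a `--cfg` flag in `RUSTFLAGS`, roughly like this (a sketch; supported values and toolchain requirements vary by crate version, so check the README before relying on it):

```shell
# Ask curve25519-dalek for its SIMD backend; combined with
# target-cpu=native so the required vector features are enabled.
RUSTFLAGS='--cfg curve25519_dalek_backend="simd" -C target-cpu=native' \
  cargo build --release
```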

@tiziano88 (Collaborator, Author):

It turns out that when I switched to building in release mode, I didn't update the paths in the tests, so they were still using the debug binaries anyway 🙄
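For context, Cargo puts release artifacts in a different directory than dev builds, so any hard-coded artifact path in a test has to change along with the build mode (paths below are the standard Cargo defaults):

```shell
cargo build            # artifacts land under target/debug/
cargo build --release  # artifacts land under target/release/
```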
