Modify regression threshold to configurable percentage (#4698)
kaukabrizvi authored Aug 14, 2024
1 parent 7d73fd6 commit fc9ad97
Showing 2 changed files with 36 additions and 18 deletions.
10 changes: 5 additions & 5 deletions tests/regression/README.md
@@ -5,7 +5,7 @@ This folder contains regression tests and benchmarking tools for the `s2n-tls` l

## Testing Philosophy

Currently, s2n-tls implements a wall clock benchmarking tool which measures end-to-end handshake performance to compare s2n-tls with rustls and OpenSSL. In the past, s2n-tls has tried benchmarking to detect regressions through criterion in Rust, but the subprocess and spin-up time contributed to performance measurement which made the results inaccurate and difficult to use in CI. The project has a slightly different focus, learning from these existing tools. Performance assertion in s2n-tls focuses on a benchmarking tool that can detail performance by API path and do so with enough repeatability and accuracy to detect regressions between two versions of s2n-tls so that performance analysis can occur at PR time. This means that the scope of each harness is limited and mutually exclusive of other harnesses since we are intersted in measuring the performance of the important paths a TLS connection typically follows.
Currently, s2n-tls implements a wall clock benchmarking tool which measures end-to-end handshake performance to compare s2n-tls with rustls and OpenSSL. In the past, s2n-tls has tried benchmarking to detect regressions through criterion in Rust, but the subprocess and spin-up time contributed to performance measurement which made the results inaccurate and difficult to use in CI. The project has a slightly different focus, learning from these existing tools. Performance assertion in s2n-tls focuses on a benchmarking tool that can detail performance by API path and do so with enough repeatability and accuracy to detect regressions between two versions of s2n-tls so that performance analysis can occur at PR time. This means that the scope of each harness is limited and mutually exclusive of other harnesses since we are interested in measuring the performance of the important paths a TLS connection typically follows.

### Why CPU instructions
The performance benchmarking framework uses CPU instruction counts across API paths to make the regression assertion. This metric reduces noise, so even small performance differences are caught. While a difference in instruction count may not translate directly into a runtime difference, it is useful when comparing a PR to mainline and for digging into the specific sources of performance impact within the code.
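The relative threshold this commit introduces can be sketched in isolation. The function below is illustrative only (not the repository's exact code), and the instruction counts are made-up values:

```rust
// Sketch of a percentage-based regression check: the difference between the
// current and previous instruction counts, relative to the previous count,
// must stay at or below a configurable threshold (e.g. 0.01 == 1%).
fn exceeds_threshold(prev: i64, curr: i64, max_diff: f64) -> bool {
    let diff = curr - prev;
    let diff_percentage = diff as f64 / prev as f64;
    diff_percentage > max_diff
}

fn main() {
    // A 0.5% regression is within a 1% threshold...
    assert!(!exceeds_threshold(1_000_000, 1_005_000, 0.01));
    // ...while a 2% regression exceeds it.
    assert!(exceeds_threshold(1_000_000, 1_020_000, 0.01));
    println!("ok");
}
```

A relative threshold scales with the size of each harness, so a single fixed instruction budget no longer has to fit both small and large workloads.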
@@ -33,7 +33,7 @@ To run the harnesses with Valgrind and store the annotated results, run:
PERF_MODE=valgrind cargo test
```

This will recursively call all tests with valgrind enabled so the performance output is generated and stored in target/perf_outputs. If you are looking for the scalar performance output of a PR, this will provide inisght into which portions of the code account for what share of the CPU instruction count.
This will recursively call all tests with valgrind enabled so the performance output is generated and stored in target/perf_outputs. If you are looking for the scalar performance output of a PR, this will provide insight into which portions of the code account for what share of the CPU instruction count.

## Running the Harnesses between versions (differential performance)
Run the scalar performance for all harnesses on the current branch version of s2n-tls
@@ -59,16 +59,16 @@ This will assert on the performance difference of the current version minus the
cargo test
```

This will run the tests without valgrind to test if the hanresses complete as expected
This will run the tests without valgrind to test if the harnesses complete as expected

## Output Files
- `target/$commit_id/test_name.raw`: Contains the raw cachegrind profile. On its own, the file is pretty much unreadable but is useful for the cg_annotate --diff functionality or to visualize the profile via tools like [KCachegrind](https://kcachegrind.github.io/html/Home.html).
- `target/$commit_id/test_name.annotated`: The scalar annotated profile associated with that particular commit id. This file contains detailed infromation on the contribution of functions, files, and lines of code to the overall scalar performance count.
- `target/$commit_id/test_name.annotated`: The scalar annotated profile associated with that particular commit id. This file contains detailed information on the contribution of functions, files, and lines of code to the overall scalar performance count.
- `target/diff/test_name.diff`: The annotated performance difference between two commits. This file contains the overall performance difference, details how many instructions a particular file/function accounts for, and breaks down the contribution of individual lines of code to the overall instruction count difference.
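As a rough illustration, the annotated files can be reduced to a single scalar by locating the totals line. This sketch assumes a `cg_annotate`-style line such as `79,278,369 (100.0%)  PROGRAM TOTALS`; the repository's actual `find_instruction_count` may differ:

```rust
// Illustrative parser: find the "PROGRAM TOTALS" line in annotated output,
// strip the thousands separators from its first field, and parse it as i64.
fn find_instruction_count(output: &str) -> Option<i64> {
    output
        .lines()
        .find(|line| line.contains("PROGRAM TOTALS"))
        .and_then(|line| line.split_whitespace().next())
        .and_then(|count| count.replace(',', "").parse::<i64>().ok())
}

fn main() {
    let sample = "79,278,369 (100.0%)  PROGRAM TOTALS";
    assert_eq!(find_instruction_count(sample), Some(79_278_369));
    assert_eq!(find_instruction_count("no totals here"), None);
    println!("ok");
}
```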

## Sample Output for Valgrind test (differential)

Running the differential test will run all harnesses and fail if any number of harnesses exceed the performance threshold. For example, a regression test faliure could look like:
Running the differential test will run all harnesses and fail if any number of harnesses exceed the performance threshold. For example, a regression test failure could look like:
```
running 2 tests
test tests::test_set_config ... FAILED
44 changes: 31 additions & 13 deletions tests/regression/src/lib.rs
@@ -55,8 +55,6 @@ mod tests {
process::{Command, Output},
};

const MAX_DIFF: i64 = 1_000;

struct InstrumentationControl;

impl InstrumentationControl {
@@ -86,7 +84,11 @@ }
}
}

fn valgrind_test<F>(test_name: &str, test_body: F) -> Result<(), s2n_tls::error::Error>
fn valgrind_test<F>(
test_name: &str,
max_diff: f64,
test_body: F,
) -> Result<(), s2n_tls::error::Error>
where
F: FnOnce(&InstrumentationControl) -> Result<(), s2n_tls::error::Error>,
{
@@ -97,7 +99,7 @@ }
}
RegressionTestMode::Diff => {
let (prev_profile, curr_profile) = RawProfile::query(test_name);
DiffProfile::new(&prev_profile, &curr_profile).assert_performance();
DiffProfile::new(&prev_profile, &curr_profile).assert_performance(max_diff);
}
RegressionTestMode::Default => {
let ctrl = InstrumentationControl;
@@ -144,6 +146,11 @@ mod tests {
format!("target/{}/{}.raw", self.commit_hash, self.test_name)
}

/// Returns the annotated profile associated with a raw profile
fn associated_annotated_profile(&self) -> AnnotatedProfile {
AnnotatedProfile::new(self)
}

/// Return the raw profiles for `test_name` in "git" order. `tuple.0` is older than `tuple.1`
///
/// This method will panic if there are not two profiles.
@@ -206,15 +213,23 @@ mod tests {
fn path(&self) -> String {
format!("target/{}/{}.annotated", self.commit_hash, self.test_name)
}

fn instruction_count(&self) -> i64 {
let output = &std::fs::read_to_string(self.path()).unwrap();
find_instruction_count(output).unwrap()
}
}

struct DiffProfile {
test_name: String,
prev_profile_count: i64,
}
impl DiffProfile {
fn new(prev_profile: &RawProfile, curr_profile: &RawProfile) -> Self {
let diff_profile = Self {
test_name: curr_profile.test_name.clone(),
// reads the annotated profile for previous instruction count
prev_profile_count: prev_profile.associated_annotated_profile().instruction_count()
};

// diff the raw profile
@@ -225,6 +240,7 @@ mod tests {
assert_command_success(diff_output.clone());

// write the diff to disk
create_dir_all("target/diff").unwrap();
let diff_content = String::from_utf8(diff_output.stdout)
.expect("Invalid UTF-8 in cg_annotate --diff output");
write(diff_profile.path(), diff_content).expect("Failed to write to file");
@@ -236,21 +252,23 @@ mod tests {
format!("target/diff/{}.diff", self.test_name)
}

fn assert_performance(&self) {
fn assert_performance(&self, max_diff: f64) {
let diff_content = std::fs::read_to_string(self.path()).unwrap();

let diff = find_instruction_count(&diff_content)
.expect("Failed to parse cg_annotate --diff output");
// percentage difference is the overall difference divided by the previous instruction count
let diff_percentage = diff as f64 / self.prev_profile_count as f64;
assert!(
diff <= MAX_DIFF,
"Instruction count difference exceeds the threshold, regression of {} instructions.
Check the annotated output logs in target/diff/{}.diff for debug information",
diff, self.test_name
diff_percentage <= max_diff,
"Instruction count difference exceeds the threshold, regression of {diff_percentage}% ({diff} instructions).
Check the annotated output logs in target/diff/{}.diff for debug information", self.test_name
);
}

}

// Pulls the instruction count as an integer from the annotated output file.
/// Pulls the instruction count as an integer from the annotated output file.
/// Accepts output from Annotated and Diff profile formats.
fn find_instruction_count(output: &str) -> Result<i64, io::Error> {
let reader = io::BufReader::new(output.as_bytes());
// Example of the line being parsed:
Expand Down Expand Up @@ -303,7 +321,7 @@ mod tests {
/// Test to create new config, set security policy, host_callback information, load/trust certs, and build config.
#[test]
fn test_set_config() {
valgrind_test("test_set_config", |ctrl| {
valgrind_test("test_set_config", 0.01, |ctrl| {
ctrl.stop_instrumentation();
ctrl.start_instrumentation();
let keypair_rsa = CertKeyPair::default();
@@ -317,7 +335,7 @@ mod tests {
/// Test which creates a TestPair from config using `rsa_4096_sha512`. Only measures a pair handshake.
#[test]
fn test_rsa_handshake() {
valgrind_test("test_rsa_handshake", |ctrl| {
valgrind_test("test_rsa_handshake", 0.01, |ctrl| {
ctrl.stop_instrumentation();
let keypair_rsa = CertKeyPair::default();
let config = set_config(&security::DEFAULT_TLS13, keypair_rsa)?;
