Let's say I have the following benchmark to test how log::info!
impacts the performance of the code.
Side note: the log crate in Rust is only a facade; to actually produce output it requires a compatible logger implementation. Since no implementation is present in the code below, I wanted to see whether the compiler could optimize the calls out.
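For context, here is a simplified sketch of the facade idea. This is not the actual log crate's internals (the real crate dispatches through a globally registered &'static dyn Log behind an atomic pointer); the flag and the facade_info function below are hypothetical stand-ins meant only to show why a call with no installed logger degrades to a cheap branch:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical stand-in for the global logger registration; the real `log`
// crate stores a `&'static dyn Log` behind an atomic pointer instead.
static LOGGER_INSTALLED: AtomicBool = AtomicBool::new(false);

// Hypothetical facade: forwards to the logger only if one was installed.
// Returns whether anything was actually logged.
fn facade_info(msg: &str) -> bool {
    if LOGGER_INSTALLED.load(Ordering::Relaxed) {
        println!("INFO: {msg}");
        true
    } else {
        // No implementation registered: the call is effectively a no-op,
        // which is what leaves room for the optimizer (though the atomic
        // load itself is not necessarily removable).
        false
    }
}

fn main() {
    assert!(!facade_info("hello")); // no logger installed: nothing happens
    LOGGER_INSTALLED.store(true, Ordering::Relaxed);
    assert!(facade_info("hello")); // now the message is emitted
}
```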
Here is the code:
use std::collections::HashSet;
use criterion::{criterion_group, criterion_main, Criterion};

fn f<const LOG: bool>() -> usize {
    let mut x = 0;
    let mut hs = HashSet::<i32>::new();
    for i in 0..10000 {
        x += i;
        if hs.contains(&x) {
            x *= 2;
        } else {
            x *= 3;
        }
        x %= 1000;
        hs.insert(x);
        if LOG {
            log::info!("{}: {}", i, hs.len());
        }
    }
    hs.len()
}

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("without", |b| b.iter(|| f::<false>()));
    c.bench_function("with", |b| b.iter(|| f::<true>()));
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
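As an aside, the measured work itself is fully deterministic. Stripped of the criterion and log scaffolding, the loop can be run standalone (a minimal reproduction, with the same body as the benchmark above) to confirm it produces the same HashSet every time, so any run-to-run timing drift comes from the environment, not the code:

```rust
use std::collections::HashSet;

// Same loop body as the benchmark, without criterion or logging.
fn f() -> usize {
    let mut x = 0;
    let mut hs = HashSet::<i32>::new();
    for i in 0..10000 {
        x += i;
        if hs.contains(&x) { x *= 2 } else { x *= 3 }
        x %= 1000;
        hs.insert(x);
    }
    hs.len()
}

fn main() {
    let first = f();
    // Deterministic: every invocation yields the same set size,
    // bounded by the 0..1000 value range that `x %= 1000` enforces.
    assert_eq!(first, f());
    assert!(first >= 1 && first <= 1000);
    println!("{first}");
}
```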
Running
cargo clean
cargo bench
Produces
without time: [265.71 µs 272.44 µs 280.77 µs]
Found 15 outliers among 100 measurements (15.00%)
6 (6.00%) high mild
9 (9.00%) high severe
with time: [276.52 µs 277.55 µs 278.80 µs]
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
Then, running the benchmark again
cargo bench
Produces
without time: [261.80 µs 263.03 µs 264.49 µs]
change: [-7.5920% -3.7254% +0.1192%] (p = 0.06 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) high mild
6 (6.00%) high severe
with time: [264.31 µs 265.26 µs 266.32 µs]
change: [-9.9053% -6.0486% -2.2028%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe
Running the benchmark a third time
cargo bench
Produces
without time: [251.02 µs 251.39 µs 251.83 µs]
change: [-7.6265% -4.5399% -1.4715%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) high mild
8 (8.00%) high severe
with time: [251.56 µs 251.94 µs 252.38 µs]
change: [-8.3006% -5.3281% -2.1746%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) high mild
10 (10.00%) high severe
Why does each subsequent run show a performance improvement? The code and the environment are identical.
The Criterion.rs user guide answers this question fairly well.
Typically this happens because the benchmark environments aren't quite the same. There are a lot of factors that can influence benchmarks. Other processes might be using the CPU or memory. Battery-powered devices often have power-saving modes that clock down the CPU (and these sometimes appear in desktops as well). If your benchmarks are run inside a VM, there might be other VMs on the same physical machine competing for resources.
However, sometimes this happens even with no change. It's important to remember that Criterion.rs detects regressions and improvements statistically. There is always a chance that you randomly get unusually fast or slow samples, enough that Criterion.rs detects it as a change even though no change has occurred. In very large benchmark suites you might expect to see several of these spurious detections each time you run the benchmarks.
Unfortunately, this is a fundamental trade-off in statistics. In order to decrease the rate of false detections, you must also decrease the sensitivity to small changes. Conversely, to increase the sensitivity to small changes, you must also increase the chance of false detections. Criterion.rs has default settings that strike a generally-good balance between the two, but you can adjust the settings to suit your needs.
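To make the false-detection point concrete, here is a toy simulation. This is not Criterion.rs's actual bootstrap analysis; a deterministic LCG stands in for measurement noise, and both sample groups are drawn from the very same synthetic "timing" distribution. Their means still differ, which is exactly the kind of gap a detector with too-high sensitivity could flag as a change:

```rust
// Toy illustration of spurious differences: both sample groups come from
// the same synthetic timing distribution, yet their means are not equal.
// A deterministic LCG (PCG multiplier/increment constants) stands in for
// random measurement noise.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }

    // Pseudo "sample" in microseconds: 250 µs base plus up to ~16 µs noise.
    fn sample_us(&mut self) -> f64 {
        250.0 + (self.next() % 16_000) as f64 / 1000.0
    }
}

fn mean(v: &[f64]) -> f64 {
    v.iter().sum::<f64>() / v.len() as f64
}

fn main() {
    let mut rng = Lcg(42);
    let run1: Vec<f64> = (0..100).map(|_| rng.sample_us()).collect();
    let run2: Vec<f64> = (0..100).map(|_| rng.sample_us()).collect();
    let diff = mean(&run1) - mean(&run2);
    // Identical distribution; the gap is bounded by the noise range but
    // generally nonzero, so a naive comparison would report a "change".
    assert!(diff.abs() < 16.0);
    println!("mean difference between identical runs: {diff:.3} µs");
}
```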
One can configure the benchmark to use a different sample count as described in Advanced Configuration, thus reducing noise at the cost of sensitivity:
fn bench(c: &mut Criterion) {
    let mut group = c.benchmark_group("sample-size-example");
    // Configure Criterion.rs to detect smaller differences and increase sample size to improve
    // precision and counteract the resulting noise.
    group.significance_level(0.1).sample_size(500);
    group.bench_function("my-function", |b| b.iter(|| my_function()));
    group.finish();
}