I am trying to determine the number of cache lines loaded into the L1 cache (processor: Intel Broadwell). My kernel code is:
a[i] = 2*b[i] + 2.3 // i from 0 to pow(10,8)
I am using the perf event L1-dcache-load-misses. The measured number is twice what I expected: I expect 6M loads and 6M stores, but L1-dcache-load-misses is around 12M. However, LLC-stores is as expected (6M).
i) Does L1-dcache-load-misses count both load and store misses?
In the Intel Software Developer's Manual (Table 19.5), I found two metrics for the L2 cache: L2_TRANS.L2_FILL (r20f0) and L2_TRANS.L2_WB (r40f0).
ii) What is the exact meaning of L2_TRANS.L2_FILL? Is it the total number of L2 transactions?
iii) What is the exact meaning of L2_TRANS.L2_WB? Is it the total number of L2 write transactions?
Perf's event aliases map to predefined counter events and unit masks, but since the mapping can differ from one CPU to another, an alias sometimes ends up pointing at a different event, and you may be counting something other than what the name suggests.
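For reference, the raw codes quoted in the question (r20f0, r40f0) bypass the alias table: on x86, perf reads the low byte of a raw code as the event select and the next byte as the unit mask. Below is a minimal C sketch of that encoding. The helper names raw_event_code and format_raw_event are hypothetical, and the event and umask values are simply the ones implied by the codes in the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Combine an event select (bits 0-7) and a unit mask (bits 8-15)
 * into the 16-bit value perf expects after the "r" prefix. */
static uint16_t raw_event_code(uint8_t event_select, uint8_t umask)
{
    return (uint16_t)(((unsigned)umask << 8) | event_select);
}

/* Write the code in perf's "rUUEE" form, e.g. "r20f0".
 * The buffer must hold at least 6 bytes ("r" + 4 hex digits + NUL). */
static void format_raw_event(uint8_t event_select, uint8_t umask,
                             char *buf, size_t len)
{
    assert(buf != NULL && len >= 6);
    snprintf(buf, len, "r%04x", (unsigned)raw_event_code(event_select, umask));
}
```

Calling format_raw_event(0xF0, 0x20, buf, sizeof buf) yields r20f0, which can be passed to perf as `perf stat -e r20f0`. Requesting the raw code makes it explicit which hardware event is counted, whatever the alias table says.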
This discussion on an Intel forum suggests that at least some systems (Haswell, but Broadwell should be quite similar) had L1-dcache-load-misses incorrectly mapped to L1 replacements, which would explain the doubled value (the stores would also fetch lines into the L1 cache).
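The arithmetic behind the doubling can be sketched in C. This is a rough estimate under assumptions that the question does not state: 4-byte float elements (which is what makes 10^8 elements come out near 6M lines), 64-byte cache lines, line-aligned arrays, and a write-allocate L1 in which each stored line is first fetched into the cache. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of cache lines spanned by n_elems contiguous elements,
 * assuming the array starts on a cache-line boundary. */
static uint64_t cache_lines_touched(uint64_t n_elems, uint64_t elem_bytes,
                                    uint64_t line_bytes)
{
    assert(line_bytes > 0);
    uint64_t total_bytes = n_elems * elem_bytes;
    return (total_bytes + line_bytes - 1) / line_bytes; /* round up */
}

/* Expected L1 line replacements for a[i] = f(b[i]):
 * one fill per line of b (loads) plus one fill per line of a
 * (stores, assuming write-allocate). */
static uint64_t expected_l1_fills(uint64_t n_elems, uint64_t elem_bytes,
                                  uint64_t line_bytes)
{
    uint64_t load_lines  = cache_lines_touched(n_elems, elem_bytes, line_bytes);
    uint64_t store_lines = cache_lines_touched(n_elems, elem_bytes, line_bytes);
    return load_lines + store_lines;
}
```

Under these assumptions, cache_lines_touched(100000000, 4, 64) gives 6,250,000 lines per array and expected_l1_fills gives 12,500,000 in total, in line with the roughly 6M and 12M figures reported in the question.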
As for the L2_TRANS events, assuming they're correctly mapped, they should indeed count the total fills into and evictions from the L2. Note that this may include more than your loads + stores, since the L2 also holds code (probably negligible for such a small kernel) and serves prefetches (probably significant, since your data is laid out contiguously and is easy to prefetch).