In a multi-threaded application running on a recent Linux distributed shared memory system, is there a straightforward way to count, per thread, the number of requests to remote (non-local) NUMA memory nodes?
I am thinking of using PAPI to count interconnect traffic. Is this the way to go?
In my application, threads are bound to a particular core or processor for their entire lifetime. When the application starts, memory is allocated page-wise and spread round-robin across all available NUMA memory nodes.
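For context, the allocation policy I describe can also be imposed from the command line with numactl (assuming the numactl package is installed; `./my_app` is a placeholder for the actual binary):

```shell
# --interleave=all spreads the application's pages round-robin across
# all NUMA nodes, matching the allocation scheme described above.
numactl --interleave=all ./my_app

# Show the node/CPU topology and the current policy, to double-check
# which cores belong to which NUMA node:
numactl --hardware
numactl --show
```

(Per-thread core binding is done inside the application itself, e.g. via pthread_setaffinity_np.)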
Thank you for your answers.
If you have access to VTune, local and remote NUMA node accesses are counted by the hardware events OFFCORE_RESPONSE.ANY_DATA.OTHER_LOCAL_DRAM_0 (fast local NUMA node accesses) and OFFCORE_RESPONSE.ANY_DATA.REMOTE_DRAM_0 (slower remote NUMA node accesses).
How the counters appear in VTune:

(screenshot omitted)

How the counters look in two scenarios:

NUMA-unhappy code: core 0 (NUMA node 0) increments 50 MB residing on NUMA node 1:

(screenshot omitted)

NUMA-happy code: core 0 (NUMA node 0) increments 50 MB residing on NUMA node 0:

(screenshot omitted)
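If VTune is not available, a possible alternative is Linux perf, which exposes generic NUMA events (`node-loads`, `node-load-misses`) on many recent CPUs; a sketch of the invocation (`./my_app` is a placeholder, and event availability depends on the CPU and kernel version):

```shell
# node-loads      : loads serviced by DRAM on the local NUMA node
# node-load-misses: loads that had to go to a remote NUMA node
perf stat -e node-loads,node-load-misses ./my_app

# List the NUMA-related events your machine actually supports:
perf list | grep -i node
```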
I found the pcm-numa.x tool that comes with Intel PCM to be quite useful. It tells you the number of times each core has accessed the local or remote NUMA nodes.
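For reference, a typical invocation might look like the following (the sampling-interval argument follows the convention of the other PCM tools; the exact options may differ between PCM versions):

```shell
# Print per-core local/remote DRAM access counts, refreshed every
# second. Root privileges are usually needed to program the PMU/MSRs.
sudo ./pcm-numa.x 1
```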