Here are things I believe are facts: an int16_t is a quarter the size of an int64_t, so four times as many of them fit in a cache line, and work that stays in cache is much faster than work that has to go out to main memory.
Hypothesis: sorting a vector of int16_t should therefore be noticeably faster than sorting the same values stored as int32_t or int64_t.
But when I run the code below I get these results:
i16 = 7.5168
i32 = 7.3762
i64 = 7.5758
Why am I not getting the results I expect?
C++:
#include <iostream>
#include <vector>
#include <cstdint>
#include <algorithm>
#include <random>
#include <chrono>

int main() {
    const int vlength = 100'000'000;
    const int maxI = 50'000;

    std::vector<int16_t> v16;
    for (int i = 0; i < vlength; ++i) {
        v16.push_back(int16_t(i % maxI));
    }
    // std::random_shuffle was deprecated in C++14 and removed in C++17,
    // so use std::shuffle with an explicit engine instead.
    std::mt19937 rng(std::random_device{}());
    std::shuffle(std::begin(v16), std::end(v16), rng);

    std::vector<int32_t> v32;
    std::vector<int64_t> v64;
    for (int i = 0; i < vlength; ++i) {
        v32.push_back(int32_t(v16[i]));
        v64.push_back(int64_t(v16[i]));
    }

    auto t1 = std::chrono::high_resolution_clock::now();
    std::sort(std::begin(v16), std::end(v16));
    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "i16 = " << std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count() << std::endl;

    t1 = std::chrono::high_resolution_clock::now();
    std::sort(std::begin(v32), std::end(v32));
    t2 = std::chrono::high_resolution_clock::now();
    std::cout << "i32 = " << std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count() << std::endl;

    t1 = std::chrono::high_resolution_clock::now();
    std::sort(std::begin(v64), std::end(v64));
    t2 = std::chrono::high_resolution_clock::now();
    std::cout << "i64 = " << std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count() << std::endl;
}
EDIT: In order to avoid the question of how cache-friendly std::sort is, I've also tried the following code:
#include <iostream>
#include <vector>
#include <cstdint>
#include <chrono>

template <typename T>
inline void function_speed(T& vec) {
    for (auto& i : vec) {
        ++i;
    }
}

int main() {
    const int nIter = 1000;
    std::vector<int16_t> v16(1000000);
    std::vector<int32_t> v32(1000000);
    std::vector<int64_t> v64(1000000);

    auto t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < nIter; ++i) {
        function_speed(v16);
    }
    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "i16 = " << std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count() / double(nIter) << std::endl;

    t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < nIter; ++i) {
        function_speed(v32);
    }
    t2 = std::chrono::high_resolution_clock::now();
    std::cout << "i32 = " << std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count() / double(nIter) << std::endl;

    t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < nIter; ++i) {
        function_speed(v64);
    }
    t2 = std::chrono::high_resolution_clock::now();
    std::cout << "i64 = " << std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count() / double(nIter) << std::endl;
}
Typical result:
i16 = 0.00618648
i32 = 0.00617911
i64 = 0.00606275
I know that proper benchmarking is a science in itself; perhaps I am doing it wrong.
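Two things I am aware of: nothing ever reads the incremented vectors, so in principle the whole timed loop is dead code, and std::chrono::high_resolution_clock is not guaranteed to be monotonic. A minimal harness sketch that addresses both (the helper name time_loop and its shape are just an illustration, not part of my original test) would be:

#include <chrono>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

// Sketch only: steady_clock is guaranteed monotonic, and accumulating a
// checksum after timing keeps the increments observable so the optimizer
// cannot discard the loop as dead code.
template <typename Vec, typename Fn>
double time_loop(Vec& vec, Fn&& fn, int nIter, long long& checksum) {
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < nIter; ++i) {
        fn(vec);
    }
    auto t2 = std::chrono::steady_clock::now();
    checksum += std::accumulate(vec.begin(), vec.end(), 0LL);
    return std::chrono::duration<double>(t2 - t1).count() / nIter;
}

int main() {
    std::vector<int16_t> v16(1000000);
    long long checksum = 0;
    std::cout << "i16 = "
              << time_loop(v16, [](auto& v) { for (auto& i : v) ++i; }, 1000, checksum)
              << " (checksum " << checksum << ")" << std::endl;
}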
EDIT2: By avoiding overflow I am now starting to get more interesting results:
template <typename T>
inline void function_speed(T& vec) {
    for (auto& i : vec) {
        ++i;
        i %= 1000;
    }
}
Gives results such as:
i16 = 0.0143789
i32 = 0.00958941
i64 = 0.019691
If I instead do:
template <typename T>
inline void function_speed(T& vec) {
    for (auto& i : vec) {
        i = (i + 1) % 1000;
    }
}
I get:
i16 = 0.00939448
i32 = 0.00913768
i64 = 0.019615
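My guess (unverified) is that the gaps come from how % is lowered for the different element widths rather than from memory traffic. One way to check would be a variant with the same wrap-around behaviour but no modulo operator at all, so the widths are compared on equal footing:

// Hypothetical variant: behaves like "++i; i %= 1000;" for values that start
// in [0, 999], but uses a compare-and-reset instead of the modulo operator,
// taking differences in how % is lowered per element width out of the picture.
template <typename T>
inline void function_speed(T& vec) {
    for (auto& i : vec) {
        ++i;
        if (i >= 1000) {
            i = 0;
        }
    }
}

If the three widths come out close with this version, the modulo code generation is the likely explanation for the gaps above; if not, something else is going on.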
Mistaken assumption: any O(N log N) comparison sort has to be cache-unfriendly for the vast majority of the N! possible inputs, so shrinking the element type does not buy you the cache behaviour you are hoping for.
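If you want to actually see the element width in the timings, a purely sequential, bandwidth-bound pass is a better probe than a sort. In a sketch like the following (names and sizes are illustrative, not taken from your code), the int16_t vector moves a quarter of the bytes of the int64_t one, so a difference should appear if memory traffic is the limiting factor:

#include <chrono>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

// Illustrative contrast experiment: one linear pass over each vector, timed separately.
template <typename T>
double time_sum(const std::vector<T>& v) {
    auto t1 = std::chrono::steady_clock::now();
    long long sum = std::accumulate(v.begin(), v.end(), 0LL);
    auto t2 = std::chrono::steady_clock::now();
    std::cout << "checksum = " << sum << '\n';  // keep the result observable
    return std::chrono::duration<double>(t2 - t1).count();
}

int main() {
    const int n = 100'000'000;
    std::vector<int16_t> v16(n, 1);   // about 200 MB of data
    std::vector<int64_t> v64(n, 1);   // about 800 MB of data
    std::cout << "i16 pass = " << time_sum(v16) << " s\n";
    std::cout << "i64 pass = " << time_sum(v64) << " s\n";
}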
Furthermore, I think an optimizing compiler could remove the sorts outright, since the sorted vectors are never read afterwards, and an unoptimized build is of course pointless to benchmark.
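If dead-code elimination is the worry, a GCC/Clang-style sink along these lines (a sketch modeled on the usual benchmark-library trick; the name do_not_optimize is mine, not a standard facility) can be applied to each vector after the timed call:

// Sketch of a GCC/Clang-only sink: the empty asm statement with a memory
// clobber tells the compiler the pointed-to data may be inspected, so the
// preceding sort cannot be removed as dead code.
template <typename T>
void do_not_optimize(const T& value) {
    asm volatile("" : : "g"(&value) : "memory");
}

// Usage after each timed sort, e.g.:
//   std::sort(std::begin(v16), std::end(v16));
//   do_not_optimize(v16);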