
Copying and padding a vector's rows quickly

Tags: c++, stdvector

System is:

Ubuntu Linux 20.04.6 LTS
g++ version 9.4.0
IDE: Qt Creator

I have two vectors, set and subset. subset needs to be copied into set at location (0,0), with the remainder of each row padded with 0.

I have the following code, which takes ~9 ms to run. Is there a more efficient (quicker) way to do it?

#include <iostream>
#include <vector>
#include <chrono>
#include <cstdint>  // uint16_t, uint32_t

const uint16_t SET_ROWS{2048};
const uint16_t SET_COLS{1024};
const uint16_t SUBSET_ROWS{1024};
const uint16_t SUBSET_COLS{768};

const uint32_t SET_FOOTPRINT{SET_ROWS*SET_COLS};
const uint32_t SUBSET_FOOTPRINT{SUBSET_ROWS*SUBSET_COLS};

/*
======================  -> SET
|        | -> SUBSET |
|        |           |
|________|           |
|                    |
|                    |
|                    |
======================
*/

int main()
{
  std::vector<uint16_t> subset(SUBSET_FOOTPRINT,1);
  std::vector<uint16_t> set(SET_FOOTPRINT,0);
  std::chrono::time_point<std::chrono::system_clock> start_time;
  std::chrono::time_point<std::chrono::system_clock> end_time;

  start_time = std::chrono::system_clock::now();

  for(int a=0; a<SUBSET_ROWS; a++) {
      for(int b=0; b<SUBSET_COLS; b++) {
          // The row stride of set is its column count (SET_COLS), not SET_ROWS.
          set[a*SET_COLS+b] = subset[a*SUBSET_COLS+b];
        }
    }

  end_time = std::chrono::system_clock::now();
  /*
   111 ... 000
   111 ... 000
   111 ... 000
   .
   .
   .
   111 ... 000
   */

  std::cout<<"ms "<<std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time).count()<<'\n';

  return 0;
}

The power of the Matrix is strong, but I'm forced to use vectors.

asked Dec 22 '25 22:12 by Mike


1 Answer

Since you are copying sequential values of a TriviallyCopyable type (uint16_t), you could try to replace the inner loop with a std::memcpy (declared in <cstring>).
It is usually highly optimized.

Something like:

for (int a = 0; a < SUBSET_ROWS; ++a) {                     
    auto* pSrc = &subset[a * SUBSET_COLS];  // or: subset.data() + a * SUBSET_COLS
    auto* pDst = &set[a * SET_COLS];        // or: set.data() + a * SET_COLS
    std::memcpy(pDst, pSrc, sizeof(*pSrc) * SUBSET_COLS);
}

You can also use std::copy which has a more C++ style interface using iterators (declared in <algorithm>):

for (int a = 0; a < SUBSET_ROWS; ++a) {                     
    auto* pSrc = &subset[a * SUBSET_COLS];
    auto* pDst = &set[a * SET_COLS];
    std::copy(pSrc, pSrc + SUBSET_COLS, pDst);
}

As @PeteBecker commented, std::copy is likely to be implemented using memcpy here for trivial types so the performance should be similar.
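
Wrapped into a reusable function, the row-wise copy could look like the sketch below. The name copy_rows and its parameters are hypothetical (they do not come from the question), and it assumes both vectors are stored row-major with the block placed at (0,0).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies a src_rows x src_cols row-major block into the
// top-left corner of a row-major destination whose rows are dst_cols wide.
// Elements of dst outside the block are left untouched.
void copy_rows(const std::vector<std::uint16_t>& src,
               std::size_t src_rows, std::size_t src_cols,
               std::vector<std::uint16_t>& dst, std::size_t dst_cols)
{
    assert(src_cols <= dst_cols);
    assert(src.size() >= src_rows * src_cols);
    assert(dst.size() >= src_rows * dst_cols);

    for (std::size_t r = 0; r < src_rows; ++r) {
        // One contiguous copy per row; each side uses its own row stride.
        std::copy_n(src.data() + r * src_cols, src_cols,
                    dst.data() + r * dst_cols);
    }
}
```

With the constants from the question, the call would be copy_rows(subset, SUBSET_ROWS, SUBSET_COLS, set, SET_COLS).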

Note that your posted code does not include an explicit padding step with zeroes; it only works because set is zero-initialized when it is constructed. If set is reused, you can achieve the padding using std::memset (or std::fill, which is again more in C++ style) on the relevant memory region.
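
If set is reused between copies, the zero padding has to be rewritten each time. A minimal sketch of one way to do that with std::copy_n, std::fill_n and std::fill follows; the function name copy_and_pad and its parameters are hypothetical, and it assumes row-major storage with the block placed at (0,0).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: writes src into the top-left corner of dst and sets
// every other element of dst to 0, regardless of what dst held before.
void copy_and_pad(const std::vector<std::uint16_t>& src,
                  std::size_t src_rows, std::size_t src_cols,
                  std::vector<std::uint16_t>& dst,
                  std::size_t dst_rows, std::size_t dst_cols)
{
    assert(src_rows <= dst_rows && src_cols <= dst_cols);
    assert(src.size() == src_rows * src_cols);
    assert(dst.size() == dst_rows * dst_cols);

    const std::uint16_t* in = src.data();
    std::uint16_t* out = dst.data();

    for (std::size_t r = 0; r < src_rows; ++r) {
        // Copy one source row, then zero the rest of the destination row.
        out = std::copy_n(in, src_cols, out);
        out = std::fill_n(out, dst_cols - src_cols, std::uint16_t{0});
        in += src_cols;
    }

    // Zero the rows below the copied block.
    std::fill(out, dst.data() + dst.size(), std::uint16_t{0});
}
```

Because the destination is written front to back exactly once, no separate clearing pass over set is needed before the copy.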

However, note that in order to better evaluate the performance difference, you might need to increase the matrix sizes (or perform several copies) to avoid noise.
Also, as others commented, performance analysis should always be done with a fully optimized build.
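
To illustrate the point about repeating the copy, here is a rough timing sketch. The function name time_copy_us and the repeats parameter are hypothetical; it uses std::chrono::steady_clock (a monotonic clock) instead of system_clock, and should be compiled with optimization enabled (for example g++ -O2). An optimizer may in principle simplify repeated identical copies, so treat the numbers as indicative only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: performs the row-wise memcpy `repeats` times and
// returns the mean time per full copy in microseconds.
double time_copy_us(const std::vector<std::uint16_t>& src,
                    std::size_t src_rows, std::size_t src_cols,
                    std::vector<std::uint16_t>& dst, std::size_t dst_cols,
                    int repeats)
{
    assert(repeats > 0);
    assert(src_cols <= dst_cols);
    assert(src.size() >= src_rows * src_cols);
    assert(dst.size() >= src_rows * dst_cols);

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        for (std::size_t r = 0; r < src_rows; ++r) {
            std::memcpy(dst.data() + r * dst_cols,
                        src.data() + r * src_cols,
                        src_cols * sizeof(std::uint16_t));
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    // Convert the elapsed time to fractional microseconds and average it.
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

Averaging over many repetitions (say, a few hundred) smooths out the noise that a single measurement of a few milliseconds is prone to.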

answered Dec 24 '25 12:12 by wohlstad


