Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

transform array of objects into array of pointers to unique objects

I am trying to transform an array of objects into an array of object pointers, where the pointers point to elements of an array which contains all unique objects of the first array.

The objects I am using are not cheap to copy as they involve buffer allocation and buffer copying. They are, however, cheap to move.

Example: The array

[G,F,E,G,E,G]

should be transformed into a unique object array
U = [E,F,G] and a pointer array
P = [&U[2], &U[1], &U[0], &U[2], &U[0], &U[2]]

I am currently using the following code to achieve this:

int N; // 50 Millions and more
std::vector<MyObj> objarray; // N elements
std::vector<MyObj*> ptrarray; // N elements
...
std::vector<MyObj> tmp(objarray.begin(), objarray.end());

std::sort(objarray.begin(), objarray.end());
auto unique_end = std::unique(objarray.begin(), objarray.end());

// now, [objarray.begin(), unique_end) contains all unique objects

std::map<MyObj, int> indexmap;

// save index for each unique object
int index = 0;
for(auto it = objarray.begin(); it != uniqueend; it++){
    indexmap[*it] = index;
    index++;
}

//for each object in original array, look up index in unique object array and save the pointer
for(int i = 0; i < N; i++)
    ptrarray[i] = &objarray[indexmap[tmp[i]]];

Is there a more efficient way to achieve this, possibly without creating a copy of the original array since object copies are expensive?

like image 783
Abator Abetor Avatar asked Jan 26 '26 12:01

Abator Abetor


1 Answers

struct r {
  std::vector<MyObj> objects;
  std::vector<MyObj*> ptrs;
};

r func( std::vector<MyObj> objarray ) {

  // makes a vector containing {0, 1, 2, 3, ..., N-1}
  auto make_index_buffer = [&]{
    std::vector<std::size_t> r;
    r.reserve(objarray.size());
    for (std::size_t i = 0; i < objarray.size(); ++i)
      r.push_back( i );
    return r;
  };

  // build a buffer of unique element indexes:
  auto uniques = make_index_buffer();

  // compares indexes by their object: 
  auto index_less = [&](auto lhs, auto rhs) { return objarray[lhs]<objarray[rhs]; };
  auto index_equal = [&](auto lhs, auto rhs) { return objarray[lhs]==objarray[rhs]; };

  std::sort( uniques.begin(), uniques.end(), index_less );
  uniques.erase( std::unique( uniques.begin(), uniques.end(), index_equal ), uniques.end() );

  // build table of index to unique index:
  std::map<std::size_t, std::size_t, index_less> table;
  for (std::size_t& i : uniques)
    table[i] = &i-uniques.data();

  // list of index to unique index for each element:
  auto indexes = make_index_buffer();

  // make indexes unique:
  for (std::size_t& i:indexes)
    i = table[i];

  // after this, table will be invalidated.  Clear it first:
  table = {};

  // build unique object list:
  std::vector<MyObj> objects;
  objects.reserve( uniques.size() );
  for (std::size_t i : uniques)
    objects.push_back( std::move(objarray[i]) );

  // build pointer objects:
  std::vector<MyObj*> ptrarray; // N elements
  ptrarray.reserve( indexes.size() );
  for (std::size_t i : indexes)
    ptrarray.push_back( std::addressof( objects[i] ) );

  return {std::move(objects), std::move(ptrarray)};
}

This does exactly N moves of MyObj, where N is the number of unique MyObj in your original vector.

Yours did M lg M moves of MyObj, and N copies, where M is the number of objects and N is the number of unique objects.

Mine does some allocation (of size_ts) you can probably clean up, but that would make it a bit less clear.

like image 103
Yakk - Adam Nevraumont Avatar answered Jan 29 '26 00:01

Yakk - Adam Nevraumont