Splitting an STL list based on a condition

Question

I have an std::list which looks as follows (the x marks indicate some number less than 500)

x,x,x,x,503,x,x,x,510,x,x,x,502,x,x,x,x,x,x,600 - std::list<int> originallist

I am looking to split the list into a vector of lists std::vector<std::list<int> > as follows

1st element of vector: x,x,x,x,503
2nd element of vector: x,x,x,510
...
...
last element of vector: x,x,x,x,x,x,600

The code I have now is as follows:

list<int> templist; vector<list<int> > v;
for(list<int>::iterator lit=originallist.begin(); lit!=oriniallist.end(); ++lit) {
    if (*lit > 500) {
        templist.push_back(*lit);v.push_back(templist); templist.clear(); continue;
    }
    templist.push_back(*lit);
}

What is the most efficient way to achieve the above task in c++ without using templist. Any help is appreciated.

Yakk - Adam Nevraumont · Accepted Answer

While this solution does use a temporary std::list, it allocates no list node elements, and does exactly 1 memory allocation in the C++03 case (the C++11 case does a logarithmic number of memory allocations on the size of the return value)

This is a C++03 solution. A C++11 solution can do this in one pass.

bool big_as_500( int x ) {return x>=500;}

std::vector< std::list< int > > do_stuff( std::list<int>& original_list ) {
  // we have to do this, because resizing the return value involves lots of allocations
  // and stuff in C++03, so make sure we get the size right by precalculating it:
  std::size_t count = std::count_if( originallist.begin(), originallist.end(), big_as_500 );
  std::vector< std::list< int > > result;
  result.reserve(count+1); 

  typedef std::list<int>::const_iterator const_iterator;
  std::list< int > current;
  for(const_iterator it= originallist.begin(); it!=originallist.end();/*nothing*/) {
    ++it; // about to invalidate it! (or move lists)
    current.splice( current.end(), originallist, originallist.begin() ); // O(1) no memory allocation
    if (big_as_500(current.back())) {
      result.push_back( std::list<int>() );
      current.swap( result.back() );
    }
  }
  // original problem does not specify what to do if the original list does not end
  // with an element "big_as_500", so I'll just drop them
  return result; // rely on NRVO to eliminate the copy here, if your compiler does not
  // support it, take result as a reference parameter.
}

A C++11 solution:

std::vector< std::list< int > > do_stuff( std::list<int>& original_list ) {
  std::vector< std::list< int > > result;

  typedef std::list<int>::const_iterator const_iterator;
  std::list< int > current;
  for(const_iterator it= originallist.begin(); it!=originallist.end();/*nothing*/) {
    ++it;// about to become invalid/in wrong list
    current.splice( current.end(), originallist, originallist.begin() ); // O(1) no memory allocation
    if (current.back() >= 500) {
      result.emplace_back( std::move(current) );
    }
  }
  // original problem does not specify what to do if the original list does not end
  // with an element "big_as_500", so I'll just drop them
  return result; // will NRVO, or move, so no worries
}

in C++11, resizes are relatively cheap, so we are good.

Now, we could get really fancy in C++03 and emulate what C++11 does and do it all in one pass.

template<typename T, typename A>
void efficient_grow_by_1( std::vector<T,A>& make_one_bigger ) {
  if (make_one_bigger.size()+1 > make_one_bigger.capacity() )
  {
    std::vector<T, A> swap_vec;
    swap_vec.reserve( (make_one_bigger.size()+1)*5/3 );
    for (std::vector<T, A>::iterator it = make_one_bigger.begin(); it != make_one_bigger.end(); ++it ) {
      using std::swap;
      swap_vec.push_back();
      std::swap( *it, swap_vec.back() );
    }
    swap_vec.swap( make_one_bigger );
  }
  make_one_bigger.push_back();
}
void do_stuff( std::list<int>& original_list, std::vector< std::list< int > >& result ) {
  typedef std::list<int>::const_iterator const_iterator;
  std::list< int > current;
  for(const_iterator it= originallist.begin(); it!=originallist.end();) {
    ++it;
    current.splice( current.end(), originallist, originallist.begin() ); // O(1) no memory allocation
    if (current.back()>=500) {
      efficient_grow_by_1(result);
      current.swap( result.back() );
    }
  }
  // original problem does not specify what to do if the original list does not end
  // with an element "big_as_500", so I'll just drop them
}

which is rather insane, so I'd advise upgrading your compiler.

The trick here is that we populate the 'temporary' list with a single-element-at-a-time splice. Because (most? many?) implementations of std::list::splice end up having to walk over the elements to count them (it is required in C++11, and common in C++03), doing it one at a time as we determine which elements we want to put into the next chunk is reasonably efficient. Each node comes directly from the input list, and is collected into the temporary list (no memory allocations).

Once we have built up this list, we directly swap it into the output vector of lists. This avoids any memory allocations, other than that which is required to hold the (relatively small) base data of the list.

In C++03, we either do a two-pass solution and pre calculate how big the output std::vector is, or we emulate C++11 move efficiency with careful growth and swap mechanics on the contained lists. It is possible that your std library implementation fakes this already, but I am unsure how common swap-resize optimization was in the old libraries.

Keeping things down to a single pass is probably worth the logarithmic number of allocations that the 2nd C++03 and C++11 solutions use: walking a std::list is an exercise in cache misses.

Splitting an STL list based on a condition

Tags:

c++

algorithm

list

stl

vector

user62089

1 Answers

Yakk - Adam Nevraumont

Recent Activity

Donate For Us

Splitting an STL list based on a condition

Tags:

c++

algorithm

list

stl

vector

user62089

1 Answers

Yakk - Adam Nevraumont

Related questions

Recent Activity

Donate For Us