
Elegant way to provide a flattening iterator for a vector of vectors

I have an adapter whose goal is to provide a forward iterator over pair<FeatureVector, Label> values. However, in my internal representation I store the data as vector<pair<vector<string>, Label>>.

So during iteration I need to flatten it and convert every single string, which is a short sentence like "oil drops massively today", to a FeatureVector.

In raw form I have something like:

{
  {"Oil drops massively","OPEC surge oil production","Brent price goes up" -> "OIL_LABEL"}, 
  {"France consume more vine", "vine production in Italy drops" -> "VINE_LABEL"}
}

and I need to convert it to:

{
  vectorize("Oil drops massively") -> "OIL_LABEL", 
  vectorize("OPEC surge oil production") -> "OIL_LABEL", ... , 
  vectorize("vine production in Italy drops") -> "VINE_LABEL"
}

vectorize() is a conversion from a sentence to a sparse vector, like this: "Oil drops on NYSE" -> {0,1,0..0,1,0..0,1}

The simplest way would be to create a new vector, fill it with all the converted data, and then use its iterators, but that is a pretty resource-heavy operation, so ideally I want the conversion to happen lazily on each iteration step. What is the most elegant way to do this kind of conversion?

This is a simplified version of a data structure for storing a text corpus. The iterators are later used to initialize a classifier, which requires two iterators, begin and end, that behave like those of a vector.

silent_coder asked Dec 21 '25 18:12


1 Answer

A simple range type:

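#include <cstddef>   // std::size_t
#include <iterator>  // std::next, std::prev, std::begin, std::end
#include <tuple>     // std::tie
#include <utility>   // std::make_pair

// A half-open iterator pair [b, e) that works with range-based for.
template<class It>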
struct range_t {
  It b{},e{};
  It begin() const {return b;}
  It end() const {return e;}
  bool empty() const {return begin()==end();}
  friend bool operator==(range_t lhs, range_t rhs){
    if (lhs.empty() && rhs.empty()) return true;
    return lhs.begin() == rhs.begin() && lhs.end() == rhs.end();
  }
  friend bool operator!=(range_t lhs, range_t rhs){
    return !(lhs==rhs);
  }
  range_t without_front( std::size_t N = 1 ) const {
    return { std::next(begin(), N), end() };
  }
  range_t without_back( std::size_t N = 1 ) const {
    return { begin(), std::prev(end(),N) };
  }
  decltype(auto) front() const {
    return *begin();
  }
  decltype(auto) back() const {
    return *std::prev(end());
  }
};
template<class It>
range_t<It> range( It b, It e ) {
  return {b,e};
}

Here is a non-conforming pseudo-iterator that computes the cross product of two ranges:

template<class ItA, class ItB>
struct cross_iterator_t {
  range_t<ItA> cur_a;
  range_t<ItB> orig_b;
  range_t<ItB> cur_b;

  cross_iterator_t( range_t<ItA> a, range_t<ItB> b ):
    cur_a(a), orig_b(b), cur_b(b)
  {}

  bool empty() const { return cur_a.empty() || cur_b.empty(); }

  void operator++(){
    cur_b = cur_b.without_front();
    if (cur_b.empty()) {
      cur_a = cur_a.without_front();
      if (cur_a.empty()) return;
      cur_b = orig_b;
    }
  }
  auto operator*()const {
    return std::make_pair( cur_a.front(), cur_b.front() );
  }
  friend bool operator==( cross_iterator_t lhs, cross_iterator_t rhs ) {
    if (lhs.empty() && rhs.empty()) return true;

    auto mytie=[](auto&& self){
      return std::tie(self.cur_a, self.cur_b);
    };
    return mytie(lhs)==mytie(rhs);
  }
  friend bool operator!=( cross_iterator_t lhs, cross_iterator_t rhs ) {
    return !(lhs==rhs);
  }
};
template<class Lhs, class Rhs>
auto cross_iterator( range_t<Lhs> a, range_t<Rhs> b )
-> cross_iterator_t<Lhs, Rhs>
{
  return {a,b};
}

With this you can take a std::vector<A> and a single B and do:

template<class A, class B>
auto cross_one_element( A& range_a, B& b_element ) {
  auto a = range( std::begin(range_a), std::end(range_a) );
  auto b = range( &b_element, (&b_element) +1 );
  auto s = cross_iterator(a, b);
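  // End sentinel: a cross iterator over an empty A-range compares equal
  // to any exhausted cross iterator (no default constructor is needed).
  auto f = cross_iterator( range( std::end(range_a), std::end(range_a) ), b );
  return range(s, f);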
}

So that solves one of your problems. The above still needs work to meet the real input iterator requirements (the iterator_traits typedefs, an operator++ that returns a reference, post-increment); as written it is only a pseudo-iterator that works with for(:).

Then you have to write code that takes a vector of X and transforms it into a range of f(X) for some function f.
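A rough, untested-in-production sketch of that transform step, assuming C++14. The names transform_iterator_t, transform_iterator, word_count and word_counts are made up for illustration; word_count is only a stand-in for your vectorize(). The wrapper applies f each time it is dereferenced, so nothing is converted up front:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical adapter: wraps an iterator and applies f on every dereference.
template<class It, class F>
struct transform_iterator_t {
  using iterator_category = std::input_iterator_tag;
  using value_type =
      std::decay_t<decltype(std::declval<F&>()(*std::declval<It&>()))>;
  using difference_type = std::ptrdiff_t;
  using pointer         = void;
  using reference       = value_type;  // computed on demand, returned by value

  It it;
  F  f;

  reference operator*() const { return f(*it); }
  transform_iterator_t& operator++() { ++it; return *this; }
  transform_iterator_t operator++(int) { auto tmp = *this; ++it; return tmp; }
  friend bool operator==(const transform_iterator_t& a,
                         const transform_iterator_t& b) { return a.it == b.it; }
  friend bool operator!=(const transform_iterator_t& a,
                         const transform_iterator_t& b) { return !(a == b); }
};

template<class It, class F>
transform_iterator_t<It, F> transform_iterator(It it, F f) { return {it, f}; }

// Stand-in for vectorize(): counts space-separated words in a sentence.
inline std::size_t word_count(const std::string& s) {
  std::size_t n = 0;
  bool in_word = false;
  for (char c : s) {
    if (c == ' ') in_word = false;
    else if (!in_word) { in_word = true; ++n; }
  }
  return n;
}

// Usage sketch: materialise the lazily transformed range to inspect it.
inline std::vector<std::size_t> word_counts(const std::vector<std::string>& v) {
  auto b = transform_iterator(v.begin(), &word_count);
  auto e = transform_iterator(v.end(), &word_count);
  assert(v.empty() == (b == e));  // begin == end exactly when there is no input
  return std::vector<std::size_t>(b, e);
}
```

Passing a function pointer (or a functor with a const operator()) keeps the iterator copy-assignable; a C++14 lambda stored by value would not be.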

Then you have to write code that takes a range of ranges, and flattens it into a range.
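For your concrete layout the flattening step could look roughly like the sketch below (C++14). This is one possible hand-rolled version, not a general range-of-ranges flattener. Label is assumed to be std::string because the question does not define it, and flat_iterator, flat_begin and flat_end are hypothetical names. It tracks the position inside the current group by index and skips groups that contain no sentences:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

using Label  = std::string;  // assumption: Label is not defined in the question
using Group  = std::pair<std::vector<std::string>, Label>;
using Corpus = std::vector<Group>;

// Hypothetical flattening iterator: visits every sentence of every group and
// pairs it with that group's label. Groups without sentences are skipped.
class flat_iterator {
public:
  using iterator_category = std::input_iterator_tag;
  using value_type        = std::pair<const std::string&, const Label&>;
  using difference_type   = std::ptrdiff_t;
  using pointer           = void;
  using reference         = value_type;

  flat_iterator(Corpus::const_iterator outer, Corpus::const_iterator outer_end)
    : outer_(outer), outer_end_(outer_end) { skip_empty_groups(); }

  reference operator*() const {
    assert(outer_ != outer_end_);  // dereferencing the end iterator is a bug
    return reference(outer_->first[inner_], outer_->second);
  }
  flat_iterator& operator++() {
    assert(outer_ != outer_end_);
    if (++inner_ == outer_->first.size()) { ++outer_; skip_empty_groups(); }
    return *this;
  }
  flat_iterator operator++(int) { auto tmp = *this; ++*this; return tmp; }

  friend bool operator==(const flat_iterator& a, const flat_iterator& b) {
    return a.outer_ == b.outer_ && a.inner_ == b.inner_;
  }
  friend bool operator!=(const flat_iterator& a, const flat_iterator& b) {
    return !(a == b);
  }

private:
  // Point at the first sentence of the next non-empty group, or at the end.
  void skip_empty_groups() {
    inner_ = 0;
    while (outer_ != outer_end_ && outer_->first.empty()) ++outer_;
  }

  Corpus::const_iterator outer_, outer_end_;
  std::size_t inner_ = 0;  // index of the current sentence within *outer_
};

inline flat_iterator flat_begin(const Corpus& c) { return {c.begin(), c.end()}; }
inline flat_iterator flat_end(const Corpus& c)   { return {c.end(), c.end()}; }
```

Dereferencing yields references into the corpus, so the corpus must outlive the iterators; calling your vectorize() on the first element of each pair gives the pair<FeatureVector, Label> you are after.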

Each of these steps is no harder than above.

There are libraries that do this for you. Boost has some, range-v3 has some, as do a pile of other range-manipulation libraries.

Boost even lets you write an iterator by specifying what to do on *, on increment, and on == (boost::iterator_facade). Handling the case where one of your sub-vectors is empty remains tricky, so using more generic algorithms here is probably wise.

The code above is not tested, and is C++14. C++11 versions are merely more verbose.

Yakk - Adam Nevraumont answered Dec 23 '25 08:12


