I have an adapter whose goal is to provide a forward iterator over values of type pair<FeatureVector, Label>. However, in my internal representation I store the data as vector<pair<vector<string>, Label>>.
So during iteration I need to flatten it and convert every single string, which is a short sentence like "oil drops massively today", to a FeatureVector.
In raw form I have something like:
{
  {"Oil drops massively", "OPEC surge oil production", "Brent price goes up" -> "OIL_LABEL"},
  {"France consume more vine", "vine production in Italy drops" -> "VINE_LABEL"}
}
and I need to convert it to:
{
  vectorize("Oil drops massively") -> "OIL_LABEL",
  vectorize("OPEC surge oil production") -> "OIL_LABEL", ... ,
  vectorize("vine production in Italy drops") -> "VINE_LABEL"
}
Here vectorize() is a conversion from a sentence to a sparse vector, e.g. "Oil drops on NYSE" -> {0,1,0..0,1,0..0,1}.
The simplest way would be to create a new vector, initialize it with all the converted data and then use its iterators, but that is a pretty resource-heavy operation, so ideally I want the conversion to happen on each iteration step instead. What is the most elegant way to do this kind of conversion?
This is a simplified version of a data structure for storing a text corpus. The iterators will later be used to initialize a classifier, which requires two iterators, begin and end, that behave logically like those of a vector.
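To make the data layout concrete, here is a sketch of the eager approach the question wants to avoid. The aliases Label, FeatureVector and Corpus, the function flatten_eager and the toy vocabulary-based vectorize() are all assumptions for illustration; the question does not define them.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Assumed types; the question does not spell them out.
using Label = std::string;
using FeatureVector = std::vector<int>;  // dense 0/1 stand-in for a sparse vector
using Corpus = std::vector<std::pair<std::vector<std::string>, Label>>;

// Hypothetical toy vectorize(): feature i is 1 if vocabulary word i occurs in
// the sentence (whitespace-separated tokens, case-sensitive match).
FeatureVector vectorize(const std::string& sentence) {
    static const std::vector<std::string> vocabulary{"oil", "drops", "vine", "production"};
    FeatureVector features(vocabulary.size(), 0);
    std::istringstream in(sentence);
    std::string word;
    while (in >> word)
        for (std::size_t i = 0; i < vocabulary.size(); ++i)
            if (word == vocabulary[i]) features[i] = 1;
    return features;
}

// Eager flattening: builds every (features, label) pair before anything is used.
std::vector<std::pair<FeatureVector, Label>> flatten_eager(const Corpus& corpus) {
    std::vector<std::pair<FeatureVector, Label>> out;
    for (const auto& entry : corpus)
        for (const auto& sentence : entry.first)
            out.emplace_back(vectorize(sentence), entry.second);
    return out;
}
```

Every FeatureVector exists in memory before the first one is consumed, which is exactly the cost a lazy iterator avoids.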
A simple range type:
#include <cstddef>
#include <iterator>

template<class It>
struct range_t {
  It b{}, e{};
  It begin() const { return b; }
  It end() const { return e; }
  bool empty() const { return begin() == end(); }
  friend bool operator==(range_t lhs, range_t rhs) {
    // Two empty ranges are equal wherever they point; an empty and a non-empty
    // range never are, which avoids comparing iterators into unrelated sequences.
    if (lhs.empty() || rhs.empty()) return lhs.empty() && rhs.empty();
    return lhs.begin() == rhs.begin() && lhs.end() == rhs.end();
  }
  friend bool operator!=(range_t lhs, range_t rhs) {
    return !(lhs == rhs);
  }
  range_t without_front( std::size_t N = 1 ) const {
    return { std::next(begin(), N), end() };
  }
  range_t without_back( std::size_t N = 1 ) const {
    return { begin(), std::prev(end(), N) };
  }
  decltype(auto) front() const {
    return *begin();
  }
  decltype(auto) back() const {
    return *std::prev(end());
  }
};
template<class It>
range_t<It> range( It b, It e ) {
  return {b, e};
}
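A quick usage sketch of range_t, repeated here (trimmed to the members used) so the block compiles on its own. It trims both ends of a vector without copying any elements, because a range is just a pair of iterators; the helper name sum_inner is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

template<class It>
struct range_t {
    It b{}, e{};
    It begin() const { return b; }
    It end() const { return e; }
    bool empty() const { return begin() == end(); }
    range_t without_front(std::size_t N = 1) const { return {std::next(begin(), N), end()}; }
    range_t without_back(std::size_t N = 1) const { return {begin(), std::prev(end(), N)}; }
    decltype(auto) front() const { return *begin(); }
    decltype(auto) back() const { return *std::prev(end()); }
};

template<class It>
range_t<It> range(It b, It e) { return {b, e}; }

// Sums every element except the first and the last (0 if there are fewer than two).
int sum_inner(const std::vector<int>& v) {
    if (v.size() < 2) return 0;  // without_back() on an empty range would be undefined
    int sum = 0;
    for (int x : range(v.begin(), v.end()).without_front().without_back())
        sum += x;
    return sum;
}
```

For {1, 2, 3, 4, 5} this visits 2, 3 and 4 only.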
Here is a non-compliant pseudo-iterator that computes the cross product of two ranges:
#include <tuple>
#include <utility>

template<class ItA, class ItB>
struct cross_iterator_t {
  range_t<ItA> cur_a;
  range_t<ItB> orig_b;
  range_t<ItB> cur_b;
  // A default-constructed iterator is exhausted, so it can serve as the "end".
  cross_iterator_t() = default;
  cross_iterator_t( range_t<ItA> a, range_t<ItB> b ):
    cur_a(a), orig_b(b), cur_b(b)
  {}
  bool empty() const { return cur_a.empty() || cur_b.empty(); }
  void operator++(){
    cur_b = cur_b.without_front();
    if (cur_b.empty()) {
      // Finished this row: move to the next element of a and restart b.
      cur_a = cur_a.without_front();
      if (cur_a.empty()) return;
      cur_b = orig_b;
    }
  }
  auto operator*() const {
    return std::make_pair( cur_a.front(), cur_b.front() );
  }
  friend bool operator==( cross_iterator_t lhs, cross_iterator_t rhs ) {
    // All exhausted iterators compare equal; an exhausted one never equals a live one.
    if (lhs.empty() || rhs.empty()) return lhs.empty() && rhs.empty();
    auto mytie = [](auto&& self){
      return std::tie(self.cur_a, self.cur_b);
    };
    return mytie(lhs) == mytie(rhs);
  }
  friend bool operator!=( cross_iterator_t lhs, cross_iterator_t rhs ) {
    return !(lhs == rhs);
  }
};
template<class Lhs, class Rhs>
auto cross_iterator( range_t<Lhs> a, range_t<Rhs> b )
  -> cross_iterator_t<Lhs, Rhs>
{
  return {a, b};
}
From this you can take a std::vector<A> and a single B, and pair every element of the vector with that B:
template<class A, class B>
auto cross_one_element( A& range_a, B& b_element ) {
  auto a = range( std::begin(range_a), std::end(range_a) );
  auto b = range( &b_element, (&b_element) + 1 );  // a one-element range
  auto s = cross_iterator(a, b);
  decltype(s) f{};  // default-constructed, i.e. exhausted: the end iterator
  return range(s, f);
}
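A usage sketch that pairs each sentence of one corpus entry with its label through cross_one_element. The corrected definitions are repeated (trimmed to the members used) so the block compiles on its own, and the sample sentences come from the question. Note that it yields (sentence, label) pairs: the vectorize step is still missing at this point.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

template<class It>
struct range_t {
    It b{}, e{};
    It begin() const { return b; }
    It end() const { return e; }
    bool empty() const { return begin() == end(); }
    friend bool operator==(range_t lhs, range_t rhs) {
        if (lhs.empty() || rhs.empty()) return lhs.empty() && rhs.empty();
        return lhs.begin() == rhs.begin() && lhs.end() == rhs.end();
    }
    range_t without_front() const { return {std::next(begin()), end()}; }
    decltype(auto) front() const { return *begin(); }
};
template<class It>
range_t<It> range(It b, It e) { return {b, e}; }

template<class ItA, class ItB>
struct cross_iterator_t {
    range_t<ItA> cur_a;
    range_t<ItB> orig_b;
    range_t<ItB> cur_b;
    cross_iterator_t() = default;  // exhausted; acts as the end iterator
    cross_iterator_t(range_t<ItA> a, range_t<ItB> b) : cur_a(a), orig_b(b), cur_b(b) {}
    bool empty() const { return cur_a.empty() || cur_b.empty(); }
    void operator++() {
        cur_b = cur_b.without_front();
        if (cur_b.empty()) {
            cur_a = cur_a.without_front();
            if (cur_a.empty()) return;
            cur_b = orig_b;
        }
    }
    auto operator*() const { return std::make_pair(cur_a.front(), cur_b.front()); }
    friend bool operator==(cross_iterator_t lhs, cross_iterator_t rhs) {
        if (lhs.empty() || rhs.empty()) return lhs.empty() && rhs.empty();
        return std::tie(lhs.cur_a, lhs.cur_b) == std::tie(rhs.cur_a, rhs.cur_b);
    }
    friend bool operator!=(cross_iterator_t lhs, cross_iterator_t rhs) { return !(lhs == rhs); }
};
template<class Lhs, class Rhs>
cross_iterator_t<Lhs, Rhs> cross_iterator(range_t<Lhs> a, range_t<Rhs> b) { return {a, b}; }

template<class A, class B>
auto cross_one_element(A& range_a, B& b_element) {
    auto a = range(std::begin(range_a), std::end(range_a));
    auto b = range(&b_element, &b_element + 1);  // one-element range
    auto s = cross_iterator(a, b);
    decltype(s) f{};                             // exhausted iterator = end
    return range(s, f);
}
```

Usage: `for (auto p : cross_one_element(sentences, label))` visits (sentences[0], label), (sentences[1], label), and so on; with an empty sentence list the loop body never runs.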
So that solves one of your problems. The above needs to be extended to support real input iterator features (the iterator_traits member types, postfix ++, -> and so on); as written it is only a pseudo-iterator that works with a range-based for(:) loop.
Then you have to write code that takes a vector of X and transforms it into a range of f(X) for some function f.
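One way to write the transform step is a small iterator adaptor that stores the underlying iterator plus the function and calls the function on every dereference, so nothing is computed until a value is actually read. This is a sketch: transform_iterator, make_transform_iterator and sentence_length are hypothetical names (this is not Boost's transform_iterator), and F is assumed to be copy-assignable, e.g. a function pointer; before C++20 a lambda is not. In the question's case F would be vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Lazy transform adaptor (hypothetical): applies f_ only when dereferenced.
// F is assumed copy-assignable (e.g. a function pointer) so the iterator is too.
template<class It, class F>
class transform_iterator {
public:
    // operator* returns a computed value, not a reference, so this is formally
    // only an input iterator.
    using iterator_category = std::input_iterator_tag;
    using value_type = std::decay_t<decltype(std::declval<F&>()(*std::declval<It&>()))>;
    using difference_type = typename std::iterator_traits<It>::difference_type;
    using reference = value_type;
    using pointer = void;

    transform_iterator(It it, F f) : it_(it), f_(f) {}

    reference operator*() const { return f_(*it_); }
    transform_iterator& operator++() { ++it_; return *this; }
    transform_iterator operator++(int) { transform_iterator old = *this; ++it_; return old; }

    friend bool operator==(const transform_iterator& a, const transform_iterator& b) { return a.it_ == b.it_; }
    friend bool operator!=(const transform_iterator& a, const transform_iterator& b) { return !(a == b); }

private:
    It it_;
    F f_;
};

template<class It, class F>
transform_iterator<It, F> make_transform_iterator(It it, F f) { return {it, f}; }

// Hypothetical stand-in for vectorize(): maps a sentence to its length.
std::size_t sentence_length(const std::string& s) { return s.size(); }
```

Because operator* hands back a freshly computed value rather than a reference, the adaptor is tagged as an input iterator: the standard's forward-iterator requirements demand that * yield a real reference.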
Then you have to write code that takes a range of ranges and flattens it into a single range.
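A sketch of the flattening step for a container of containers. The iterator remembers the current outer position, the outer end and the current inner position, and skip_empty() is where the awkward empty-sub-vector case is handled. flatten_iterator and make_flatten_iterator are hypothetical names; since elements are returned by reference, the forward-iterator tag is justified here.

```cpp
#include <cassert>
#include <iterator>
#include <memory>
#include <vector>

// Flattening adaptor (hypothetical): walks a range of ranges as one sequence.
template<class Outer>
class flatten_iterator {
    using Inner = decltype(std::begin(*std::declval<Outer&>()));
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = typename std::iterator_traits<Inner>::value_type;
    using difference_type = typename std::iterator_traits<Inner>::difference_type;
    using reference = typename std::iterator_traits<Inner>::reference;
    using pointer = typename std::iterator_traits<Inner>::pointer;

    flatten_iterator() = default;  // exhausted; equal to any other exhausted iterator
    flatten_iterator(Outer outer, Outer outer_end) : outer_(outer), outer_end_(outer_end) {
        if (outer_ != outer_end_) inner_ = std::begin(*outer_);
        skip_empty();
    }

    reference operator*() const { return *inner_; }
    pointer operator->() const { return std::addressof(*inner_); }
    flatten_iterator& operator++() { ++inner_; skip_empty(); return *this; }
    flatten_iterator operator++(int) { flatten_iterator old = *this; ++*this; return old; }

    friend bool operator==(const flatten_iterator& a, const flatten_iterator& b) {
        const bool a_done = a.outer_ == a.outer_end_;
        const bool b_done = b.outer_ == b.outer_end_;
        // Only compare positions when both are live, i.e. in the same sequence.
        if (a_done || b_done) return a_done && b_done;
        return a.outer_ == b.outer_ && a.inner_ == b.inner_;
    }
    friend bool operator!=(const flatten_iterator& a, const flatten_iterator& b) { return !(a == b); }

private:
    // Step over exhausted or empty inner ranges until an element or the outer end.
    void skip_empty() {
        while (outer_ != outer_end_ && inner_ == std::end(*outer_)) {
            ++outer_;
            if (outer_ != outer_end_) inner_ = std::begin(*outer_);
        }
    }

    Outer outer_{};
    Outer outer_end_{};
    Inner inner_{};
};

template<class Outer>
flatten_iterator<Outer> make_flatten_iterator(Outer first, Outer last) { return {first, last}; }
```

With {{1, 2}, {}, {3}, {}} the iterator visits 1, 2, 3 and then compares equal to the end iterator built from (cend, cend).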
Each of these steps is no harder than above.
There are libraries that do this for you: Boost has some, range-v3 has some, as do plenty of other range-manipulation libraries.
Boost even lets you write an iterator just by specifying what to do on *, on ++ and on == (boost::iterator_facade, through its dereference(), increment() and equal() members). Deciding what to do when one of your sub-vectors is empty remains tricky, so relying on more generic, already-tested building blocks is probably wise here.
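For comparison, here is a hand-written, standard-library-only iterator that does the whole job for the question's data in one place. It is not Boost code; it simply spells out the same three operations (dereference, increment, equality), including the empty-sub-vector case. The aliases Label, FeatureVector and Corpus, the word-counting vectorize() and the helpers corpus_begin/corpus_end are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Label = std::string;               // assumed
using FeatureVector = std::vector<int>;  // assumed
using Corpus = std::vector<std::pair<std::vector<std::string>, Label>>;

// Hypothetical stand-in for the real vectorize(): a one-feature word count.
FeatureVector vectorize(const std::string& sentence) {
    std::istringstream in(sentence);
    std::string word;
    int count = 0;
    while (in >> word) ++count;
    return FeatureVector{count};
}

// Visits every (sentence, label) in order; a sentence is vectorized only when
// the iterator is dereferenced, so no flattened copy of the corpus is built.
class corpus_iterator {
public:
    using iterator_category = std::input_iterator_tag;  // values are computed, not stored
    using value_type = std::pair<FeatureVector, Label>;
    using difference_type = std::ptrdiff_t;
    using reference = value_type;
    using pointer = void;

    corpus_iterator() = default;
    corpus_iterator(Corpus::const_iterator entry, Corpus::const_iterator end)
        : entry_(entry), end_(end) {
        if (entry_ != end_) sentence_ = entry_->first.begin();
        skip_empty();
    }

    value_type operator*() const { return value_type(vectorize(*sentence_), entry_->second); }
    corpus_iterator& operator++() { ++sentence_; skip_empty(); return *this; }
    corpus_iterator operator++(int) { corpus_iterator old = *this; ++*this; return old; }

    friend bool operator==(const corpus_iterator& a, const corpus_iterator& b) {
        const bool a_done = a.entry_ == a.end_;
        const bool b_done = b.entry_ == b.end_;
        if (a_done || b_done) return a_done && b_done;
        return a.entry_ == b.entry_ && a.sentence_ == b.sentence_;
    }
    friend bool operator!=(const corpus_iterator& a, const corpus_iterator& b) { return !(a == b); }

private:
    // Move past entries whose sentence list is exhausted or empty.
    void skip_empty() {
        while (entry_ != end_ && sentence_ == entry_->first.end()) {
            ++entry_;
            if (entry_ != end_) sentence_ = entry_->first.begin();
        }
    }

    Corpus::const_iterator entry_{}, end_{};
    std::vector<std::string>::const_iterator sentence_{};
};

inline corpus_iterator corpus_begin(const Corpus& c) { return {c.begin(), c.end()}; }
inline corpus_iterator corpus_end(const Corpus& c) { return {c.end(), c.end()}; }
```

Returning each pair by value keeps memory flat but makes this formally an input iterator; if the classifier insists on full forward-iterator guarantees, that is worth checking against its documentation.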
The code above is not tested, and is C++14. C++11 versions are merely more verbose.