Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ to compare 2 lists of strings

In Python, set is pretty handy for comparing 2 lists of strings (see this link). I was wondering if there's a good solution for C++ in terms of performance. As each list has over 1 million strings in it.

It's case-sensitive matching.

like image 534
Stan Avatar asked Jan 23 '26 20:01

Stan


1 Answers

The data types std::set<> (usually implemented as a balanced tree) and std::unordered_set<> (from C++11, implemented as a hash) are available. There is also a convenience algorithm called std::set_intersection that computes the actual intersection.

Here is an example.

#include <iostream>
#include <vector>
#include <string>
#include <set>             // for std::set
#include <algorithm>       // for std::set_intersection

int main()
{
  std::set<std::string> s1 { "red", "green", "blue" };
  std::set<std::string> s2 { "black", "blue", "white", "green" };

  /* Collecting the results in a vector. The vector may grow quite
     large -- it may be more efficient to print the elements directly. */     
  std::vector<std::string> s_both {};

  std::set_intersection(s1.begin(),s1.end(),
                        s2.begin(),s2.end(),
                        std::back_inserter(s_both));

  /* Printing the elements collected by the vector, just to show that
     the result is correct. */
  for (const std::string &s : s_both)
    std::cout << s << ' ';
  std::cout << std::endl;

  return 0;
}

Note. If you want to use std::unordered_set<>, the std::set_intersection cannot be used like this, because it expects the input sets to be ordered. You'd have to use the usual technique of a for-loop iterating over the smaller set and finding the elements in the larger one to determine the intersection. Nevertheless, for a large number of elements (especially, strings), the hash-based std::unordered_set<> may be faster. There are also STL-compatible implementations such as the one in Boost (boost::unordered_set) and the one created by Google (sparse_hash_set and dense_hash_set). For various other implementations and benchmarks (including one for strings), see here.

like image 100
jogojapan Avatar answered Jan 26 '26 10:01

jogojapan



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!