string str1, str2;
vector<string> vec;
ifstream infile;
infile.open("myfile.txt");
while (! infile.eof() )
{
getline(infile,str1);
istringstream is;
is >> str1;
while (is >> str2)
{
vec.push_back(str2);
}
}
What the code does is read the string from the file and stores it into a vector.
Performance needs to be given priority. How can i optimize this code, to make the reading performance faster?
As others have already pointed out (see for example herohuyongtao's answer), the loop condition and how you put str1 into the istringstream must be fixed.
However, there is an important issue here that everybody has missed so far: you don't need the istringstream at all!
vec.reserve(the_number_of_words_you_exptect_at_least);
while (infile >> str1) {
vec.push_back(str1);
}
It gets rid of the inner loop which you didn't need in the first place, and doesn't create an istringstream in each iteration.
If you need to parse further each line and you do need an istringstream, create it outside of the loop and set its string buffer via istringstream::str(const string& s).
I can easily imagine your loops being very slow: Heap allocation on Windows is outrageously slow (compared to Linux); I got bitten once.
Andrei Alexandrescu presents (in some sense) a similar example in his talk Writing Quick Code in C++, Quickly. The surprising thing is that doing unnecessary heap allocations in a tight loop like the one above can be slower than the actual file IO. I was surprised to see that.
You didn't tag your question as C++11 but here is what I would do in C++11.
while (infile >> str1) {
vec.emplace_back(std::move(str1));
}
This move constructs the string at the back of the vector, without copying in. We can do it because we don't need the contents of str1 after we have put it into the vector. In other words, there is no need to copy it into a brand new string at the back of the vector, it is sufficient to just move its contents there. The first loop with the vec.push_back(str1); may potentially copy the contents of str1 which is really unnecessary.
The string implementation in gcc 4.7.2 is currently copy on write, so the two loops have identical performance; it doesn't matter which one you use. For now.
Unfortunately, copy on write strings are now forbidden by the standard. I don't know when the gcc developers are going to change the implementation. If the implementation changes, it may make a difference in performance whether you move (emplace_back(std::move(s))) or you copy (push_back(s)).
If C++98 compatibility is important for you, then go with push_back(). Even if the worst thing happens in the future and your string is copied (it isn't copied now), that copy can be turned into a memmove() / memcpy() which is blazing fast, most likely faster than reading the contents of the file from the hard disk so file IO will most likely remain the bottleneck.
Before any optimization, you need to change
while (! infile.eof() ) // problem 1
{
getline(infile,str1);
istringstream is;
is >> str1; // problem 2
while (is >> str2){
vec.push_back(str2);
}
}
to
while ( getline(infile,str1) ) // 1. don't use eof() in a while-condition
{
istringstream is(str1); // 2. put str1 to istringstream
while (is >> str2){
vec.push_back(str2);
}
}
to make it work as you expected.
P.S. For the optimization part, you don't need to think too much on it unless it becomes a bottleneck. Premature optimization is the root of all evil. However, if you do want to speed it up, check out @Ali's answer for further info.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With