 

Search for a string in a 100 GB file [closed]

I have a 100GB text file. The data in that file is in this format:

email||username||password_hash

I am testing on a 6 GB file which I made by splitting off a piece of the bigger file.

I am running grep to match the lines and output them.

  1. I used plain grep. It takes around 1 minute 22 seconds.

  2. I used other options with grep, like LC_ALL=C and -F, but that only brings the time down to 1 minute 15 seconds, which is still not good for a 6 GB file (the exact invocations are sketched after this list).

  3. Then I used ripgrep; it takes 27 seconds on my machine, still not good.

  4. Then I used ripgrep with the -F option; it takes 14 seconds, still not good.

  5. I also tried ag (The Silver Searcher), but I found it doesn't work on files bigger than 2 GB.
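
For reference, the invocations compared above look roughly like this (a sketch; dump.txt and the search string are placeholders):

# Fixed-string search (-F) plus byte-wise matching (LC_ALL=C)
# to skip locale/UTF-8 handling.
LC_ALL=C grep -F 'user@example.com' dump.txt

# ripgrep with the same fixed-string flag.
rg -F 'user@example.com' dump.txt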

I need your help choosing a command-line tool (or language) that achieves better results, or some way I can take advantage of the format of the data to search by column. For example, if I am searching by username, then instead of matching the whole line I would search only the second column. I tried that using awk, but it is still slower.
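
For the by-column idea, the match can be restricted to the second field with an exact string comparison rather than a regex, something like this (a sketch; dump.txt and target_user are placeholders):

# -F'[|][|]' makes the field separator a regex matching the
# literal two-character || delimiter; $2 == u is an exact
# string comparison, not a pattern match.
LC_ALL=C awk -F'[|][|]' -v u='target_user' '$2 == u' dump.txt

Note that awk still has to read and split every line, so this cannot beat a tuned grep scan over the same bytes; a real per-column speedup needs the data sorted or indexed.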

asked Jan 16 '26 19:01 by Bhawan


1 Answer

If you have to do this just once: use grep and wait until it finishes.
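
If that one-off grep turns out to be CPU-bound rather than disk-bound, spreading it across cores can shave some time; a sketch assuming GNU parallel is installed (dump.txt and the pattern are placeholders):

# --pipepart splits the file at newline boundaries and feeds
# one chunk per job to grep; -k keeps the output in input order.
parallel -k --pipepart --block 500M -a dump.txt 'LC_ALL=C grep -F user@example.com'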

If searching for strings in a 100 GB delimited text file is part of your regular process, then you'll have to change the process. Options are: use a database instead of a text file, or use map/reduce to spread the load across multiple machines and cores (Hadoop), ...
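
As a minimal sketch of the database route, assuming a reasonably recent sqlite3 and tab-free fields (dump.txt, creds.db, and target_user are placeholders; the one-time import and index build on a 100 GB dump will take a while and roughly double the disk usage):

# Convert the ||-delimited dump to tab-separated for import.
awk -F'[|][|]' -v OFS='\t' '{print $1, $2, $3}' dump.txt > dump.tsv

# Load the data and index the username column.
sqlite3 creds.db <<'SQL'
CREATE TABLE creds(email TEXT, username TEXT, password_hash TEXT);
.mode tabs
.import dump.tsv creds
CREATE INDEX idx_username ON creds(username);
SQL

# A username lookup now uses the B-tree index instead of a full scan.
sqlite3 creds.db "SELECT * FROM creds WHERE username = 'target_user';"

Once the index exists, per-username lookups should return near-instantly instead of requiring a full pass over the file.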

answered Jan 19 '26 09:01 by hek2mgl


