I have a 100GB text file. The data in that file is in this format:
email||username||password_hash
For testing, I am working on a 6GB file that I split off from the bigger file.
I am running grep to match the lines and output them.
I used plain grep first, and it takes around 1 minute 22 seconds.
I then added options such as LC_ALL=C and -F, which only brought the time down to 1 minute 15 seconds; that is still not good for a 6GB file.
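Roughly, this is the fastest grep invocation I tried (the search string and file name here are just placeholders):

    LC_ALL=C grep -F 'someuser' sample_6gb.txt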
Then I tried ripgrep; it takes 27 seconds on my machine, which is still not good.
With ripgrep's -F option it drops to 14 seconds, but that is still not good enough.
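The corresponding ripgrep invocation (same placeholders):

    rg -F 'someuser' sample_6gb.txt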
I also tried ag (The Silver Searcher), but I found it does not work on files larger than 2 GB.
I need help choosing a command-line tool (or language) that gives better results, or some way to take advantage of the data format and search by column. For example, when I am searching by username, instead of matching the whole line I would match only on the second column. I tried that with awk, but it is even slower; my attempt is sketched below.
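This is roughly what the awk attempt looked like (the [|][|] character classes are just a way to treat || as a literal field separator; username and file name are placeholders):

    awk -F'[|][|]' '$2 == "someuser"' sample_6gb.txt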
If you have to do this just once: Use grep and wait until it finishes.
If searching for strings in a 100GB delimited text file is part of your regular process, then you'll have to change the process. Options are: load the data into a database instead of searching a flat text file (see the sketch below), or use map/reduce to spread the load across multiple machines and cores (Hadoop), ...
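As a minimal sketch of the database route, assuming the ||-delimited layout from the question, you could do a one-time import into SQLite and index the username column; after that, a username lookup is an indexed search instead of a full scan. The file, database, table, and column names below are made up for illustration.

    # one-time: convert "||" to tabs so sqlite3 can import the file
    # (assumes none of the fields ever contain a tab character)
    awk -F'[|][|]' -v OFS='\t' '{ print $1, $2, $3 }' sample_6gb.txt > sample_6gb.tsv

    # one-time: create the table, load the data, and index the username column
    sqlite3 creds.db 'CREATE TABLE creds (email TEXT, username TEXT, password_hash TEXT);'
    printf '.mode tabs\n.import sample_6gb.tsv creds\n' | sqlite3 creds.db
    sqlite3 creds.db 'CREATE INDEX idx_creds_username ON creds(username);'

    # every later lookup uses the index instead of scanning the whole file
    sqlite3 creds.db "SELECT * FROM creds WHERE username = 'someuser';"

The import and the index build are slow once, but they turn every subsequent search into a fast indexed lookup instead of a multi-minute scan.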