I am trying to count all the lines in a text file using a StreamReader:
public int countLines(string path)
{
    var watch = System.Diagnostics.Stopwatch.StartNew();
    int nlines = 0;
    string line;
    // Wrap the reader in a using block so the file handle is released when done.
    using (StreamReader file = new StreamReader(path))
    {
        while ((line = file.ReadLine()) != null)
        {
            nlines++;
        }
    }
    watch.Stop();
    var elapsedMs = watch.ElapsedMilliseconds;
    Console.Write(elapsedMs);
    // elapsedMs = 3520  --- Tested with a 1.2 Mill txt
    return nlines;
}
Is there a more efficient way to count the number of lines?
The wc command (short for "word count") counts the lines, words, and bytes in a text file. With no options it prints all three counts; the -l, -w, and -c options select the line, word, and byte counts respectively.
Use readlines() to get the line count: in Python, this is the most straightforward way to count the lines in a text file. The readlines() method reads all lines from the file into a list, and len() of that list is the total number of lines in the file. Note that this loads the whole file into memory.
If you are on a *nix system, you can run wc -l, which prints the number of lines in a file.
The simplest way to get the number of lines in a text file is to combine the File.ReadLines method with System.Linq.Enumerable.Count.
You already have the appropriate solution but you can simplify all your code to:
var lineCount = File.ReadLines(@"C:\MyHugeFile.txt").Count();
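For very large files it may be worth returning a long instead of an int. The sketch below is not from the original answer: the class and method names are hypothetical, and it simply swaps Count() for LongCount(), which is also part of System.Linq.Enumerable. File.ReadLines still streams the file lazily, so memory use should stay small either way.

```csharp
using System.IO;
using System.Linq;

public static class LineCountSketch
{
    // Hypothetical helper: same idea as File.ReadLines(path).Count(),
    // but LongCount() returns a long, so the result is not limited to int.MaxValue.
    public static long CountLinesLazily(string path)
    {
        // File.ReadLines enumerates the file line by line instead of loading it whole.
        return File.ReadLines(path).LongCount();
    }
}
```

Calling LineCountSketch.CountLinesLazily(@"C:\MyHugeFile.txt") would be used the same way as the one-liner above.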
I am not sure how dreamlax achieved his benchmark results but here is something so that anyone can reproduce on their machine; you can just copy-paste into LINQPad.
First let us prepare our input file:
var filePath = @"c:\MyHugeFile.txt";
for (int counter = 0; counter < 5; counter++)
{
    var lines = new string[30000000];
    for (int i = 0; i < lines.Length; i++)
    {
        lines[i] = $"This is a line with a value of: {i}";
    }
    File.AppendAllLines(filePath, lines);
}
This should produce a file of 150 million lines, which is roughly 6 GB.
Now let us run each method:
void Main()
{
    var filePath = @"c:\MyHugeFile.txt";
    // Make sure you clear windows cache!
    UsingFileStream(filePath);
    // Make sure you clear windows cache!
    UsingStreamReaderLinq(filePath);
    // Make sure you clear windows cache!
    UsingStreamReader(filePath);
}
private void UsingFileStream(string path)
{
    var sw = Stopwatch.StartNew();
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        long lineCount = 0;
        byte[] buffer = new byte[1024 * 1024];
        int bytesRead;
        do
        {
            bytesRead = fs.Read(buffer, 0, buffer.Length);
            for (int i = 0; i < bytesRead; i++)
                if (buffer[i] == '\n')
                    lineCount++;
        }
        while (bytesRead > 0);       
        Console.WriteLine("[FileStream] - Read: {0:n0} in {1}", lineCount, sw.Elapsed);
    }
}
private void UsingStreamReaderLinq(string path)
{
    var sw = Stopwatch.StartNew();
    var lineCount = File.ReadLines(path).Count();
    Console.WriteLine("[StreamReader+LINQ] - Read: {0:n0} in {1}", lineCount, sw.Elapsed);
}
private void UsingStreamReader(string path)
{
    var sw = Stopwatch.StartNew();
    long lineCount = 0;
    string line;
    using (var file = new StreamReader(path))
    {
        while ((line = file.ReadLine()) != null) { lineCount++; }
        Console.WriteLine("[StreamReader] - Read: {0:n0} in {1}", lineCount, sw.Elapsed);
    }
}
Which results in:
[FileStream] - Read: 150,000,000 in 00:00:37.3397443
[StreamReader+LINQ] - Read: 150,000,000 in 00:00:33.8842190
[StreamReader] - Read: 150,000,000 in 00:00:34.2102178
Running with compiler optimizations turned on results in:
[FileStream] - Read: 150,000,000 in 00:00:18.1636374
[StreamReader+LINQ] - Read: 150,000,000 in 00:00:33.3173354
[StreamReader] - Read: 150,000,000 in 00:00:32.3530890
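One caveat with the FileStream approach is that it counts '\n' bytes, so a file whose last line has no trailing newline would come out one short compared with File.ReadLines. Here is a minimal sketch of the same byte-scanning idea as a reusable method that accounts for this. The LineCounter class and CountLines name are hypothetical, and it assumes an encoding such as ASCII or UTF-8 where the byte 0x0A only ever means a newline (so not UTF-16) and that lines do not end in a lone '\r'.

```csharp
using System;
using System.IO;

public static class LineCounter
{
    // Hypothetical helper, not part of the benchmark above:
    // counts lines by scanning raw bytes for '\n'.
    public static long CountLines(string path)
    {
        const byte NewLine = (byte)'\n';
        long lineCount = 0;
        // Start as if the previous byte was a newline, so an empty file yields 0.
        byte lastByte = NewLine;
        var buffer = new byte[1024 * 1024];

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int bytesRead;
            while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < bytesRead; i++)
                {
                    if (buffer[i] == NewLine)
                        lineCount++;
                }
                // Remember the last byte seen so far to detect a missing final newline.
                lastByte = buffer[bytesRead - 1];
            }
        }

        // A final line without a trailing '\n' still counts as a line.
        if (lastByte != NewLine)
            lineCount++;

        return lineCount;
    }
}
```

Under those assumptions, the result should agree with File.ReadLines(path).LongCount(), whether or not the file ends with a newline.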