
Best way to read a text file by chunks in C#

Tags: c#, .net

I need to read a big text file and search for a string in each line. Lines are separated by line breaks, and I need to minimize both I/O and RAM usage.

My idea is to split the file into chunks, so I have two approaches:

1) Read the FileStream in fixed-size chunks with something like this, but then I risk text lines being cut in half, which can make things complex:

    using (FileStream fsSource = new FileStream("InputFiles\\1.txt", FileMode.Open, FileAccess.Read))
    {
        // Reuse one fixed-size buffer so only a single chunk is in memory at a time.
        const int chunkSize = 1024; // Your amount to read at a time
        byte[] bytes = new byte[chunkSize];

        while (true)
        {
            // Read may return anything from 0 to chunkSize.
            int n = fsSource.Read(bytes, 0, chunkSize);

            // Break when the end of the file is reached.
            if (n == 0)
                break;

            // Do something with the first n bytes here.
            // Note: a line may start in one chunk and end in the next.
        }
    }

2) Create an extension method that splits the list of lines into smaller lists of lines, and then search for the word in each line. However, I am unsure how this method affects I/O and RAM.

    public static IEnumerable<IEnumerable<TValue>> Chunk<TValue>(this IEnumerable<TValue> values, int chunkSize)
    {
        using (var enumerator = values.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                yield return GetChunk(enumerator, chunkSize).ToList();
            }
        }
    }

    private static IEnumerable<T> GetChunk<T>(IEnumerator<T> enumerator, int chunkSize)
    {
        do
        {
            yield return enumerator.Current;
        } while (--chunkSize > 0 && enumerator.MoveNext());
    }

Any thoughts or other methods I can use?

Thanks in advance.

asked Oct 20 '25 14:10 by Metalex


1 Answer

I think you are overcomplicating things. The .NET Framework has a lot of methods to choose from when you want to read a text file.

If you need to process a big text file, there is nothing better than the File.ReadLines method, because it does not load the whole file into memory but lets you work line by line.

As the MSDN docs put it:

When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned;

foreach(string line in File.ReadLines(@"InputFiles\1.txt"))
{
    // Process your line here....
}
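
To connect this back to the original goal (searching for a string in each line), here is a minimal sketch built on File.ReadLines. The class name LineSearch, the method name FindLines, and the choice of a case-sensitive ordinal comparison are assumptions made for illustration; they are not part of the question or of the .NET API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineSearch
{
    // Hypothetical helper (illustration only): lazily yields every line
    // of the file at `path` that contains `term`.
    public static IEnumerable<string> FindLines(string path, string term)
    {
        // File.ReadLines streams the file, so lines are read on demand
        // instead of loading the whole file into memory first.
        foreach (string line in File.ReadLines(path))
        {
            // Assumption: a case-sensitive, culture-independent match is wanted.
            if (line.IndexOf(term, StringComparison.Ordinal) >= 0)
                yield return line;
        }
    }
}
```

Because FindLines is itself an iterator, nothing is read until the caller starts a foreach over the result, and the caller can stop early (for example after the first match) without reading the rest of the file.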

answered Oct 23 '25 05:10 by Steve


