I'm loading CSV data from files into a DataTable for processing.
The problem is that I want to process several files, and my tests with the DataTable show huge memory consumption. I tested with a 37 MB CSV file and memory grew to 240 MB, which is way too much IMHO. I read that a DataTable has some overhead, and I could live with about 70 MB, but not 240 MB, which is more than six times the original size. I also read here that DataTables need more memory than POCOs, but this difference seems far too large.
I ran a memory profiler to check for memory leaks and to see where the memory goes. I found that each DataColumn holds between 6 MB and 19 MB of strings, and the DataTable has about 20 columns. Are the values stored in the columns? Why is so much memory used, and what can I do to reduce it? With this memory consumption, DataTables seem unusable.
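A rough estimate shows why the strings alone can account for most of the growth. As far as I know, each DataColumn keeps its values in an internal array, one reference per row, which fits the per-column string totals the profiler reports. The estimate below assumes a 64-bit runtime, an ASCII file, and a hypothetical average field length of \(\bar{\ell}\) characters. Each .NET string stores UTF-16 (2 bytes per character) plus roughly 22 bytes of header, length and terminator, rounded up to a multiple of 8. Each cell also needs an 8-byte reference in its column's storage. DataRow objects and not-yet-collected garbage from `ReadLine`/`Split` are ignored.

```latex
\begin{aligned}
\text{bytes on disk per field} &\approx \bar{\ell} + 1 \quad (\text{one separator}) \\
\text{bytes on heap per field} &\approx 8\left\lceil \frac{22 + 2\bar{\ell}}{8} \right\rceil + 8 \\
\bar{\ell} = 10:\quad \frac{8\lceil 42/8 \rceil + 8}{10 + 1} &= \frac{48 + 8}{11} = \frac{56}{11} \approx 5.1
\end{aligned}
```

With that hypothetical field length, a 37 MB file would need roughly 5 times its size on the heap just for the cell strings and their references. That is the same order as the growth observed, before any DataTable bookkeeping is counted.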
Has anybody else had such problems with DataTables, or am I doing something wrong?
PS: I tried a 70 MB file and the DataTable grew to 500 MB!
OK, here is a small test case: the 37 MB CSV file (21 columns) made memory grow to 179 MB.
    private static DataTable ReadCsv()
    {
        DataTable table = new DataTable();
        table.BeginLoadData();
        using (var reader = new StreamReader(File.OpenRead(@"C:\Develop\Tests\csv-Data\testdaten\test.csv")))
        {
            int y = 0;
            int columnsCount = 0;
            while (!reader.EndOfStream)
            {
                var line = reader.ReadLine();
                var values = line.Split(',');
                if (y == 0)
                {
                    columnsCount = values.Length;
                    // create columns
                    for (int x = 0; x < columnsCount; x++)
                    {
                        table.Columns.Add(new DataColumn(values[x], typeof(string)));
                    }
                }
                else
                {
                    if (values.Length == columnsCount)
                    {
                        // add the data
                        table.Rows.Add(values);
                    }
                }
                y++;
            }
            table.EndLoadData();
            table.AcceptChanges();
        }
        return table;
    }
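A process-level figure such as "memory grew to 179 MB" also includes short-lived garbage (every `ReadLine` string and `Split` array) that the GC may not have collected yet. One way to separate that from what the DataTable really keeps alive is to force full collections before and after loading with `GC.GetTotalMemory(true)`. The sketch below is only an illustration under assumptions: `CsvMemoryCheck`, `BuildSampleCsv` and the synthetic data are hypothetical. `Load` mirrors the loop in `ReadCsv` but reads from any `TextReader`, so it runs without the `C:\Develop\...` file.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;

public static class CsvMemoryCheck
{
    // Hypothetical helper: builds a synthetic CSV in memory as a stand-in for test.csv.
    public static string BuildSampleCsv(int rows, int columns)
    {
        var sb = new StringBuilder();
        for (int c = 0; c < columns; c++)
        {
            if (c > 0) sb.Append(',');
            sb.Append("col").Append(c);
        }
        sb.Append('\n');
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < columns; c++)
            {
                if (c > 0) sb.Append(',');
                sb.Append("value").Append(r).Append('_').Append(c);
            }
            sb.Append('\n');
        }
        return sb.ToString();
    }

    // Same logic as ReadCsv: first line = column names, rows with the wrong field count are skipped.
    public static DataTable Load(TextReader reader)
    {
        var table = new DataTable();
        string header = reader.ReadLine();
        if (header == null) return table;
        string[] names = header.Split(',');
        foreach (string name in names)
            table.Columns.Add(name, typeof(string));

        table.BeginLoadData();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] values = line.Split(',');
            if (values.Length == names.Length)
                table.Rows.Add(values);
        }
        table.EndLoadData();
        table.AcceptChanges();
        return table;
    }

    // Growth of the managed heap that is still reachable after loading.
    // GetTotalMemory(true) forces a full collection first, so garbage is excluded.
    public static long MeasureLiveBytes(string csv, out int rowCount)
    {
        long before = GC.GetTotalMemory(true);
        DataTable table;
        using (var reader = new StringReader(csv))
            table = Load(reader);
        long after = GC.GetTotalMemory(true);
        rowCount = table.Rows.Count;
        GC.KeepAlive(table); // keep the table reachable until after the second measurement
        return after - before;
    }
}
```

For the real file, `MeasureLiveBytes(File.ReadAllText(path), out rows)` should give a comparable figure. The text passed in is allocated before the first measurement, so it is not counted.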
DataSet and its children DataTable, DataRow, etc. make up an in-memory relational database. There is a lot of overhead involved (though it does make some things very convenient).
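If memory is an issue, skip the DataTable and hold the data in lighter-weight structures of your own, for example simple classes (POCOs) with typed properties, using an IList<T> to hold them.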
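A minimal sketch of the POCO approach, assuming a hypothetical three-column layout `Id,Customer,Amount`. The real file's 21 column names and types are not known here, so `OrderRecord` and `TypedCsvLoader` are placeholders. Parsing numeric columns into `int`/`decimal` instead of keeping them as strings removes one string object per cell for those columns.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical row type standing in for the real 21-column layout.
public sealed class OrderRecord
{
    public int Id { get; }
    public string Customer { get; }
    public decimal Amount { get; }

    public OrderRecord(int id, string customer, decimal amount)
    {
        Id = id;
        Customer = customer;
        Amount = amount;
    }
}

public static class TypedCsvLoader
{
    // Reads "Id,Customer,Amount" lines into typed objects kept in an IList<T>.
    public static IList<OrderRecord> Load(TextReader reader)
    {
        var records = new List<OrderRecord>();
        reader.ReadLine(); // skip the header line
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] values = line.Split(',');
            if (values.Length != 3) continue; // skip malformed lines, as ReadCsv does
            records.Add(new OrderRecord(
                int.Parse(values[0], CultureInfo.InvariantCulture),
                values[1],
                decimal.Parse(values[2], CultureInfo.InvariantCulture)));
        }
        return records;
    }
}
```

Only the numeric columns get cheaper this way; text columns still cost one string per cell, but the per-row DataRow bookkeeping is gone.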
Are you sure you need an in-memory representation of your CSV files? Could you access them via an IDataReader like Sebastien Lorion's Fast CSV Reader?
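The Fast CSV Reader's own API is not reproduced here. The same forward-only idea can be sketched with the base class library alone: feed the lines from `File.ReadLines(path)`, which enumerates the file lazily (unlike `File.ReadAllLines`), and process each record as it arrives, so only the current line is held in memory. `StreamingCsv.Records` is a hypothetical helper.

```csharp
using System.Collections.Generic;

public static class StreamingCsv
{
    // Yields one record at a time. The first line is treated as the header and
    // fixes the expected field count; lines with a different count are skipped.
    public static IEnumerable<string[]> Records(IEnumerable<string> lines)
    {
        int columnsCount = -1;
        foreach (string line in lines)
        {
            string[] values = line.Split(',');
            if (columnsCount < 0)
            {
                columnsCount = values.Length; // header line
                continue;
            }
            if (values.Length == columnsCount)
                yield return values;
        }
    }
}
```

Usage: `foreach (var values in StreamingCsv.Records(File.ReadLines(path))) { /* process one row */ }`. As in `ReadCsv`, `Split(',')` does not handle quoted fields that contain commas; that needs a proper CSV parser.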