
Find the delimiter of CSV or text files in C#

Tags:

c#

csv

I want to detect the delimiter that separates the columns in CSV or text files.

I am using the TextFieldParser class to read those files.

Below is my code:

string path = @"c:\abc.csv";
DataTable dt = new DataTable();
if (File.Exists(path))
{
    using (Microsoft.VisualBasic.FileIO.TextFieldParser parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(path))
    {
        parser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
        // The delimiter is currently chosen from the file name.
        if (path.Contains(".txt"))
        {
            parser.SetDelimiters("|");
        }
        else
        {
            parser.SetDelimiters(",");
        }
        parser.HasFieldsEnclosedInQuotes = true;
        bool firstLine = true;
        while (!parser.EndOfData)
        {
            string[] fields = parser.ReadFields();
            if (firstLine)
            {
                // The first row holds the column names.
                foreach (var val in fields)
                {
                    dt.Columns.Add(val);
                }
                firstLine = false;
                continue;
            }
            dt.Rows.Add(fields);
        }
    }
}
lblCount.Text = "Count of total rows in the file: " + dt.Rows.Count.ToString();
dgvTextFieldParser1.DataSource = dt;

Instead of passing the delimiters manually based on the file type, I want to read the delimiter from the file and then pass it.

How can I do that?

AMeh asked Sep 19 '25 19:09


1 Answer

Very simple guessing approach using LINQ:

using System.IO;
using System.Linq;

static class CsvSeperatorDetector
{
    private static readonly char[] SeparatorChars = {';', '|', '\t', ','};

    public static char DetectSeparator(string csvFilePath)
    {
        string[] lines = File.ReadAllLines(csvFilePath);
        return DetectSeparator(lines);
    }

    public static char DetectSeparator(string[] lines)
    {
        var q = SeparatorChars.Select(sep => new
                {
                    Separator = sep,
                    // Group the lines by how many times the separator occurs on them.
                    Found = lines.GroupBy(line => line.Count(ch => ch == sep)).ToList()
                })
            // Prefer the separator that occurs on the most lines ...
            .OrderByDescending(res => res.Found.Where(grp => grp.Key > 0).Sum(grp => grp.Count()))
            // ... and, on a tie, the one with the fewest distinct per-line counts.
            .ThenBy(res => res.Found.Count)
            .First();

        return q.Separator;
    }
}

This reads the file line by line and, for each candidate separator, counts how often it occurs on each line. The candidate that occurs on the most lines wins; among candidates that occur on the same number of lines, the one with the most even distribution is preferred (e.g. 5 occurrences on every line ranks higher than one occurrence on one line and 10 on another). Note that quoted CSV fields may contain line breaks, so a physical line is not always a complete record. Of course you might have to tweak this for your own purposes, add error handling, fallback logic and so forth. I'm sure it's not perfect, but it's good enough for me.
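As a rough sketch of the fallback logic mentioned above and of how the detected character could be handed to TextFieldParser, something like the following might work. The names DelimiterSketch and DetectOrDefault are hypothetical, the comma fallback is an assumption, and the sample data is made up for illustration; it uses the same "most lines, then most even" ranking described in this answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, not part of any library.
static class DelimiterSketch
{
    private static readonly char[] Candidates = { ';', '|', '\t', ',' };

    // Returns the best candidate, or 'fallback' (assumed: comma) when
    // none of the candidates occurs anywhere in the sample.
    public static char DetectOrDefault(IEnumerable<string> sampleLines, char fallback = ',')
    {
        // Ignore empty lines so they do not skew the distribution.
        var lines = sampleLines.Where(l => l.Length > 0).ToList();

        var best = Candidates
            .Select(sep => new
            {
                Separator = sep,
                // Occurrences of this candidate on each line.
                Counts = lines.Select(l => l.Count(ch => ch == sep)).ToList()
            })
            .Select(x => new
            {
                x.Separator,
                LinesWithSep = x.Counts.Count(c => c > 0),
                DistinctCounts = x.Counts.Distinct().Count()
            })
            .Where(x => x.LinesWithSep > 0)          // drop candidates never seen
            .OrderByDescending(x => x.LinesWithSep)  // occurs on the most lines
            .ThenBy(x => x.DistinctCounts)           // most even distribution
            .FirstOrDefault();

        return best == null ? fallback : best.Separator;
    }

    public static void Main()
    {
        // Made-up sample; a real file could be sampled with File.ReadLines(path).Take(50).
        string sample = "id|name|note\n1|Ann|likes a, b\n2|Bob|none\n";
        char sep = DetectOrDefault(sample.Split('\n'));

        using (var parser = new TextFieldParser(new StringReader(sample)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(sep.ToString());
            parser.HasFieldsEnclosedInQuotes = true;
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                Console.WriteLine(string.Join(" / ", fields));
            }
        }
    }
}
```

Sampling only the first few dozen lines keeps detection cheap on large files, at the cost of possibly missing a separator that only shows up later. Like the version above, this counts characters inside quoted fields too, so a text column full of commas can still mislead it.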

TravelingFox answered Sep 22 '25 08:09