I want to find a delimiter being used to separate the columns in csv or text files.
I am using TextFieldParser class to read those files.
Below is my code,
String path = @"c:\abc.csv";
DataTable dt = new DataTable();
if (File.Exists(path))
{
using (Microsoft.VisualBasic.FileIO.TextFieldParser parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(path))
{
parser.TextFieldType = FieldType.Delimited;
if (path.Contains(".txt"))
{
parser.SetDelimiters("|");
}
else
{
parser.SetDelimiters(",");
}
parser.HasFieldsEnclosedInQuotes = true;
bool firstLine = true;
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
if (firstLine)
{
foreach (var val in fields)
{
dt.Columns.Add(val);
}
firstLine = false;
continue;
}
dt.Rows.Add(fields);
}
}
lblCount.Text = "Count of total rows in the file: " + dt.Rows.Count.ToString();
dgvTextFieldParser1.DataSource = dt;
Instead of passing the delimiters manually based on the file type, I want to read the delimiter from the file and then pass it.
How can I do that?
Very simple guessing approach using LINQ:
static class CsvSeperatorDetector
{
private static readonly char[] SeparatorChars = {';', '|', '\t', ','};
public static char DetectSeparator(string csvFilePath)
{
string[] lines = File.ReadAllLines(csvFilePath);
return DetectSeparator(lines);
}
public static char DetectSeparator(string[] lines)
{
var q = SeparatorChars.Select(sep => new
{Separator = sep, Found = lines.GroupBy(line => line.Count(ch => ch == sep))})
.OrderByDescending(res => res.Found.Count(grp => grp.Key > 0))
.ThenBy(res => res.Found.Count())
.First();
return q.Separator;
}
}
What this does is it reads the file line by line (note that CSV files may include line breaks), then checks for each potential separator how often it occurs in each line. Then we check which separator occurs on the most lines, and of those which occur on the same number of lines, we take the one with the most even distribution (e.g. 5 occurences on every line are ranked higher than one that occurs once in one line and 10 times in another line). Of course you might have to tweak this for your own purposes, add error handling, fallback logic and so forth. I'm sure it's not perfect, but it's good enough for me.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With