Consider the following character strings:

```
"bla ; bla"; bla
"bla "";"" bla"; bla
"bla ";" bla"; bla
```
I'm trying to match every `;` that is not inside a quoted field (e.g. "bla ; bla") or between two quotes.
In other words, I want to match the second `;` in the first two strings and both `;` in the last string.
Here are the two regexes I've been trying, but I can't get either one to work in all cases:
```
^(['"])(?:(?!\1).)*\1(?=;)(*SKIP)(*F)|;
^(['"])(?:(?!(?!\1)\1).)*\1(?=;)(*SKIP)(*F)|;
```
Any ideas?
EDIT
I left out several important details in my initial question. The example lines above come from .csv files, and I'm trying to locate every field separator `;` in lines from several different files. My problem is distinguishing a `;` wrapped in escaped quotes (`"";""`) inside a quoted field (line 2) from two quoted fields separated by `;` (line 3). In my files, a quoted field is always followed by `;`.
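For comparison, the rule described in the edit can be written as a single regex. This is a minimal sketch that assumes a literal quote inside a quoted field is escaped by doubling it (`""`) and that no record spans more than one line. The left branch consumes a complete quoted field and then discards it with the `(*SKIP)(*FAIL)` backtracking verbs, so the only `;` characters left to match are the ones outside quoted fields. The helper name `separator_offsets` is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based offsets of every ';' that lies
# outside a double-quoted field. Assumes "" escapes a literal quote inside
# a quoted field and that each record sits on a single line.
sub separator_offsets {
    my ($line) = @_;
    my @offsets;
    # Left branch: match a whole quoted field ("..." with "" allowed inside),
    # then (*SKIP)(*FAIL) throws it away and resumes after its closing quote.
    # Right branch: any ';' reached this way is a real separator.
    while ($line =~ /"(?:[^"]|"")*"(*SKIP)(*FAIL)|;/g) {
        push @offsets, $-[0];    # start offset of the matched ';'
    }
    return @offsets;
}

for my $line ('"bla ; bla"; bla', '"bla "";"" bla"; bla', '"bla ";" bla"; bla') {
    print join(',', separator_offsets($line)), "\n";    # prints 11, then 15, then 6,13
}
```

The key difference from the two attempts above is the `""` alternative: an escaped quote inside a field is consumed as content instead of being mistaken for the closing quote. A per-line regex still cannot cope with quoted fields that contain newlines, which a real CSV parser handles.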
Use an actual CSV parser (well, semicolon-SV) such as Text::CSV_XS instead of trying to hack something together with regular expressions:
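```perl
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use Text::CSV_XS;

# binary => 1 allows non-ASCII bytes and embedded newlines in quoted fields;
# sep_char => ";" makes the parser split on semicolons instead of commas.
my $csv = Text::CSV_XS->new({ binary => 1, sep_char => ";" });

# getline() reads one record and returns an array ref of its fields,
# with the surrounding quotes removed and "" unescaped to ".
while (my $row = $csv->getline(\*DATA)) {
    say $row->[0];    # print the first field of each record
}
# Report any parse error that ended the loop before the end of the input.
$csv->eof or $csv->error_diag();

__DATA__
"bla ; bla"; bla
"bla "";"" bla"; bla
"bla ";" bla"; bla
```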
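If installing Text::CSV_XS is not an option, a rough core-Perl fallback is possible. This is only a sketch, assuming that `""` escapes a literal quote and that no record spans multiple lines; the parser above remains the more robust choice. The `split` pattern matches a whole quoted field and discards it with `(*SKIP)(*FAIL)`, so the line is only broken at unquoted `;`. The name `split_fields` is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical core-Perl fallback (no Text::CSV_XS): split a single-line
# record on ';' characters that are outside double-quoted fields.
# Assumes "" escapes a literal quote and no field contains a newline.
sub split_fields {
    my ($line) = @_;
    # A whole quoted field is matched and skipped via (*SKIP)(*FAIL),
    # so split only breaks the line at unquoted ';'.
    # The limit of -1 keeps trailing empty fields.
    my @fields = split /"(?:[^"]|"")*"(*SKIP)(*FAIL)|;/, $line, -1;
    for my $field (@fields) {
        # Strip the surrounding quotes and turn each "" back into ".
        if ($field =~ /^"(.*)"\z/s) {
            ($field = $1) =~ s/""/"/g;
        }
    }
    return @fields;
}

for my $line ('"bla ; bla"; bla', '"bla "";"" bla"; bla', '"bla ";" bla"; bla') {
    print join('|', split_fields($line)), "\n";
}
```

For these three inputs the first field of each record matches what the Text::CSV_XS loop prints.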