I have to write Perl although I'm much more comfortable with Java, Python and functional languages. I'd like to know if there's some idiomatic way to parse a simple file like
# comment line - ignore
# ignore also empty lines
key1 = value
key2 = value1, value2, value3
I want a function that I pass an iterator over the lines of the files and that returns a map from keys to list of values. But to be functional and structured I'd like to:
What is the most modern, idiomatic, cleanest and still functional way to do this? The different parts of the code should be separately testable and reusable.
For reference, here is (a quick hack) how I might do it in Python:
re_is_comment_line = re.compile(r"^\s*#")
re_key_values = re.compile(r"^\s*(\w+)\s*=\s*(.*)$")
re_splitter = re.compile(r"\s*,\s*")
is_interesting_line = lambda line: not ("" == line or re_is_comment_line.match(line))
and re_key_values.match(line)
def parse(lines):
interesting_lines = ifilter(is_interesting_line, imap(strip, lines))
key_values = imap(lambda x: re_key_values.match(x).groups(), interesting_lines)
splitted_values = imap(lambda (k,v): (k, re_splitter.split(v)), key_values)
return dict(splitted_values)
A direct translation of your Python would be
my $re_is_comment_line = qr/^\s*#/;
my $re_key_values = qr/^\s*(\w+)\s*=\s*(.*)$/;
my $re_splitter = qr/\s*,\s*/;
my $is_interesting_line= sub {
my $_ = shift;
length($_) and not /$re_is_comment_line/ and /$re_key_values/;
};
sub parse {
my @lines = @_;
my @interesting_lines = grep $is_interesting_line->($_), @lines;
my @key_values = map [/$re_key_values/], @interesting_lines;
my %splitted_values = map { $_->[0], [split $re_splitter, $_->[1]] } @key_values;
return %splitted_values;
}
Differences are:
ifilter is called grep, and can take an expression instead of a block as first argument. These are roughly equivalent to a lambda. The current item is given in the $_ variable. The same applies to map.In the next example, the following will be added:
split. It takes an optional third argument that limits the number of resulting fragments.map/filter stuff can be written in one expression. This doesn't make it more efficient, but emphazises the flow of data. Read the map-map-grep from bottom upwards (actually right to left, think of APL)..
sub parse {
my %splitted_values =
map { $_->[0], [split /\s*,\s*/, $_->[1]] }
map {[split /\s*=\s*/, $_, 2]}
grep{ length and !/^\s*#/ and /^\s*\w+\s*=\s*\S/ }
@_;
return \%splitted_values; # returning a reference improves efficiency
}
But I think a more elegant solution here is to use a traditional loop:
sub parse {
my %splitted_values;
LINE: for (@_) {
next LINE if !length or /^\s*#/;
s/\A\s*|\s*\z//g; # Trimming the string—omitted in previous examples
my ($key, $vals) = split /\s*=\s*/, $_, 2;
defined $vals or next LINE; # check if $vals was assigned
@{ $splitted_values{$key} } = split /\s*,\s*/, $vals; # Automatically create array in $splitted_values{$key}
}
return \%splitted_values
}
If we decide to pass a filehandle instead, the loop would be replaced with
my $fh = shift;
LOOP: while (<$fh>) {
chomp;
...;
}
which would use an actual iterator.
You could now go and add function parameters, but do this only iff you are optimizing for flexibility and nothing else. I already used a code reference in the first example. You can invoke them with the $code->(@args) syntax.
use Carp; # Error handling for writing APIs
sub parse {
my $args = shift;
my $interesting = $args->{interesting} or croak qq("interesting" callback required);
my $kv_splitter = $args->{kv_splitter} or croak qq("kv_splitter" callback required);
my $val_transform= $args->{val_transform} || sub { $_[0] }; # identity by default
my %splitted_values;
LINE: for (@_) {
next LINE unless $interesting->($_);
s/\A\s*|\s*\z//g;
my ($key, $vals) = $kv_splitter->($_);
defined $vals or next LINE;
$splitted_values{$key} = $val_transform->($vals);
}
return \%splitted_values;
}
This could then be called like
my $data = parse {
interesting => sub { length($_[0]) and not $_[0] =~ /^\s*#/ },
kv_splitter => sub { split /\s*=\s*/, $_[0], 2 },
val_transform => sub { [ split /\s*,\s*/, $_[0] ] }, # returns anonymous arrayref
}, @lines;
I think the most modern approach consists in taking advantage of the CPAN modules. In your example, Config::Properties may helps:
use strict;
use warnings;
use Config::Properties;
my $config = Config::Properties->new(file => 'example.properties') or die $!;
my $value = $config->getProperty('key');
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With