I have a file containing about 70,000 records, structured roughly like this:
01499 1000642 4520101000900000
...more numbers...
104000900169
+Fieldname1
-Content
+Fieldname2
-Content
-Content
-Content
+Fieldname3
-Content
-Content
+Fieldname4
-Content
+Fieldname5
-Content
-Content
-Content
-Content
-Content
-Content
01473 1000642 4520101000900000
...more numbers...
Every record thus starts with a column of numbers and ends with a blank line. Before this blank line, most records have a +Fieldname5 followed by one or more -Content lines.
What I would like to do is merge each multi-line entry into a single line, replacing the leading minus character of every continuation line with a space, except for the entries belonging to the last field (Fieldname5 in this case).
It should look like this:
01499 1000642 4520101000900000
...more numbers...
104000900169
+Fieldname1
-Content
+Fieldname2
-Content Content Content
+Fieldname3
-Content Content
+Fieldname4
-Content
+Fieldname5
-Content
-Content
-Content
-Content
-Content
-Content
01473 1000642 4520101000900000
...more numbers...
What I have now is this (adapted from this answer):
use strict;
use warnings;
our $input = "export.txt";
our $output = "export2.txt";
open our $in, "<$input" or die "$!\n";
open our $out, ">$output" or die "$!\n";
my $this_line = "";
my $new = "";
while(<$in>) {
my $last_line = $this_line;
$this_line = $_;
# If both $last_line and $this_line start with a "-" do the following:
if ($last_line =~ /^-.+/ && $this_line =~ /^-.+/) {
# Remove \n from $last_line
chomp $last_line;
# Remove leading "-" from $this_line
$this_line =~ s/^-//;
# Join both lines and print them to the file
$new = join(' ', $last_line, $this_line);
print $out $new;
} else {
print $out $last_line;
}
}
close ($in);
close ($out);
But there are two problems with this:
1. It correctly prints the joined line, but then still prints the second line as well, e.g.,
+Fieldname2 -Content Content Content -Content
So how can I make the script output only the joined line?
2. How can I make it replace \n- by a space, except where the line belongs to a given field name (e.g., Fieldname5)?
It worked! I just added another conditional at the beginning:
use strict;
use warnings;
my $input  = "export.txt";
my $output = "export2.txt";
open my $in,  '<', $input  or die "Cannot find '$input': $!\n";
open my $out, '>', $output or die "Cannot create '$output': $!\n";
my $insideMultiline = 0;
my $multilineBuffer = "";
my $exception = 0;    # Indicates whether the current multi-line block
                      # is a "special" one that must not be merged
LINE:
while (<$in>) {
    if (/^\+Fieldname5/) {    # Line starts with +Fieldname5:
        $exception = 1;       # stop merging the lines that follow
    }
    elsif (/^\s/) {           # Line starts with whitespace (a blank
        $exception = 0;       # line counts): merge again
    }
    if ($exception == 0 && /^-/) {    # Outside the special block,
        chomp;                        # collect lines starting with "-"
        if ($insideMultiline) {
            s/^-/ /;                  # Continuation: "-" becomes a space
            $multilineBuffer .= $_;
        }
        else {
            $insideMultiline = 1;
            $multilineBuffer = $_;
        }
        next LINE;
    }
    else {
        if ($insideMultiline) {       # Block is over: print the merged line
            print $out "$multilineBuffer\n";
            $insideMultiline = 0;
            $multilineBuffer = "";
        }
        print $out $_;
    }
}
# Print a merged block that runs up to the end of the file
print $out "$multilineBuffer\n" if $insideMultiline;
close($in);
close($out);
Assuming the only lines which begin with "-" are these multi-line sections, you could do this...
# Open $in and $out as in your original code...
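my $insideMultiline = 0;
my $multilineBuffer = "";
LINE:
while (<$in>) {
    if (/^-/) {
        chomp;
        if ($insideMultiline) {
            # Continuation line: replace the leading "-" with a space
            s/^-/ /;
            $multilineBuffer .= $_;
        }
        else {
            # First "-" line of a block: start a new buffer
            $insideMultiline = 1;
            $multilineBuffer = $_;
        }
        next LINE;
    }
    else {
        if ($insideMultiline) {
            # The block has ended: print the merged line
            print $out "$multilineBuffer\n";
            $insideMultiline = 0;
            $multilineBuffer = "";
        }
        print $out $_;
    }
}
# Print a merged block that runs up to the end of the file
print $out "$multilineBuffer\n" if $insideMultiline;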
As to the embedded subquestion ("except those pertaining to the last field"), I'd need more detail on the file format to be able to do that. It looks like a blank line separates the sets of fields and contents from one another, but that's not 100% clear in the description. The code above should handle the requirements you laid out at the bottom, though.
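For what it's worth, here is a minimal sketch of how the exemption could be handled, assuming that a blank line really does end each record, as the description suggests. The helper name merge_content_lines and its $keep_field parameter are hypothetical, chosen only for illustration. The sketch also assumes that every field header starts with "+", so it clears the exemption at the next header as well as at a blank line.

```perl
use strict;
use warnings;

# Hypothetical helper: merge consecutive "-" lines into one line,
# except inside the field whose header line is "+$keep_field".
# Assumption: a blank (or whitespace-only) line ends a record.
sub merge_content_lines {
    my ($keep_field, @lines) = @_;
    my @out;
    my $buffer;        # pending merged line, undef when there is none
    my $exempt = 0;    # true while inside the exempt field

    for my $line (@lines) {
        chomp(my $text = $line);

        if ($text =~ /^\+/) {
            # Field header: exempt only if it names $keep_field
            $exempt = ($text =~ /^\+\Q$keep_field\E\b/) ? 1 : 0;
        }
        elsif ($text =~ /^\s*$/) {
            # Blank line: the record is over, and so is the exemption
            $exempt = 0;
        }

        if (!$exempt && $text =~ /^-/) {
            if (defined $buffer) {
                # Continuation: swap the leading "-" for a space
                $buffer .= ' ' . substr($text, 1);
            }
            else {
                $buffer = $text;
            }
            next;
        }

        # Any other line ends a pending merged block
        if (defined $buffer) {
            push @out, $buffer;
            undef $buffer;
        }
        push @out, $text;
    }

    # Flush a block that runs up to the end of the input
    push @out, $buffer if defined $buffer;
    return @out;    # lines without trailing newlines
}
```

With $in and $out opened as in your original code, it could be called as print $out "$_\n" for merge_content_lines('Fieldname5', <$in>); note that this reads the whole file into memory, whereas the loop above works line by line.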