
In Perl, how do I process multiple lines

Tags:

perl

Say I have a file in which each line holds a timestamp followed by a name ("TIMESTAMP NAME"):

10:00:00 Bob
11:00:00 Tom
11:00:20 Fred
11:00:40 George
12:00:00 Bill

I want to read this file, group the names that occur within each hour onto a single line, and then write the revised lines to a file. For example:

10:00:00 Bob
11:00:00 Tom, Fred, George
12:00:00 Bill

Alan Thomas asked Dec 06 '25 00:12


2 Answers

Given that, per comments on the original question, all entries for the same hour are contiguous and the file is too large to fit into memory, I would dispense with the hash entirely - if the raw file is too big to fit in memory, then a hash containing all of its data will likely also be too large. (Yes, it's compressing the data a bit, but the hash itself adds substantial overhead.)

My solution, then:

#!/usr/bin/env perl

use strict;
use warnings;

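my $current_hour = -1;   # sentinel: no hour seen yet
my @names;               # names collected for the current hour

while (my $line = <DATA>) {
  # Capture the two-digit hour and the name; skip lines that do not match.
  my ($hour, $name) = $line =~ /(\d{2}):\d{2}:\d{2} (.*)/;
  next unless defined $hour;

  # The hour changed: print the finished group and start a new one.
  if ($hour != $current_hour) {
    print_hour($current_hour, @names);
    @names = ();
    $current_hour = $hour;
  }

  push @names, $name;
}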

print_hour($current_hour, @names);

exit;

sub print_hour {
  my ($hour, @names) = @_;
  return unless @names;

  print $hour, ':00:00 ', (join ', ', @names), "\n";
}

__DATA__
10:00:00 Bob
11:00:00 Tom
11:00:20 Fred
11:00:40 George
12:00:00 Bill
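
The question asks for the result to be written to a file rather than printed, so here is a minimal sketch of the same streaming idea wrapped in a sub that takes an input and an output filehandle. The sub name group_contiguous_hours and the in-memory demo data are my own assumptions for illustration, not part of the answer above. It still relies on all entries for an hour being contiguous.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: stream "HH:MM:SS NAME" lines from $in and write one
# line per hour to $out. Assumes entries for the same hour are contiguous.
sub group_contiguous_hours {
  my ($in, $out) = @_;
  my $current_hour;
  my @names;

  # Write the names buffered for the current hour, then clear the buffer.
  my $flush = sub {
    print {$out} "$current_hour:00:00 ", join(', ', @names), "\n" if @names;
    @names = ();
  };

  while (my $line = <$in>) {
    # Skip lines that do not look like "HH:MM:SS NAME".
    my ($hour, $name) = $line =~ /^(\d{2}):\d{2}:\d{2}\s+(.+?)\s*$/ or next;

    if (!defined $current_hour || $hour ne $current_hour) {
      $flush->();
      $current_hour = $hour;
    }
    push @names, $name;
  }

  $flush->();
  return;
}

# Demo on in-memory filehandles so the sketch runs without any files.
my $input = "10:00:00 Bob\n11:00:00 Tom\n11:00:20 Fred\n11:00:40 George\n12:00:00 Bill\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";
group_contiguous_hours($in, $out);
close $out;
print $output;
```

For real files, open the handles on paths instead of scalar references, e.g. open my $in, '<', 'input.txt' and open my $out, '>', 'output.txt' (both file names are placeholders). Only one hour's worth of names is held in memory at a time.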

Dave Sherohman answered Dec 07 '25 13:12


In grouped_by_hour below, for each line from the filehandle, if it has a timestamp and a name, we push that name onto an array associated with the timestamp's hour, using sprintf to normalize the hour in case one timestamp is 03:04:05 and another is 3:9:18.

sub grouped_by_hour {
  my($fh) = @_;

  local $_;
  my %hour_names;

  while (<$fh>) {
    push @{ $hour_names{sprintf "%02d", $1} } => $2
      if /^(\d+):\d+:\d+\s+(.+?)\s*$/;
  }

  wantarray ? %hour_names : \%hour_names;
}

The normalized hours also allow us to sort with the default comparison. The code below places the input in the special DATA filehandle by having it after the __DATA__ token, but in real code, you might call grouped_by_hour $fh.

my %hour_names = grouped_by_hour \*DATA;
foreach my $hour (sort keys %hour_names) {
  print "$hour:00:00 ", join(", " => @{ $hour_names{$hour} }), "\n";
}

__DATA__
10:00:00 Bob
11:00:00 Tom
11:00:20 Fred
11:00:40 George
12:00:00 Bill

Output:

10:00:00 Bob
11:00:00 Tom, Fred, George
12:00:00 Bill
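
To see why the sprintf normalization matters for the default sort, here is a small sketch. The sample hours are made up for illustration. Unpadded hours sort as strings in the wrong order, while zero-padded hours merge "3" with "03" and sort correctly.

```perl
use strict;
use warnings;

# Made-up sample hours, some padded and some not.
my @raw_hours = ('3', '10', '9', '03');

# The default sort compares strings, so unpadded hours end up out of order.
my @raw_sorted = sort @raw_hours;

# Zero-pad each hour, drop duplicates, then sort with the default comparison.
my %seen;
my @unique     = grep { !$seen{$_}++ } map { sprintf '%02d', $_ } @raw_hours;
my @normalized = sort @unique;

print "raw:        @raw_sorted\n";   # raw:        03 10 3 9
print "normalized: @normalized\n";   # normalized: 03 09 10
```

The same effect is what lets the hash keys in grouped_by_hour be printed in chronological order with a plain sort keys.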

Greg Bacon answered Dec 07 '25 12:12


