In Perl 5, you can emulate wc -l with a one-liner:
perl -lnE 'END {say $.}' test.txt
How can this functionality be implemented in Raku?
If you try the obvious translation:
raku -e 'say "test.txt".IO.open.lines.elems'
it turns out to be slow and to use a lot of memory.
Steps to reproduce:
$ wget http://eforexcel.com/wp/wp-content/uploads/2017/07/1500000%20Sales%20Records.zip
$ unzip "1500000 Sales Records.zip"
$ mv "1500000 Sales Records.csv" part.txt
$ for i in `seq 1 10`; do cat part.txt >> test.txt ; done
$ du -sh test.txt
1.8G test.txt
$ time wc -l test.txt
15000000 test.txt
real 0m0,350s
user 0m0,143s
sys 0m0,205s
$ time perl -lnE 'END { say $. }' test.txt
15000001
real 0m1,981s
user 0m1,719s
sys 0m0,256s
$ time raku -e 'say "test.txt".IO.open.lines.elems'
15000001
real 2m51,852s
user 0m25,129s
sys 0m6,378s
# Swap is used (peaking at about 2.2G):
# Before `raku -e ''`
$ free -m
total used free shared buff/cache available
Mem: 15009 1695 12604 107 708 12917
Swap: 7583 0 7583
# After `raku -e ''`
$ free -m
total used free shared buff/cache available
Mem: 15009 752 13923 72 332 13899
Swap: 7583 779 6804
# Swap is not used by this variant:
$ time raku -ne '++$ andthen END .say' test.txt
15000001
real 1m44,906s
user 2m14,165s
sys 0m0,653s
$ raku -v
This is Rakudo version 2019.11 built on MoarVM version 2019.11
implementing Perl 6.d.
One option that's still likely to be pretty slow compared to perl but worth comparing:
raku -ne '++$ andthen END .say' test.txt
The -l command-line option from the Perl version is redundant here.
$ is an anonymous state scalar.
andthen tests that its lhs is defined, and if so, sets that value as the topic ($_) and then evaluates its rhs.
END is similar to perl's END. Note that it returns Nil to the andthen but that doesn't matter here because we're using the END's statement for its side-effect.
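Expanded to avoid the anonymous-state idiom, the one-liner is roughly equivalent to the following (a sketch; the explicit $count variable replaces the anonymous state scalar, and it should behave the same, though I haven't compared its speed):

```shell
# Equivalent of `raku -ne '++$ andthen END .say' test.txt`:
# iterate the file's lines, bump a counter, print it once at the end.
raku -e 'my $count = 0; for "test.txt".IO.lines { $count++ }; say $count'
```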
Several things will impact this code's speed. Some things I can think of:
Compiler startup overhead. Ignoring any modules being used, the raku compiler Rakudo has a startup overhead of about a tenth of a second on typical hardware compared to a fairly negligible one for perl.
The notion of a "line". In perl, default line processing reads a series of bytes, some of which represent line ends. In raku, default line processing reads a UTF-8 string, parts of which represent line ends. Thus perl only incurs the reading overhead of an ASCII (or Extended ASCII) decoder, whereas raku incurs the reading overhead of a UTF-8 decoder.
Compiler optimizations. perl is generally optimized to the max. It wouldn't surprise me if perl -lnE 'END {say $.}' test.txt takes advantage of some clever optimizations. In contrast, work on Rakudo optimization is still in its early days relatively speaking.
The only things I think anyone can do about the first and last of the three points I've mentioned above are to wait N years and/or contribute to the compiler's improvement.
There will be a way to work around raku's UTF-8-by-default. Perhaps something like the following is already doable and significantly faster than raku's default, at least ignoring the overhead of using a module called foo:
raku -Mfoo -ne '++$ andthen END .say' test.txt
where module foo switches the default encoding for file I/O to ASCII or whatever from the available encodings.
I haven't checked that this is actually doable in current Rakudo but would be surprised if it were not.
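For what it's worth, Rakudo already lets you pass an encoding directly when reading a file, so a module may not even be needed; something like the following should decode each line as Latin-1 instead of UTF-8 (a sketch; I haven't measured whether it's actually faster):

```shell
# Count lines using a single-byte decoder instead of UTF-8.
# :enc<latin1> is passed through to open by IO::Path.lines.
raku -e 'say "test.txt".IO.lines(:enc<latin1>).elems'
```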