
Remove word from the end of a string that has been cut off to different lengths

Tags: regex, perl

So let's say I have a list of strings which sometimes end with a phrase that has been cut off to different lengths. In this example the phrase is "hello".

my @strings =
(
    "Test 1 hello",
    "Something else",
    "Test 2 hell",
    "And also he",
    "Test 4 hel"
);

This is how I would remove the "hello" fragments right now:

foreach my $string (@strings)
{
    if ($string =~ m/(.*?)\s*(h(e(l(lo?)?)?)?)?$/)
    {
        print "'", $string, "' -> '", $1, "'\n";
    }
}

It does work:

'Test 1 hello' -> 'Test 1'
'Something else' -> 'Something else'
'Test 2 hell' -> 'Test 2'
'And also he' -> 'And also'
'Test 4 hel' -> 'Test 4'

However, I find the regular expression to match all the "hello" fragments long, confusing and hard to modify for future use cases. Is there a shorter way to write something equivalent to (h(e(l(lo?)?)?)?)?$?

A Person asked Feb 01 '26 13:02


2 Answers

One way is to build the regex as an alternation of the possible truncated versions of the string. I think this should also extend well to more general uses:

use warnings;
use strict;
use feature 'say';

my $target = shift || 'hello';

my @strings = (
    "Test 1 hello",
    "Something else",
    "Test 2 hell",
    "And also he",
    "Test 4 hel"
);

my $re_versions = build_regex($target);

foreach my $string (@strings)
{
    if ($string =~ /($re_versions)$/)
    {
        say "'$string' --> $1";
    }
};

sub build_regex {
    my ($s) = @_;
    my @versions;
    while (length $s) {    # test length, so a prefix such as "0" is not treated as false
        push @versions, quotemeta $s;
        chop $s;
    }
    return join '|', @versions;
}

This isn't shorter (though it can certainly be written more compactly), but it makes it easy to refine which versions of the string are accepted, the order in which they are tried, and so on.

If there is a reason to want a compiled regex back, change the function's return to
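As a minimal sketch of the more compact form mentioned above, the prefixes can be generated with map and substr instead of a loop. The name prefix_regex is hypothetical, and using a substitution (rather than only reporting the match) is an assumption about what the question ultimately needs.

```perl
use strict;
use warnings;

# Hypothetical helper: alternation of every non-empty prefix of $s, longest first.
sub prefix_regex {
    my ($s) = @_;
    return join '|', map { quotemeta(substr($s, 0, $_)) } reverse 1 .. length($s);
}

my $re = prefix_regex('hello');

my @strings = (
    "Test 1 hello",
    "Something else",
    "Test 2 hell",
    "And also he",
    "Test 4 hel",
);

# Work on a copy; strip the fragment together with the whitespace before it.
my @cleaned = @strings;
s/\s+(?:$re)$// for @cleaned;

print "'$_'\n" for @cleaned;
```

The alternation is wrapped in (?:...) at the point of use so that \s+ and $ apply to every prefix, not just the first or last one.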

my $re_str = join '|', @versions;
return qr/$re_str/;

where you can now also add flags that may be suitable.
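For illustration, a case-insensitive variant might look like the sketch below. The name build_regex_qr, the /i flag, and the sample strings are assumptions made for this example; the loop itself follows the function above.

```perl
use strict;
use warnings;

# Hypothetical variant of build_regex that returns a compiled pattern.
sub build_regex_qr {
    my ($s) = @_;
    my @versions;
    while (length $s) {
        push @versions, quotemeta $s;
        chop $s;
    }
    my $re_str = join '|', @versions;
    return qr/$re_str/i;    # /i is an assumed flag, chosen only as an example
}

my $re = build_regex_qr('hello');

my @out;
for my $string ("Test 1 HELLO", "Test 2 Hel", "Something else") {
    (my $copy = $string) =~ s/\s+$re$//;    # copy first, then strip the fragment
    push @out, $copy;
}

print "'$_'\n" for @out;
```

A qr// object keeps its own flags and behaves as a group when interpolated into a larger pattern, so the alternation does not need extra parentheses here.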

zdim answered Feb 03 '26 05:02


You are looking for a regexp that matches any of the following at the end of a string: hello, hell, hel, he, h. We can assume that the fragment is preceded by at least one space.

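You could just write:

s/\s+(?:hello|hell|hel|he|h)$// for @strings;

This modifies every element of the array in place. The alternatives are grouped so that both the leading \s+ and the trailing $ apply to each of them; without the group, \s+ would bind only to the first alternative.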

If needed, you can generate the match string automatically for any given word:

my $word  = "hello";
my @parts = map { substr $word, 0, $_ } (1..(length $word));
my $match = join "|", map { quotemeta($_) } @parts;   # escape any regex metacharacters
s/\s+(?:$match)$// for @strings;                      # group so \s+ and $ apply to every part
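
As an alternative sketch, the nested-optional form from the question can be generated rather than typed by hand. The helper name nested_prefix_regex is hypothetical, and the use of non-capturing groups is a choice made here, not something taken from the answers.

```perl
use strict;
use warnings;

# Hypothetical helper: turns 'hello' into h(?:e(?:l(?:l(?:o)?)?)?)?
sub nested_prefix_regex {
    my ($word) = @_;
    my $pattern = '';
    # Build from the last character outwards; each step makes the existing tail optional.
    for my $char (reverse split //, $word) {
        $pattern = quotemeta($char) . (length $pattern ? "(?:$pattern)?" : '');
    }
    return $pattern;
}

my $nested = nested_prefix_regex('hello');
print "$nested\n";

my @strings = (
    "Test 1 hello",
    "Something else",
    "Test 2 hell",
    "And also he",
    "Test 4 hel",
);

# At least the first character of the word must follow the whitespace.
s/\s+(?:$nested)$// for @strings;

print "'$_'\n" for @strings;
```

This matches the same fragments as the alternation; which of the two reads better is largely a matter of taste.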

GMB answered Feb 03 '26 04:02


