Bash Regular Expression -- Can't seem to match any of \s \S \d \D \w \W etc

Tags:

regex

bash

I have a script that is trying to get blocks of information from gparted.

My data looks like:

Disk /dev/sda: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type     File system     Flags
 1      1049kB  316MB   315MB   primary  ext4            boot
 2      316MB   38.7GB  38.4GB  primary  ext4
 3      38.7GB  42.9GB  4228MB  primary  linux-swap(v1)

log4net.xml
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type     File system     Flags
 1      1049kB  316MB   315MB   primary  ext4            boot
 5      316MB   38.7GB  38.4GB  primary  ext4
 6      38.7GB  42.9GB  4228MB  primary  linux-swap(v1)

I use a regex to break this into two Disk blocks:

^Disk (/dev[\S]+):((?!Disk)[\s\S])*

This works with multiline on.

When I test this in a Bash script, I can't seem to match \s or \S. What am I doing wrong?

I am testing this through a script like:

data=$(<disks.txt)
morematches=1
x=0
regex="^Disk (/dev[\S]+):((?!Disk)[\s\S])*"

if [[ $data =~ $regex ]]; then
    echo "Matched"
    while [ $morematches == 1 ]
    do
        x=$((x+1))
        if [[ ${BASH_REMATCH[x]} != "" ]]; then
            echo $x "matched" ${BASH_REMATCH[x]}
        else
            echo $x "Did not match"
            morematches=0
        fi
    done
fi

However, when I test the regex piece by piece, it stops matching whenever a piece contains \s or \S. What am I doing wrong?


asked Aug 29 '13 14:08

Yablargo

2 Answers

Perhaps \S and \s are not supported, or you cannot place them inside [ ]. Try using the following regex instead:

^Disk[[:space:]]+/dev[^[:space:]]+:[[:space:]]+[^[:space:]]+
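The bracket part of the problem can be checked with a minimal sketch. It assumes Bash's =~ operator, which compiles the pattern as a POSIX extended regex. Inside a bracket expression a backslash is an ordinary character, so [\S] is simply the set containing a backslash and a capital S, not "any non-space character". The helper name matches is hypothetical and used only for illustration.

```bash
#!/usr/bin/env bash
# matches STRING ERE: succeed if STRING matches the extended regex ERE.
# (Hypothetical helper.) Keeping the pattern in a variable avoids quoting
# surprises with =~, since a quoted right-hand side is matched literally.
matches() {
    local re=$2
    [[ $1 =~ $re ]]
}

header='Disk /dev/sda: 42.9GB'

# PCRE-style shorthand inside brackets: [\S] means "backslash or S",
# so the "/sda" after "/dev" cannot match and the whole test fails.
if matches "$header" '^Disk (/dev[\S]+):'; then
    echo 'bracketed \S: matched'
else
    echo 'bracketed \S: no match'
fi

# POSIX character classes work as intended.
if matches "$header" '^Disk[[:space:]]+/dev[^[:space:]]+:[[:space:]]+[^[:space:]]+'; then
    echo 'POSIX classes: matched'
fi

# The bracket expression really is the literal two-character set { \ S }.
matches 'S' '^[\S]$' && echo 'S is in [\S]'
matches '\' '^[\S]$' && echo 'backslash is in [\S]'
matches 'x' '^[\S]$' || echo 'x is not in [\S]'
```

Passing the pattern through a variable, rather than writing it inline after =~, is the usual way to keep Bash from treating parts of it as quoted literal text.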

EDIT

It seems you actually want to extract the matching fields. I simplified the script to do that:

#!/bin/bash

regex='^Disk[[:space:]]+(/dev[^[:space:]]+):[[:space:]]+(.*)'

while read -r line; do
    [[ $line =~ $regex ]] && echo "${BASH_REMATCH[1]} matches ${BASH_REMATCH[2]}."
done < disks.txt

Produces:

/dev/sda matches 42.9GB.
/dev/sdb matches 42.9GB.
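The question asks for the whole block of each disk, not just the header fields. The same line-by-line approach can be extended to cover that without any lookahead. A new block starts whenever a line matches the Disk header, and every other line is appended to the current block. This is a sketch that assumes the gparted layout shown in the question. The function split_disk_blocks and the arrays disk_names and disk_blocks are hypothetical names. As with the original ((?!Disk)[\s\S])* pattern, lines that precede the next header (such as the Model: line) stay with the previous block.

```bash
#!/usr/bin/env bash
# Split gparted-style output into one block per "Disk /dev/...:" header.
# Hypothetical names: disk_names[i] holds the device, disk_blocks[i] the
# lines belonging to it (header included). Lines before the first header
# are dropped.
header_re='^Disk[[:space:]]+(/dev[^[:space:]]+):'

split_disk_blocks() {
    disk_names=()
    disk_blocks=()
    local line i=-1
    while IFS= read -r line; do
        if [[ $line =~ $header_re ]]; then
            # A header line starts a new block.
            i=$((i + 1))
            disk_names[i]=${BASH_REMATCH[1]}
            disk_blocks[i]=$line
        elif ((i >= 0)); then
            # Any other line belongs to the current block.
            disk_blocks[i]+=$'\n'$line
        fi
    done
}

# Small inline sample; with the real file use: split_disk_blocks < disks.txt
split_disk_blocks <<'EOF'
Disk /dev/sda: 42.9GB
Partition Table: msdos
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 42.9GB
Partition Table: msdos
EOF

for i in "${!disk_names[@]}"; do
    printf '=== %s ===\n%s\n' "${disk_names[i]}" "${disk_blocks[i]}"
done
```

A small state machine like this avoids the need for a "match up to the next Disk" construct, which POSIX ERE cannot express directly.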

answered Sep 27 '22 18:09

konsolebox

Because this is a common FAQ, let me list a few constructs that are not supported in Bash, and how to work around them where there is a simple workaround.

There are multiple dialects of regular expressions in common use. The one supported by Bash is POSIX Extended Regular Expressions (ERE). Bash hands the pattern to the system C library's regcomp() with REG_EXTENDED, so small details can vary between platforms. This is different from what many online regex testers support, which is often the more modern Perl 5 / PCRE dialect.
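One way to see the dialect difference from the shell is to look at the exit status of [[ =~ ]]. Bash documents it as 0 for a match, 1 for no match, and 2 when the regular expression is syntactically incorrect. The sketch below uses a hypothetical helper, regex_status, and feeds the same line to a POSIX ERE pattern and to PCRE-only constructs. Neither of the PCRE patterns reports a match. On GNU/Linux the lookahead is typically rejected as invalid, while \d simply does not act as a digit class.

```bash
#!/usr/bin/env bash
# regex_status STRING ERE (hypothetical helper): print the exit status of
# [[ STRING =~ ERE ]], i.e. 0 = match, 1 = no match, 2 = invalid regex.
regex_status() {
    local re=$2 rc=0
    [[ $1 =~ $re ]] || rc=$?
    echo "$rc"
}

line='Disk /dev/sda: 42.9GB'

regex_status "$line" '[[:digit:]]+GB'   # POSIX ERE class: prints 0
regex_status "$line" '\d+GB'            # PCRE shorthand: not a digit class in ERE
regex_status "$line" '^Disk (?!x)'      # PCRE lookahead: not part of ERE
regex_status "$line" '^Disk ('          # unbalanced parenthesis: prints 2
```

Distinguishing status 1 from status 2 is useful when porting a pattern from an online tester. A 2 means the pattern was never compiled at all, not that it merely failed to match.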

- Bash doesn't portably support \d \D \s \S \w \W. These can be replaced with the POSIX character class equivalents [[:digit:]], [^[:digit:]], [[:space:]], [^[:space:]], [_[:alnum:]], and [^_[:alnum:]], respectively. (Notice the last case, where the [:alnum:] POSIX character class is augmented with underscore to match what the Perl \w shorthand covers, at least for ASCII text.) GNU libc does accept \s, \S, \w and \W outside a bracket expression as an extension, but not \d or \D. Inside a bracket expression such as [\S], a backslash is always an ordinary character.
- Bash doesn't support non-greedy matching. You can sometimes replace a.*?b with something like a[^ab]*b to get a similar effect in practice, though the two are not exactly equivalent.
- Bash doesn't support non-capturing parentheses (?:...). In the trivial case, just use capturing parentheses (...) instead. However, if you also use other capture groups or backreferences, this renumbers them (and shifts the corresponding BASH_REMATCH indices).
- Bash doesn't support lookarounds like (?<=before) or (?!after); in fact, anything starting with (? is a Perl extension. There is no simple general workaround for these, though you can often rephrase your problem into one where lookarounds can be avoided.
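The substitutions listed above can be checked directly in Bash. This sketch uses a hypothetical helper, first_match, which prints one BASH_REMATCH group of the leftmost match. It covers three of the workarounds:

- [[:digit:]] and [_[:alnum:]] standing in for \d and \w.
- The a[^ab]*b approximation of a non-greedy a.*?b.
- How replacing a non-capturing group with a capturing one shifts the group numbers.

```bash
#!/usr/bin/env bash
# first_match STRING ERE [N] (hypothetical helper): print capture group N
# (default 0, the whole match) of the leftmost match of ERE in STRING;
# return 1 if there is no match.
first_match() {
    local re=$2
    [[ $1 =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[${3:-0}]}"
}

# \d+ and \w+ spelled with POSIX classes.
first_match 'size 4228MB' '[[:digit:]]+'      # prints 4228
first_match 'log4net.xml' '[_[:alnum:]]+'     # prints log4net

# Greedy a.*b runs to the last b; a[^ab]*b stops at the first one.
first_match 'a1b a2b' 'a.*b'                  # prints a1b a2b
first_match 'a1b a2b' 'a[^ab]*b'              # prints a1b

# In PCRE, (?:ext)([0-9]) would put the digit in group 1. Written with a
# capturing group instead, the digit moves to group 2.
first_match 'ext4' '(ext)([[:digit:]])' 2     # prints 4
```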

answered Sep 27 '22 18:09

tripleee
