Split a string into N elements by spaces, preserving quoted substrings

Tags: python

I would like to split a string into 3 elements on spaces, but I don't want quoted substrings to be split (they can also contain backslashes to escape the quotes).

For instance:

"command argument other arguments and options"
>> ['command', 'argument', 'other arguments and options']

'command "my argument" other arguments and options'
>> ['command', 'my argument', 'other arguments and options']

'command "my \"ugly\" argument" other "arguments" and options'
>> ['command', 'my "ugly" argument', 'other "arguments" and options']

I had a look at this similar question, but shlex.split() will also split the end of the string (and remove its quotes and extra spaces), whereas I want to keep the third element intact.

I tried to use shlex.split(mystring)[0:2] in order to get the first two elements but then I can't manage to find a good solution to extract the third element from the original string. Actually I wish I could use shlex.split() like the str.split() method with a maxsplit argument.

Is there a better way to do this than using shlex.split()? Perhaps regexes? Thanks!

asked Jan 20 '26 02:01 by Nicolas


2 Answers

You should be able to hack a solution by accessing the parser state of a shlex object:

>>> import shlex
>>> s = shlex.shlex("command 'my \'ugly\' argument' other \"arguments\" and options", posix=True)
>>> s.whitespace_split = True
>>> s.commenters = ''
>>> next(s)
'command'
>>> next(s)
'my ugly argument'
>>> s.instream.read()
'other "arguments" and options'

See the shlex.py module source for how the lexer keeps its state; instream is the stream the lexer reads from.
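
The same idea can be wrapped in a small helper that behaves a bit like a maxsplit argument. This is only a sketch under a few assumptions: split_keep_rest and its maxsplit parameter are names made up for illustration, it relies on reading the rest of instream as in the session above, and it strips leading whitespace from the remainder, which the session above does not do.

```python
import shlex


def split_keep_rest(text, maxsplit=2):
    """Parse up to `maxsplit` shell-style tokens and return them,
    followed by the unparsed remainder of the string (if any)."""
    lex = shlex.shlex(text, posix=True)
    lex.whitespace_split = True   # split on whitespace only
    lex.commenters = ''           # do not treat '#' as a comment start
    tokens = []
    for _ in range(maxsplit):
        token = lex.get_token()
        if token is None:         # in posix mode, end of input is None
            break
        tokens.append(token)
    # Whatever the lexer has not consumed yet is still in the stream.
    rest = lex.instream.read().lstrip()
    if rest:
        tokens.append(rest)
    return tokens


print(split_keep_rest(r'command "my \"ugly\" argument" other "arguments" and options'))
# ['command', 'my "ugly" argument', 'other "arguments" and options']
```

If the input has fewer than maxsplit tokens, the helper simply returns the tokens it found.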

answered Jan 21 '26 15:01 by ecatmur


Why not re-join the remaining arguments after splitting the string with shlex.split()?

command = command[:2] + [' '.join(command[2:])]
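
To see what this gives on a quoted input, here is a minimal runnable version. It assumes command is the list returned by shlex.split(); the input string is just an example. Note that shlex.split() has already stripped the quotes by the time the tail is re-joined, so the third element does not keep them.

```python
import shlex

text = 'command "my argument" other "arguments" and options'
command = shlex.split(text)                        # splits everything, drops quotes
command = command[:2] + [' '.join(command[2:])]    # glue the tail back together
print(command)
# ['command', 'my argument', 'other arguments and options']
```
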

Alternatively, you'd have to drive the shlex.shlex() instance yourself:

>>> import shlex
>>> text = "command 'my \'ugly\' argument' other \"arguments\" and options"
>>> lex = shlex.shlex(text, posix=True)
>>> lex.whitespace_split = True
>>> lex.commenters = ''
>>> command = [next(lex), next(lex), lex.instream.read()]
>>> command
['command', 'my ugly argument', 'other "arguments" and options']

The .instream attribute is the file-like object holding the text being parsed, and will thus contain the remainder after parsing the first two arguments.

You may also need to look at the pushback state, though, where the lexer stores anything it has already read but did not need for the current token:

>>> command = [next(lex), next(lex), ''.join(lex.pushback) + lex.instream.read()]
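
For a case where pushback is actually non-empty, consider leaving whitespace_split off, which differs from the setup above. As far as I can tell from CPython's shlex.py (with punctuation_chars not set), the character that ends a word is then pushed back rather than discarded. The input string here is made up for illustration, so treat this as a sketch rather than guaranteed behaviour.

```python
import shlex

# whitespace_split is left at its default (False), so ',' ends the word 'foo'.
lex = shlex.shlex('foo,bar baz', posix=True)
lex.commenters = ''

first = next(lex)
# The terminating character was already read from instream, so the
# remainder has to be rebuilt from the pushback buffer plus the stream.
rest = ''.join(lex.pushback) + lex.instream.read()
print(first, repr(rest))
```

Reading only lex.instream.read() in this situation would silently drop the pushed-back character.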
answered Jan 21 '26 17:01 by Martijn Pieters


