Accept everything until "more than 1 white space", using pyparsing

Question

I have a file, parts of which look like this:

string                     0            1           10
string with white space    0            10          30
string9 with number 9      10           20          50
string_ with underline     10           50          1
(string with parentese)    50           20          100

I need to parse each line into something like:

[[string, 0 ,1 ,10], ....]

As you can see above, the first part can be pretty much anything, so the only way I can think of to parse this is to accept everything until I reach two consecutive whitespace characters; after that, it is just numbers.

But I cannot find this "until" functionality in the pyparsing documentation.

Pedro Romano · Accepted Answer

The following code sample achieves what you want (with improvements over the previous version suggested by @PaulMcGuire):

from __future__ import print_function

from pyparsing import CharsNotIn, Group, LineEnd, OneOrMore, Word, ZeroOrMore
from pyparsing import delimitedList, nums 

SPACE_CHARS = ' \t'
word = CharsNotIn(SPACE_CHARS)
space = Word(SPACE_CHARS, exact=1)
label = delimitedList(word, delim=space, combine=True)
# an alternative construction for 'label' (requires importing Combine) could be:
# label = Combine(word + ZeroOrMore(space + word))
value = Word(nums)
line = label('label') + Group(OneOrMore(value))('values') + LineEnd().suppress()

text = """
string                     0            1           10
string with white space    0            10          30
string9 with number 9      10           20          50
string_ with underline     10           50          1
(string with parentese)    50           20          100
""".strip()

print('input text:\n', text, '\nparsed text:\n', sep='\n')
for line_tokens, start_location, end_location in line.scanString(text):
    print(line_tokens.dump())

giving the following output:

input text:

string                     0            1           10
string with white space    0            10          30
string9 with number 9      10           20          50
string_ with underline     10           50          1
(string with parentese)    50           20          100

parsed text:

['string', ['0', '1', '10']]
- label: string
- values: ['0', '1', '10']
['string with white space', ['0', '10', '30']]
- label: string with white space
- values: ['0', '10', '30']
['string9 with number 9', ['10', '20', '50']]
- label: string9 with number 9
- values: ['10', '20', '50']
['string_ with underline', ['10', '50', '1']]
- label: string_ with underline
- values: ['10', '50', '1']
['(string with parentese)', ['50', '20', '100']]
- label: (string with parentese)
- values: ['50', '20', '100']

The parsed values can be collected into a dictionary keyed by the first column (named label in the example above), with the list of the remaining columns (named values above) as each value, using the following dict comprehension:

{label: values.asList() for label, values in line.searchString(text)}

where line and text are the variables from the example above, generating the following result:

{'(string with parentese)': ['50', '20', '100'],
 'string': ['0', '1', '10'],
 'string with white space': ['0', '10', '30'],
 'string9 with number 9': ['10', '20', '50'],
 'string_ with underline': ['10', '50', '1']}
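
As a minimal sketch (not part of the original answer), the alternative Combine construction mentioned in the code comment can be paired with a parse action that converts each number to int, which gives the nested-list shape asked for in the question. The result name numbers and the variable rows are illustrative choices, and the camelCase method names are used on the assumption that the installed pyparsing version still provides them (2.x does, and 3.x keeps them as compatibility aliases):

```python
from pyparsing import (CharsNotIn, Combine, Group, LineEnd, OneOrMore,
                       Word, ZeroOrMore, nums)

SPACE_CHARS = ' \t'
# A "word" is any run of characters that are not a space or a tab.
word = CharsNotIn(SPACE_CHARS)
# Exactly one space or tab; inside Combine no whitespace is skipped,
# so two consecutive blanks end the label.
space = Word(SPACE_CHARS, exact=1)
label = Combine(word + ZeroOrMore(space + word))
# Convert each numeric token to int as it is parsed.
value = Word(nums).setParseAction(lambda tokens: int(tokens[0]))
# 'numbers' is an illustrative result name (assumption, not from the answer).
line = label('label') + Group(OneOrMore(value))('numbers') + LineEnd().suppress()

text = """
string                     0            1           10
string with white space    0            10          30
string9 with number 9      10           20          50
string_ with underline     10           50          1
(string with parentese)    50           20          100
""".strip()

# Build one flat list per input line: [label, n1, n2, ...].
rows = [[tokens['label']] + tokens['numbers'].asList()
        for tokens, _start, _end in line.scanString(text)]

for row in rows:
    print(row)
```

A result name other than values is used here because attribute-style access to a result called values would collide with the values() method of the parse results object.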

Alexander · Answer

For the sake of completeness, this one doesn't use pyparsing.

import re
# 'text' is the multi-line input string from the example above
lines   = re.compile(r"\r?\n").split(text)
pattern = re.compile(r"\s\s+")
for line in lines:
  print(pattern.split(line))
#['string', '0', '1', '10']
#['string with white space', '0', '10', '30']
#['string9 with number 9', '10', '20', '50']
#['string_ with underline', '10', '50', '1']
#['(string with parentese)', '50', '20', '100']
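
For reference, a self-contained sketch of the same regular-expression idea, using only the standard library. It splits each line on runs of two or more whitespace characters and converts the numeric columns to int. The helper name split_columns is hypothetical, and the sketch assumes every line has a label followed only by integer columns:

```python
import re

text = """\
string                     0            1           10
string with white space    0            10          30
string9 with number 9      10           20          50
string_ with underline     10           50          1
(string with parentese)    50           20          100
"""


def split_columns(text):
    """Return one [label, int, int, ...] list per non-empty line."""
    rows = []
    for raw_line in text.splitlines():
        raw_line = raw_line.strip()
        if not raw_line:
            continue  # skip blank lines
        # Two or more whitespace characters separate the columns;
        # single spaces inside the label are left untouched.
        fields = re.split(r"\s{2,}", raw_line)
        rows.append([fields[0]] + [int(field) for field in fields[1:]])
    return rows


rows = split_columns(text)
for row in rows:
    print(row)
# ['string', 0, 1, 10]
# ['string with white space', 0, 10, 30]
# ['string9 with number 9', 10, 20, 50]
# ['string_ with underline', 10, 50, 1]
# ['(string with parentese)', 50, 20, 100]
```

This works only as long as labels never contain two consecutive whitespace characters, which is the same assumption the question makes.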

Tags: python, python-2.7, pyparsing

theAlse
