Pandas extractall merge

Question

I'm not sure whether I should fix my regex pattern or do more post-processing with pandas.

Here's a mock setup:

import re
import pandas as pd

regex = r"(?P<adv>This)|(?P<noun>test)"
texts = ["This is a test", "Random stuff with no match"]
series = pd.Series(texts)

I want to find all matches for the named groups (adv and noun here; there are typically more than two). The groups are designed to be mutually exclusive, so I would like a single row per text, holding the captured string or NaN for each group.

Current output: multi-index rows, only for texts that have a match

>>> print(series.str.extractall(regex))
          adv  noun
  match            
0 0      This   NaN
  1       NaN  test

Expected output: one row per input text, with matches aggregated per group

          adv  noun
0        This  test
1         NaN   NaN

Any chance for a hand on this? Either fix the regex, or post-process with pandas. Thanks!

anky · Accepted Answer

You can try:

series.str.extractall(regex).groupby(level=0).first()

    adv  noun
0  This  test
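
Note that the result above only contains texts with at least one match, while the expected output in the question also has a NaN row for the second text. A minimal sketch of one way to get that row back, assuming the original index of the series is what you want to align on, is to reindex after the groupby. The variable names matches and result are only illustrative:

```python
import pandas as pd

regex = r"(?P<adv>This)|(?P<noun>test)"
texts = ["This is a test", "Random stuff with no match"]
series = pd.Series(texts)

# One row per (text, match); texts without any match are absent here.
matches = series.str.extractall(regex)

# Collapse to one row per text: first() keeps the first non-null value
# in each column, which is enough when the groups are mutually exclusive.
# reindex() then restores the texts that had no match as all-NaN rows.
result = matches.groupby(level=0).first().reindex(series.index)

print(result)
```

This should give one row per input text, with row 1 filled with NaN. If a group can match more than once in the same text, first() silently keeps only the first capture, so a different aggregation may be needed in that case.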

Tags:

python

regex

pandas

arnaud

1 Answers

anky
