Extract Letters and the first Digit only

Question

I am working with a data frame that contains letters, special characters, and digits. My goal is to extract all letters and the first digit. The digits always occur at the end, after the letters and special characters; however, some letters may appear after special characters. See the example below:

import pandas as pd

d = {'col1': ['A./B. 1234', 'CDEF/G5.', 'AB./C23']}
df = pd.DataFrame(data=d)
print(df)
#    col1
# 0  A./B. 1234
# 1  CDEF/G5.
# 2  AB./C23

I looked up many variants, but I do not know how to handle special characters such as . and / and the like.

df.col1.str.extract(r'([A-Za-z\d]+)')
#    0
# 0  A
# 1  CDEF
# 2  AB

This gives me all the letters and digits until it reaches a special character. Eventually I would like to get the following output:

AB1
CDEFG5
ABC2

I am new to regex.

Nick · Accepted Answer

You need to extract all the characters up to and including the first digit, and then replace any non-letter/digit characters with an empty string:

d = {'col1': ['A./B. 1234', 'CDEF/G5.','AB./C23']}
df = pd.DataFrame(data=d)
df.col1.str.extract(r'^([^\d]+\d)').replace('[^A-Za-z0-9]', '', regex=True)

Output:

        0
0     AB1
1  CDEFG5
2    ABC2
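To make the two steps easier to follow, here is a minimal sketch of the same idea on plain strings using only the standard `re` module. The function name `letters_and_first_digit` and the fallback for strings without any digit are assumptions made for illustration; they are not part of the answer above.

```python
import re

def letters_and_first_digit(text):
    # Step 1: take everything up to and including the first digit,
    # using the same pattern as the pandas extract call.
    match = re.match(r'[^\d]+\d', text)
    # Assumption: if there is no digit, fall back to the whole string.
    prefix = match.group(0) if match else text
    # Step 2: remove every character that is not an ASCII letter or digit.
    return re.sub(r'[^A-Za-z0-9]', '', prefix)

samples = ['A./B. 1234', 'CDEF/G5.', 'AB./C23']
results = [letters_and_first_digit(s) for s in samples]
print(results)  # ['AB1', 'CDEFG5', 'ABC2']
```

Note that the pandas version returns NaN for a row without a digit, whereas this sketch keeps the letters; which behavior is preferable depends on the data.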

BENY · Answer

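Another method: extract every letter and digit as its own row, keep each character that is a letter or directly follows a letter, then join the pieces per original row.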

s = df['col1'].str.extractall("([a-zA-Z0-9])")[0]
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
s[s.str.isalpha() | s.shift().str.isalpha()].groupby(level=0).sum()
0       AB1
1    CDEFG5
2      ABC2
Name: 0, dtype: object
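The keep-rule behind this approach can be sketched per string in plain Python, which may make the logic easier to inspect. The function name `keep_letters_and_following_digit` is hypothetical, and unlike the pandas `shift()` version this sketch never looks at the previous row's characters.

```python
import re

def keep_letters_and_following_digit(text):
    # Pull out letters and digits only, one character at a time.
    chars = re.findall(r'[A-Za-z0-9]', text)
    kept = []
    previous = None
    for ch in chars:
        # Keep a character if it is a letter, or if the character
        # immediately before it (among letters/digits) is a letter.
        if ch.isalpha() or (previous is not None and previous.isalpha()):
            kept.append(ch)
        previous = ch
    return ''.join(kept)

samples = ['A./B. 1234', 'CDEF/G5.', 'AB./C23']
results = [keep_letters_and_following_digit(s) for s in samples]
print(results)  # ['AB1', 'CDEFG5', 'ABC2']
```

This rule keeps every digit that directly follows a letter, so it matches "letters plus first digit" only when all digits sit at the end, as the question states. For an input such as 'A1B2' it would return 'A1B2'.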
