
Find strings with upper-case letters that end with a certain word using regex

I have a dataframe where one column consists of strings that have three patterns:

1) Upper case letters only: APPLE COMPANY

2) Upper case letters and ends with the letters AS: CAR COMPANY AS

3) Upper and lower case letters: John Smith

df = pd.DataFrame({'NAME': ['APPLE COMPANY', 'CAR COMPANY AS', 'John Smith']})

             NAME ...
0   APPLE COMPANY ...
1  CAR COMPANY AS ...
2      John Smith ...
3             ... ...

How can I drop the rows that match pattern 1) and keep the rows that match 2) or 3)? In other words, how can I remove rows whose strings contain only upper-case letters and do not end with AS, while keeping rows that end with AS or that contain both upper- and lower-case letters?

I came up with this:

df['NAME'].str.findall(r"(^[A-Z ':]+$)")
df['NAME'].str.findall('AS')

The first one extracts the strings that contain only upper-case letters, but the second one only finds AS. If there are methods other than regex, I am happy to try those as well.

Expected outcome is:

             NAME ...
1  CAR COMPANY AS ...
2      John Smith ...
3             ... ...
Mataunited18 asked Nov 22 '25 14:11


2 Answers

This regex should work:

^(?:[A-Z ':]+ AS|.*[a-z].*)$

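It matches a whole string (anchored by ^ and $) that fits either one of these:

  • [A-Z ':]+ AS - one or more upper-case letters, spaces, apostrophes or colons, followed by a space and AS at the end of the string
  • .*[a-z].* - any string that contains at least one lower-case letter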
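
As a minimal sketch of how this pattern could be applied to the DataFrame from the question, str.contains can build a boolean mask that keeps only the matching rows. The variable names pattern, mask and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({'NAME': ['APPLE COMPANY', 'CAR COMPANY AS', 'John Smith']})

# Non-capturing group: all upper case ending in " AS", or any lower-case letter
pattern = r"^(?:[A-Z ':]+ AS|.*[a-z].*)$"

# True for rows whose NAME matches the pattern
mask = df['NAME'].str.contains(pattern, regex=True)

# Keep only the matching rows
result = df[mask]
print(result)
```

Because the group is non-capturing, pandas should not warn about match groups when the pattern is passed to str.contains.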


Sweeper answered Nov 24 '25 06:11


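One way would be:

# Extract the name only if it consists entirely of upper-case letters, spaces, ' or :
df['temp'] = df['NAME'].str.extract(r"(^[A-Z ':]+$)", expand=False)
s1 = df['temp'] == df['NAME']        # True where the name is all upper case
s2 = ~df['NAME'].str.endswith('AS')  # True where the name does not end with AS

# Drop the rows that are all upper case and do not end with AS
print(df.loc[~(s1 & s2), 'NAME'])

Output: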

1    CAR COMPANY AS
2        John Smith
Name: NAME, dtype: object
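
Since the question also asks about approaches without regex, here is a rough sketch that uses only str.isupper and str.endswith. It assumes that "ends with AS" means the last word is AS, so it checks for ' AS' with a leading space; that reading, and the names all_upper, ends_with_as and kept, are assumptions rather than something stated in the question.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({'NAME': ['APPLE COMPANY', 'CAR COMPANY AS', 'John Smith']})

# True where every cased character in the name is upper case
all_upper = df['NAME'].str.isupper()

# True where the name ends with the separate word AS (assumed interpretation)
ends_with_as = df['NAME'].str.endswith(' AS')

# Keep rows that are not all upper case, or that end with AS
kept = df.loc[~all_upper | ends_with_as, 'NAME']
print(kept)
```

Note that str.isupper ignores digits and punctuation, so it is slightly more permissive than the [A-Z ':] character class used in the regex.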
Mohamed Thasin ah answered Nov 24 '25 04:11


