
Pandas remove characters between brackets [duplicate]

I would like to remove the characters between [] (brackets included). Currently I am doing:

df['Text'] = df['Text'].str.replace(r"\[.*\]", "", regex=True)

But the output isn't what I want. Before, the text is "[image] This document"; afterwards it is "******* This document", where each * stands for a whitespace character.

How do I get rid of this whitespace?

Edit 1

The Text column of df looks like below:

ID    Text
0     REAL ESTATE LEASE THIS INDUSTRIAL REAL ESTAT...
5     Lease AureementMade and signed on the \ of Aug...
6     FIRST AMENDMENT OF LEASEDATE: August 31, 2001L...
8     [image: image0.jpg] Jack[image: image1.jb2] ...
9     [image: image0.jpg] ABC SALES Meeting 97...
14    FIRST AMENDMENT OF LEASETHIS FIRST AMENDMENT O...
17    [image: image0.tif] Deep ML LEASE SERVI...
22    [image: image0.jpg] F 15 083 EX [image: image1...
26    LEASE AGREEMENT—GROSS LEASEBASIC LEASE PROVISI...
28    [image: image0.jpg] 17. Medical VERIFICATION...
31    [image: image0.jpg]  [image: image1.jb2] PLL 3...
32    SUBLEASETHIS SUBLEASE this “Sublease” made as ...
34    [image: image0.tif] Lease Agreement May 10, 20...
35    13057968.3  1 Initials:  _____  _____  SECOND ...
42    [image: image0.jpg] Jack Dowson Buy Real MI...
46     Deep – Machine Learning LEASE   B...

I would like to see

ID    Text
0     REAL ESTATE LEASE THIS INDUSTRIAL REAL ESTAT...
5     Lease AureementMade and signed on the \ of Aug...
6     FIRST AMENDMENT OF LEASEDATE: August 31, 2001L...
8     Jack ...
9     ABC SALES Meeting 97...
14    FIRST AMENDMENT OF LEASETHIS FIRST AMENDMENT O...
17    Deep ML LEASE SERVI...
22    F 15 083 EX ...
26    LEASE AGREEMENT—GROSS LEASEBASIC LEASE PROVISI...
28    17. Medical VERIFICATION...
31    PLL 3...
32    SUBLEASETHIS SUBLEASE this “Sublease” made as ...
34    Lease Agreement May 10, 20...
35    13057968.3  1 Initials:  _____  _____  SECOND ...
42    Jack Dowson Buy Real MI...
46    Deep – Machine Learning LEASE   B...
chintan s asked Oct 23 '25 15:10


2 Answers

Looks like you need .str.strip()

Ex:

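import pandas as pd

df = pd.DataFrame({"ID": [1,2,3], "Text": ["[image: 123.jpg] This document", "[image: image.jpg] Readers of the article", "The agreement between [image: image.jpg] two parties"]})
# Replace each [...] tag and the whitespace around it with one space (regex=True is required in pandas >= 2.0), then trim the ends
df["Text"] = df["Text"].str.replace(r"(\s*\[.*?\]\s*)", " ", regex=True).str.strip()
print(df["Text"])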

Output:

0                        This document
1               Readers of the article
2    The agreement between two parties
Name: Text, dtype: object
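Rows in the question can hold several tags, sometimes with no space after a tag. A minimal sketch of one way to handle that: remove each tag, collapse the leftover whitespace, then strip. The sample rows and the name `cleaned` are made up for illustration, and `regex=True` is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Hypothetical rows modelled on the question's data (not the real documents)
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Text": [
        "[image: image0.jpg] Jack[image: image1.jb2] Dowson",
        "[image: image0.jpg]  [image: image1.jb2] PLL 3",
        "REAL ESTATE LEASE",
    ],
})

cleaned = (
    df["Text"]
    .str.replace(r"\[[^\]]*\]", " ", regex=True)  # swap each [...] tag for a single space
    .str.replace(r"\s+", " ", regex=True)         # collapse runs of whitespace into one space
    .str.strip()                                  # trim leading and trailing whitespace
)
print(cleaned.tolist())
```

Replacing each tag with a space rather than an empty string keeps words apart when a tag sits directly between them; the second pass then removes any doubled spaces this creates.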
Rakesh answered Oct 25 '25 04:10


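Add an optional space (a space followed by ?) to the end of your regex, and make the .* non-greedy (.*?) so that a row with several bracketed tags loses only the tags and not the text between them. The whole pattern should then be:

r'\[.*?\] ?'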

Another hint: if your regex is enclosed in parentheses (a capturing group), they are not needed here, since nothing refers to the captured text. Remove them.
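To see how an optional-space pattern behaves when one string contains more than one bracketed tag, here is a small sketch using Python's built-in re module (pandas' str.replace with regex=True applies the same regex rules). The sample string and variable names are invented for illustration.

```python
import re

# Hypothetical text with two bracketed tags
text = "[image: image0.jpg] Jack [image: image1.jb2] Dowson"

# Greedy: .* runs from the first "[" to the last "]", swallowing "Jack" too
greedy = re.sub(r"\[.*\] ?", "", text)

# Non-greedy: .*? stops at the nearest "]", so each tag is removed on its own
lazy = re.sub(r"\[.*?\] ?", "", text)

print(repr(greedy))
print(repr(lazy))
```

With a single tag per string both patterns give the same result; the difference only shows up once a second tag appears later in the text.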

Valdi_Bo answered Oct 25 '25 05:10


