I would like to remove the characters between [ and ] (brackets included). Currently I am doing:
df['Text'] = df['Text'].str.replace(r"\[.*\]","")
But the output isn't what I want. Before the replacement a value looks like [image] This document, and afterwards it looks like ******* This document, where each * stands for a whitespace character.
How do I get rid of this whitespace?
Edit 1
The Text column of df looks like below:
ID Text
0 REAL ESTATE LEASE THIS INDUSTRIAL REAL ESTAT...
5 Lease AureementMade and signed on the \ of Aug...
6 FIRST AMENDMENT OF LEASEDATE: August 31, 2001L...
8 [image: image0.jpg] Jack[image: image1.jb2] ...
9 [image: image0.jpg] ABC SALES Meeting 97...
14 FIRST AMENDMENT OF LEASETHIS FIRST AMENDMENT O...
17 [image: image0.tif] Deep ML LEASE SERVI...
22 [image: image0.jpg] F 15 083 EX [image: image1...
26 LEASE AGREEMENT—GROSS LEASEBASIC LEASE PROVISI...
28 [image: image0.jpg] 17. Medical VERIFICATION...
31 [image: image0.jpg] [image: image1.jb2] PLL 3...
32 SUBLEASETHIS SUBLEASE this “Sublease” made as ...
34 [image: image0.tif] Lease Agreement May 10, 20...
35 13057968.3 1 Initials: _____ _____ SECOND ...
42 [image: image0.jpg] Jack Dowson Buy Real MI...
46 Deep – Machine Learning LEASE B...
I would like to see
ID Text
0 REAL ESTATE LEASE THIS INDUSTRIAL REAL ESTAT...
5 Lease AureementMade and signed on the \ of Aug...
6 FIRST AMENDMENT OF LEASEDATE: August 31, 2001L...
8 Jack ...
9 ABC SALES Meeting 97...
14 FIRST AMENDMENT OF LEASETHIS FIRST AMENDMENT O...
17 Deep ML LEASE SERVI...
22 F 15 083 EX ...
26 LEASE AGREEMENT—GROSS LEASEBASIC LEASE PROVISI...
28 17. Medical VERIFICATION...
31 PLL 3...
32 SUBLEASETHIS SUBLEASE this “Sublease” made as ...
34 Lease Agreement May 10, 20...
35 13057968.3 1 Initials: _____ _____ SECOND ...
42 Jack Dowson Buy Real MI...
46 Deep – Machine Learning LEASE B...
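It looks like you need .str.strip() after the replacement: replace each bracketed tag, together with the whitespace around it, with a single space, then strip the ends.
Example: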
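import pandas as pd
df = pd.DataFrame({"ID": [1,2,3], "Text": ["[image: 123.jpg] This document", "[image: image.jpg] Readers of the article", "The agreement between [image: image.jpg] two parties"]})
# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal string by default
df["Text"] = df["Text"].str.replace(r"(\s*\[.*?\]\s*)", " ", regex=True).str.strip()
print(df["Text"])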
Output:
0 This document
1 Readers of the article
2 The agreement between two parties
Name: Text, dtype: object
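To see why the non-greedy .*? in the pattern above matters for rows that contain more than one tag, here is a minimal sketch using only the standard re module. The sample string is made up, modeled on the rows in the question, and is not taken from the real data.

```python
import re

# Made-up sample modeled on the question's rows: two bracketed tags in one string.
text = "[image: image0.jpg] Jack[image: image1.jb2] Dowson"

# Greedy: .* runs from the first "[" to the LAST "]", so "Jack" is swallowed too.
greedy = re.sub(r"\[.*\]", "", text)

# Non-greedy: .*? stops at the nearest "]", so each tag is removed on its own.
lazy = re.sub(r"\[.*?\]", "", text)

print(repr(greedy))        # ' Dowson'
print(repr(lazy))          # ' Jack Dowson'
print(repr(lazy.strip()))  # 'Jack Dowson'
```

Both results still start with a space that followed the removed tag, which is the leftover whitespace the question describes and the reason for the final strip.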
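Add an optional space (a space followed by ?) to the end of your regex so the space after the closing bracket is removed along with the tag. Use the non-greedy .*? as well, otherwise a row with several tags loses everything between the first [ and the last ]. The whole regex (match part) should be:
r'\[.*?\] ?'
Another hint: if your regex is enclosed in parentheses (a capturing group), they are not needed here, because nothing refers back to the group. Remove them.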
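As a rough sketch of this approach, the helper below removes each tag plus one optional trailing space, with no capturing group, and then collapses whatever whitespace is left. The names TAG and strip_bracket_tags and the sample strings are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical names; a bracketed tag followed by one optional space, no capturing group.
TAG = re.compile(r"\[.*?\] ?")

def strip_bracket_tags(text: str) -> str:
    """Remove [...] tags, then collapse leftover whitespace and trim the ends."""
    without_tags = TAG.sub("", text)
    return re.sub(r"\s+", " ", without_tags).strip()

# Made-up samples modeled on the question's rows.
samples = [
    "[image: 123.jpg] This document",
    "The agreement between [image: image.jpg] two parties",
    "[image: image0.jpg]   ABC SALES",
]
cleaned = [strip_bracket_tags(s) for s in samples]
for line in cleaned:
    print(line)
```

Assuming the column holds only strings, the same helper could be applied with df["Text"].map(strip_bracket_tags). Note that when a tag sits directly between two words with no space before it, removing the tag and its trailing space joins those words; replacing the tag with a single space instead avoids that.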