I have the following dataframe:
import pandas as pd
data = {'Test_Step_ID': ['9.1.1', '9.1.2', '9.1.3', '9.1.4'],
        'Protocol_Name': ['A', 'B', 'C', 'D'],
        'Req_ID': ['SRS_0081d', 'SRS_0079', 'SRS_0082SRS_0082a', 'SRS_0015SRS_0015cSRS_0015d']
        }
df = pd.DataFrame(data)
I want to duplicate rows based on the "Req_ID" column: wherever a cell contains several concatenated "SRS" values, each value should get its own row, with all other column values kept the same. So I want two rows for SRS_0082 and SRS_0082a, and three rows for SRS_0015, SRS_0015c, and SRS_0015d.
Can someone help me here? I'd appreciate it. Thanks in advance. [EDITED]:
I want the result to look like this:
Split on the zero-width position between "SRS" and a preceding character using the regex (?<=.)(?=SRS), then explode:
out = (df
       .assign(Req_ID=df['Req_ID'].str.split(r'(?<=.)(?=SRS)'))
       .explode('Req_ID')
      )
Output:
  Test_Step_ID Protocol_Name     Req_ID
0        9.1.1             A  SRS_0081d
1        9.1.2             B   SRS_0079
2        9.1.3             C   SRS_0082
2        9.1.3             C  SRS_0082a
3        9.1.4             D   SRS_0015
3        9.1.4             D  SRS_0015c
3        9.1.4             D  SRS_0015d
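Note that the exploded rows keep the index label of the row they came from (2, 2, 3, 3, 3 above). If a fresh 0..n-1 index is preferred, one option is to pass ignore_index=True to explode, which as far as I know needs pandas 1.1.0 or later. A minimal sketch that rebuilds the sample frame from the question:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Test_Step_ID': ['9.1.1', '9.1.2', '9.1.3', '9.1.4'],
    'Protocol_Name': ['A', 'B', 'C', 'D'],
    'Req_ID': ['SRS_0081d', 'SRS_0079', 'SRS_0082SRS_0082a',
               'SRS_0015SRS_0015cSRS_0015d'],
})

out = (df
       # Turn each Req_ID cell into a list of individual IDs
       .assign(Req_ID=df['Req_ID'].str.split(r'(?<=.)(?=SRS)'))
       # One row per list element; ignore_index=True renumbers the rows
       .explode('Req_ID', ignore_index=True)
      )
print(out)
```

Calling .reset_index(drop=True) after a plain explode should give the same result on older pandas versions.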
Regex:
(?<=.) # lookbehind: some character must precede the split point (so no split at the start of the string)
(?=SRS) # lookahead: "SRS" must follow the split point
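To see what the two lookarounds do in isolation, here is a small sketch using only the standard re module on single strings taken from the question. It assumes Python 3.7 or later, where re.split also splits on zero-width matches:

```python
import re

pattern = re.compile(r'(?<=.)(?=SRS)')

# Zero-width split: nothing is consumed, so every piece keeps its "SRS" prefix
parts = pattern.split('SRS_0015SRS_0015cSRS_0015d')
print(parts)  # ['SRS_0015', 'SRS_0015c', 'SRS_0015d']

# Without the lookbehind, the very start of the string also matches,
# which produces a leading empty string
naive = re.split(r'(?=SRS)', 'SRS_0082SRS_0082a')
print(naive)  # ['', 'SRS_0082', 'SRS_0082a']
```

The lookbehind is what saves filtering out empty strings after the split.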
I have modified your code; you can try this:
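# Split on the literal 'SRS_' prefix; this consumes the prefix and leaves an empty first piece
df['Req_ID'] = df['Req_ID'].str.split('SRS_')
df = df.explode('Req_ID')
df['Req_ID'] = df['Req_ID'].str.strip()
# Drop the empty pieces; .copy() avoids a SettingWithCopyWarning on the next assignment
df = df[df['Req_ID'].ne('')].copy()
# Restore the prefix removed by the split
df['Req_ID'] = 'SRS_' + df['Req_ID']
print(df)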