create filtered value using pandas

Question

I have a csv file in which the first line reads something like the following:

Pyscip_V1.11 Ref: #001=XYZ_0[1234] #50=M3_0[112] #51=M3_1[154] #52=M3_2[254]...

and so on. What I'd like to do is create filtered value such that the first column is Ref and it takes all the values after the # sign like 001,50,51,52... The second column name is ID and it takes the value after = like XYZ_0,M3_0,M3_1,M3_2,M3_3... And finally make a third column which takes all the values present in the square brackets like 1234,112,154,254,...

header_pattern = r'Pyscip_V(\d+\.\d+) Ref:'

version_match = re.search(header_pattern, first_line.iloc[0, 0])
version_number = version_match.group(1) if version_match else ''

matches = re.findall(r'#(\d+)=(\w+_\d)$$([\d]+)$$', first_line.iloc[0, 0])

parsed_df = []
for match in matches:
    row_dict = {
        'Ref': match[0] if match[0] else '',
        'ID': match[1] if match[1] else '',
        'Ser_No': match[2] if match[2] else ''
    }
    parsed_df.append(row_dict)

new_df = pd.DataFrame(parsed_df)

However, I only get enpty dataframe. What seems to be the problem here?

Edit: the data from 3rd row looks like the following:

ID  Date    XYZ_0  M3_0   M3_1  M3_2    
1   22.12.2023  12.6  0.5 1.2   2.3

The expected outcome is

Ref ID Num
001 XYZ_0 1234
50 M3_0 112
51 M3_1 154

mozway · Accepted Answer

I would open the csv file, extract the first line and process it, only after read the rest of the CSV with pandas.

For that, your original approach and regex are fine.

import re
import pandas as pd

with open('my_csv.csv') as f:
    first_line = next(f)
    header_df = pd.DataFrame(re.findall(r'#(\d+)=(\w+_\d)$$([\d]+)$$',
                                        first_line),
                             columns=['Ref', 'ID', 'Num'])
    
    data_df = pd.read_csv(f, sep=r'\s+')

print(header_df)
print(data_df)

Output:

# header_df
   Ref     ID   Num
0  001  XYZ_0  1234
1   50   M3_0   112
2   51   M3_1   154
3   52   M3_2   254

# data_df
   ID        Date  XYZ_0  M3_0  M3_1  M3_2
0   1  22.12.2023   12.6   0.5   1.2   2.3

Andrej Kesely · Answer

You can try (regex101 link):

import re
import pandas as pd

text = """\
Pyscip_V1.11 Ref: #001=XYZ_0[1234] #50=M3_0[112] #51=M3_1[154] #52=M3_2[254]
"""

pat = r"#(\d+)=([^[]+)$$([^$$]+)\]"

data = []
for line in text.splitlines():
    for a, b, c in re.findall(pat, line):
        data.append((a, b, c))

df = pd.DataFrame(data, columns=["Ref", "ID", "Num"])
print(df)

Prints:

   Ref     ID   Num
0  001  XYZ_0  1234
1   50   M3_0   112
2   51   M3_1   154
3   52   M3_2   254

create filtered value using pandas

Tags:

python

pandas

dnem

2 Answers

mozway

Andrej Kesely

Recent Activity

Donate For Us

create filtered value using pandas

Tags:

python

pandas

dnem

2 Answers

mozway

Andrej Kesely

Related questions

Recent Activity

Donate For Us