Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandaic approach to iterating over a dataframe

I'm making a test-team report from excel input; using pandas to gather, filter, process data.

I made below code to make product-testcase cover table for later use/easy search. With 3rd column being type of test case. I have multiple testcases inside one excel so I need to go through all of cells and split tests to make pairs product - test case.

Because I'm not much familiar with pandas and I haven't found better way elsewhere I would like to ask if there is more pythonic way or easier in pandas way to do the same and more efficient.

code with example data ( \n is newline inside excel cell):

df = pd.DataFrame({"prod":["TS001","TS002"], 
                   "activate":["001_002\n001_004", "003_008\n024_080"],
                   "deactivate":["004_005\n006_008", "001_008"]})
df = df.set_index("prod")

list_of_tuples = []

for i, row in df.iterrows():
    for cell in row.iteritems():
        for test in cell[-1].splitlines():
            list_of_tuples.append((i, test, cell[0]))  # [(product, test, category)..]

return_df = pd.DataFrame(list_of_tuples, columns=('prod', 'testcase', 'category'))

producing:

    prod testcase    category
0  TS001  001_002    activate
1  TS001  001_004    activate
2  TS001  004_005  deactivate
3  TS001  006_008  deactivate
4  TS002  003_008    activate
5  TS002  024_080    activate
6  TS002  001_008  deactivate

Thank you for any suggestions.

like image 865
Petr Szturc Avatar asked Dec 19 '25 20:12

Petr Szturc


1 Answers

With a comprehension

pd.DataFrame(
    [(p, t, c) for (p, c), r in df.stack().items() for t in r.split()],
    columns=['prod', 'testcase', 'category']
)

    prod testcase    category
0  TS001  001_002    activate
1  TS001  001_004    activate
2  TS001  004_005  deactivate
3  TS001  006_008  deactivate
4  TS002  003_008    activate
5  TS002  024_080    activate
6  TS002  001_008  deactivate

Explanation

df.stack()

prod             
TS001  activate      001_002\n001_004
       deactivate    004_005\n006_008
TS002  activate      003_008\n024_080
       deactivate             001_008
dtype: object

When iterating over df.stack().items(), we get tuples with index value as the first element and value as the second element. Because we stacked, the index value itself is a tuple. So the first pair looks like:

(('TS001', 'activate'), '001_002\n001_004')

Through subsequent iteration over '001_002\n001_004'.split() and rearranging the unpacked elements, we get

[(p, t, c) for (p, c), r in df.stack().items() for t in r.split()]

[('TS001', '001_002', 'activate'),
 ('TS001', '001_004', 'activate'),
 ('TS001', '004_005', 'deactivate'),
 ('TS001', '006_008', 'deactivate'),
 ('TS002', '003_008', 'activate'),
 ('TS002', '024_080', 'activate'),
 ('TS002', '001_008', 'deactivate')]

I then wrap this in a pd.DataFrame constructor where I name the columns.

like image 121
piRSquared Avatar answered Dec 21 '25 09:12

piRSquared