I'm making a test-team report from excel input; using pandas to gather, filter, process data.
I made below code to make product-testcase cover table for later use/easy search. With 3rd column being type of test case. I have multiple testcases inside one excel so I need to go through all of cells and split tests to make pairs product - test case.
Because I'm not much familiar with pandas and I haven't found better way elsewhere I would like to ask if there is more pythonic way or easier in pandas way to do the same and more efficient.
code with example data ( \n is newline inside excel cell):
df = pd.DataFrame({"prod":["TS001","TS002"],
"activate":["001_002\n001_004", "003_008\n024_080"],
"deactivate":["004_005\n006_008", "001_008"]})
df = df.set_index("prod")
list_of_tuples = []
for i, row in df.iterrows():
for cell in row.iteritems():
for test in cell[-1].splitlines():
list_of_tuples.append((i, test, cell[0])) # [(product, test, category)..]
return_df = pd.DataFrame(list_of_tuples, columns=('prod', 'testcase', 'category'))
producing:
prod testcase category
0 TS001 001_002 activate
1 TS001 001_004 activate
2 TS001 004_005 deactivate
3 TS001 006_008 deactivate
4 TS002 003_008 activate
5 TS002 024_080 activate
6 TS002 001_008 deactivate
Thank you for any suggestions.
With a comprehension
pd.DataFrame(
[(p, t, c) for (p, c), r in df.stack().items() for t in r.split()],
columns=['prod', 'testcase', 'category']
)
prod testcase category
0 TS001 001_002 activate
1 TS001 001_004 activate
2 TS001 004_005 deactivate
3 TS001 006_008 deactivate
4 TS002 003_008 activate
5 TS002 024_080 activate
6 TS002 001_008 deactivate
Explanation
df.stack()
prod
TS001 activate 001_002\n001_004
deactivate 004_005\n006_008
TS002 activate 003_008\n024_080
deactivate 001_008
dtype: object
When iterating over df.stack().items(), we get tuples with index value as the first element and value as the second element. Because we stacked, the index value itself is a tuple. So the first pair looks like:
(('TS001', 'activate'), '001_002\n001_004')
Through subsequent iteration over '001_002\n001_004'.split() and rearranging the unpacked elements, we get
[(p, t, c) for (p, c), r in df.stack().items() for t in r.split()]
[('TS001', '001_002', 'activate'),
('TS001', '001_004', 'activate'),
('TS001', '004_005', 'deactivate'),
('TS001', '006_008', 'deactivate'),
('TS002', '003_008', 'activate'),
('TS002', '024_080', 'activate'),
('TS002', '001_008', 'deactivate')]
I then wrap this in a pd.DataFrame constructor where I name the columns.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With