I have a large dataset that I had to clean. Now, simplifying, I have this:
   A  B  C  D
1  1  5  2  2
4  2  5  3  1
5  3  3  2  1
8  4  1  4  4
The values in each column go from 1 to 5. Now I want to transform these 4 columns into 5 dummy columns and, at the same time, count how many times each value appears in each row, so that I get this:
   S_1  S_2  S_3  S_4  S_5
1    1    2    0    0    1
4    1    1    1    0    1
5    1    1    2    0    0
8    1    0    0    3    0
So "S_1" represents the amount of "1" for each row, "S_2" the amount of "2" of each row, and so on.
I guess this is possible with a pivot table, but I can't get it to work. Can anybody help me, please?
One approach is to use collections.Counter:
import pandas as pd
from collections import Counter

data = [[1, 5, 2, 2],
        [2, 5, 3, 1],
        [3, 3, 2, 1],
        [4, 1, 4, 4]]
df = pd.DataFrame(data=data, columns=['A', 'B', 'C', 'D'], index=[1, 4, 5, 8])

# Default count of 0 for every possible value 1-5
total = {k: 0 for k in range(1, 6)}

# For each row, merge the defaults with the actual counts from Counter
result = pd.DataFrame([{**total, **Counter(row)} for row in df.values], index=df.index)

# Rename the integer columns 1..5 to S_1..S_5
result = result.rename(columns={k: f'S_{k}' for k in total}).fillna(0)
print(result)
Output
   S_1  S_2  S_3  S_4  S_5
1    1    2    0    0    1
4    1    1    1    0    1
5    1    1    2    0    0
8    1    0    0    3    0
Counter counts the occurrences of each value in a row, and the expression
{**total, **Counter(row)}
merges those counts into the defaults, so any value that does not appear in a row gets a count of 0.
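If you prefer to stay entirely in pandas, here is an alternative sketch (assuming the same df built above): apply value_counts to each row and reindex to the full 1-5 range.

# Alternative sketch: row-wise value counts with pandas only
result = (df.apply(pd.Series.value_counts, axis=1)  # counts per row; columns are the observed values
            .reindex(columns=range(1, 6))           # make sure all of 1-5 appear, in order
            .fillna(0)                              # values never seen in a row become 0
            .astype(int)
            .add_prefix('S_'))                      # rename 1..5 to S_1..S_5
print(result)

This produces the same S_1 ... S_5 table as the Counter version.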