I have a pandas DataFrame that contains everyone's score for every day. I want to add one extra column that counts how many times each person had the highest score of the day (more than one person can share the top score, and some values are NaN).
import pandas as pd
import numpy as np
data = np.array([['','day1','day2','day3','day4','day5'],
['larry',1,4,7,3,5],
['niko',2,-1,3,6,4],
['tin',np.nan,5,5, 6,7]])
df = pd.DataFrame(data=data[1:,1:],
index=data[1:,0],
columns=data[0,1:])
print(df)
Output:
      day1 day2 day3 day4 day5
larry    1    4    7    3    5
niko     2   -1    3    6    4
tin    nan    5    5    6    7
The expected result is (larry: 1 time, niko: 2 times, tin: 3 times):
       times_of_top  day1  day2  day3  day4  day5
larry             1     1     4     7     3     5
niko              2     2    -1     3     6     4
tin               3   nan     5     5     6     7
niko has the highest score on day1 and day4, so his times_of_top is 2. tin has the highest score on day2, day4 and day5, so his times_of_top is 3.
Using the size() or count() method with pandas.DataFrame.groupby() gives the number of occurrences of each value in a particular column of the DataFrame. size() counts all rows in each group, while count() counts only non-NaN values.
To count how many times each value appears, use the pandas Series.value_counts() method. The Series.mode() method can then be used to get the most frequently occurring value.
In pandas you can get the frequency of each value in a DataFrame column with the Series.value_counts() method. Alternatively, if you have a SQL background, you can get the same result with groupby() and count().
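As a rough sketch of the counting methods mentioned above, here they are applied to a small made-up Series. The data and the names `names`, `frame`, `counts`, `most_common` and `by_group` are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical data, not taken from the question
names = pd.Series(["tin", "niko", "tin", "larry", "tin", "niko"])

# Occurrences of each value, most frequent first
counts = names.value_counts()

# Most frequently occurring value(s); returns a Series because ties are possible
most_common = names.mode()

# The same counts via groupby, for readers used to SQL's GROUP BY
frame = pd.DataFrame({"name": names})
by_group = frame.groupby("name").size()

print(counts)
print(most_common)
print(by_group)
```

Note that these methods count how often a value occurs in a column, which is a different task from counting how often a row holds the column maximum.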
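One way, using pandas.DataFrame.stack and count:
# The sample data are of object (string) dtype, so convert them to float first
df = df.astype(float)
# Keep only the daily maxima (all other cells become NaN),
# then count the non-NaN values for each person.
# Series.count(level=0) was removed in pandas 2.0, so group by the index level instead.
df["times_of_top"] = df[df == df.max()].stack().groupby(level=0).count()
print(df)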
Output:
       day1  day2  day3  day4  day5  times_of_top
larry   1.0   4.0   7.0   3.0   5.0             1
niko    2.0  -1.0   3.0   6.0   4.0             2
tin     NaN   5.0   5.0   6.0   7.0             3
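An alternative that avoids stack is to compare every cell with its column maximum and sum the resulting booleans per row. This is a minimal sketch, not the answer's original method. It assumes the scores are already numeric, so the frame is rebuilt here from a dict rather than from the string array in the question, and the names `is_top` and `times_of_top` are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Same scores as in the question, built as numeric columns (assumption:
# the data are floats, not the strings produced by the mixed np.array)
df = pd.DataFrame(
    {
        "day1": [1, 2, np.nan],
        "day2": [4, -1, 5],
        "day3": [7, 3, 5],
        "day4": [3, 6, 6],
        "day5": [5, 4, 7],
    },
    index=["larry", "niko", "tin"],
)

# df.max() skips NaN by default; a NaN cell never equals the maximum,
# so it is simply not counted
is_top = df.eq(df.max())

# True counts as 1, so the row sum is the number of days each person was top
times_of_top = is_top.sum(axis=1)

# Put the new column first, as in the expected result
df.insert(0, "times_of_top", times_of_top)
print(df)
```

Because ties all compare equal to the maximum, both niko and tin are counted for day4.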