I have this data frame:

import pandas as pd
df = pd.DataFrame({'salary_1':[230,345,222],'salary_2':[235,375,292],'salary_3':[210,385,260]})

       salary_1     salary_2    salary_3
0        230           235        210
1        345           375        385
2        222           292        260

and I would like to calculate a new column as the mean of salary_1, salary_2 and salary_3.

How can I do it in pandas in the most efficient way? Actually I have many more columns and I don't want to write them out one by one.

Something like this:

      salary_1     salary_2    salary_3     salary_mean
0        230           235        210     (230+235+210)/3
1        345           375        385       ...
2        222           292        260       ...

Thank you!
To calculate the mean of a whole column, call mean() on that column (pandas.Series.mean()); to get the means of several columns at once, select them with a list of column names and call mean() on the result. You can also get the mean for all numeric columns using DataFrame.mean().
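As a rough sketch of the per-column means described above: the 'name' column here is a made-up non-numeric column, added only to show why numeric_only=True can matter.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['a', 'b', 'c'],          # hypothetical non-numeric column
    'salary_1': [230, 345, 222],
    'salary_2': [235, 375, 292],
})

# Mean of a single column (a Series) -> one scalar
single = df['salary_1'].mean()

# Mean of each column in a selected list -> one value per column
subset = df[['salary_1', 'salary_2']].mean()

# Mean of every numeric column, skipping the string column
all_numeric = df.mean(numeric_only=True)

print(single)
print(subset)
print(all_numeric)
```

Note that these are means down each column (the default axis=0), which is not the same thing as the per-row mean asked for in the question.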
You can extract a column of a pandas DataFrame based on the values in another column by using the DataFrame.query() method. query() filters the rows of a DataFrame with a boolean expression written as a string.
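A small sketch of query() on the salary data from the question; the threshold 225 is an arbitrary value picked for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'salary_1': [230, 345, 222],
    'salary_2': [235, 375, 292],
    'salary_3': [210, 385, 260],
})

# Keep only the rows where salary_1 exceeds the (arbitrary) threshold
high = df.query('salary_1 > 225')

# Then pull out another column for just those rows
salary_2_of_high = high['salary_2']

print(salary_2_of_high)
```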
Combine two columns using the + operator: with the + operator you can combine/merge two or more text/string columns in a pandas DataFrame. Note that when you apply the + operator to numeric columns, it performs element-wise addition instead of concatenation.
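To illustrate the difference, here is a minimal sketch; the 'first' and 'last' columns and their values are invented for the example.

```python
import pandas as pd

people = pd.DataFrame({
    'first': ['Ann', 'Bob'],      # hypothetical text columns
    'last': ['Lee', 'Ray'],
    'salary_1': [230, 345],
    'salary_2': [235, 375],
})

# On string columns, + concatenates element by element
people['full_name'] = people['first'] + ' ' + people['last']

# On numeric columns, + adds element by element
people['salary_sum'] = people['salary_1'] + people['salary_2']

print(people)
```

Adding numeric columns and dividing by their count would also give a row mean, but that means writing every column out by hand, which is what the question wants to avoid.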
Use .mean. By specifying the axis you can take the average across the rows or the columns; axis=1 gives one mean per row.

df['average'] = df.mean(axis=1)
df

returns

       salary_1  salary_2  salary_3     average
0       230       235       210  225.000000
1       345       375       385  368.333333
2       222       292       260  258.000000

If you only want the mean of a few columns, you can select only those columns. E.g.

df['average_1_3'] = df[['salary_1', 'salary_3']].mean(axis=1)
df

returns

   salary_1  salary_2  salary_3  average_1_3
0       230       235       210        220.0
1       345       375       385        365.0
2       222       292       260        241.0

An easy way to solve this problem is shown below:
col = df.loc[:, "salary_1":"salary_3"]

where "salary_1" is the start column name and "salary_3" is the end column name (label slices with .loc include both endpoints).

df['salary_mean'] = col.mean(axis=1)
df

This adds a new column to the dataframe that shows the mean of the selected columns. This approach is really helpful when you have a large set of columns, and also when you need to operate on only some selected columns rather than all of them.
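If the salary columns are not next to each other, a label slice won't pick them up cleanly. One possible alternative, sketched here on the assumption that all the columns of interest are named salary_<number>, is to select them by name pattern with DataFrame.filter. The 'employee_id' column is hypothetical and only there to show that other columns are left out.

```python
import pandas as pd

df = pd.DataFrame({
    'employee_id': [1, 2, 3],        # hypothetical extra column, not averaged
    'salary_1': [230, 345, 222],
    'salary_2': [235, 375, 292],
    'salary_3': [210, 385, 260],
})

# Select every column whose name is exactly 'salary_' followed by digits.
# Anchoring the pattern keeps 'salary_mean' itself out if this is re-run.
salary_cols = df.filter(regex=r'^salary_\d+$')

# Row-wise mean over the selected columns only
df['salary_mean'] = salary_cols.mean(axis=1)

print(df)
```

The same idea works with a list comprehension over df.columns if you prefer not to use a regular expression.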