
How to calculate multiple columns from multiple columns in pandas

I am trying to calculate multiple columns from multiple columns in a pandas dataframe using a function. The function takes three arguments -a-, -b-, and -c- and returns three calculated values -sum-, -prod- and -quot-. In my pandas data frame I have three columns -a-, -b- and -c- from which I want to calculate the columns -sum-, -prod- and -quot-.

The mapping that I do works only when I have exactly three rows. I do not know what is going wrong, although I expect that it has something to do with selecting the correct axis. Could someone explain what is happening and how I can calculate the values that I would like to have? Below are the situations that I have tested.

INITIAL VALUES

import pandas as pd

def sum_prod_quot(a,b,c):
    sum  = a + b + c
    prod = a * b * c
    quot = a / b / c
    return (sum, prod, quot)

df = pd.DataFrame({ 'a': [20, 100, 18],
                    'b': [ 5,  10,  3],
                    'c': [ 2,  10,  6],
                    'd': [ 1,   2,  3]
                 })

df
    a   b   c  d
0   20   5   2  1
1  100  10  10  2
2   18   3   6  3

CALCULATION STEPS

Using exactly three rows

When I calculate three columns from this dataframe using the function, I get:

df['sum'], df['prod'], df['quot'] = \
        list( map(sum_prod_quot, df['a'], df['b'], df['c']))

df
     a   b   c  d    sum     prod   quot
0   20   5   2  1   27.0    120.0   27.0
1  100  10  10  2  200.0  10000.0  324.0
2   18   3   6  3    2.0      1.0    1.0

This is exactly the result that I want to have: The sum-column has the sum of the elements in the columns a,b,c; the prod-column has the product of the elements in the columns a,b,c and the quot-column has the quotients of the elements in the columns a,b,c.

Using more than three rows

When I expand the dataframe with one row, I get an error!

The data frame is defined as:

df = pd.DataFrame({ 'a': [20, 100, 18, 40],
                    'b': [ 5,  10,  3, 10],
                    'c': [ 2,  10,  6,  4],
                    'd': [ 1,   2,  3,  4]
                 })
df
     a   b   c  d
0   20   5   2  1
1  100  10  10  2
2   18   3   6  3
3   40  10   4  4

The call is

df['sum'], df['prod'], df['quot'] = \
        list( map(sum_prod_quot, df['a'], df['b'], df['c']))

The result is

...
    list( map(sum_prod_quot, df['a'], df['b'], df['c']))
ValueError: too many values to unpack (expected 3) 

while I would expect an extra row:

df
     a   b   c  d    sum     prod   quot
0   20   5   2  1   27.0    120.0   27.0
1  100  10  10  2  200.0  10000.0  324.0
2   18   3   6  3    2.0      1.0    1.0
3   40  10   4  4   54.0   1600.0    1.0

Using fewer than three rows

When I reduce the dataframe by one row, I also get an error. The dataframe is defined as:

df = pd.DataFrame({ 'a': [20, 100],
                    'b': [ 5,  10],
                    'c': [ 2,  10],
                    'd': [ 1,   2]
                 })
df
     a   b   c  d
0   20   5   2  1
1  100  10  10  2

The call is

df['sum'], df['prod'], df['quot'] = \
        list( map(sum_prod_quot, df['a'], df['b'], df['c']))

The result is

...
    list( map(sum_prod_quot, df['a'], df['b'], df['c']))
ValueError: need more than 2 values to unpack

while I would expect a row less:

df
     a   b   c  d    sum     prod   quot
0   20   5   2  1   27.0    120.0   27.0
1  100  10  10  2  200.0  10000.0  324.0

QUESTIONS

The questions I have:

1) Why do I get these errors?

2) How do I have to modify the call such that I get the desired data frame?

NOTE

In this link a similar question is asked, but the given answer did not work for me.

PeterDev asked Nov 08 '25 21:11



2 Answers

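The result isn't correct for three rows either. Check the values beyond the first row and first column: the product 20*5*2 is not 120 but 200, and that 200 ends up one row down in the sum column. list(map(...)) produces one (sum, prod, quot) tuple per row, and the unpacking assigns the first row's tuple to df['sum'], the second row's to df['prod'] and the third row's to df['quot']. That is why the call only runs with exactly three rows, and why the values are transposed even then. You need to transpose the list before assigning it to the new columns, for example with zip: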

df['sum'], df['prod'], df['quot'] = zip(*map(sum_prod_quot, df['a'], df['b'], df['c']))

For details follow the link
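To make the transposition visible, here is a minimal sketch using the four-row frame from the question. The intermediate name per_row is only there for illustration, and the function body is a condensed version of the one in the question.

```python
import pandas as pd

def sum_prod_quot(a, b, c):
    # Same calculation as in the question, condensed
    return (a + b + c, a * b * c, a / b / c)

df = pd.DataFrame({"a": [20, 100, 18, 40],
                   "b": [5, 10, 3, 10],
                   "c": [2, 10, 6, 4]})

# map() yields one (sum, prod, quot) tuple per ROW: four tuples of three values
per_row = list(map(sum_prod_quot, df["a"], df["b"], df["c"]))

# zip(*per_row) transposes this into three tuples of four values,
# i.e. one tuple per output COLUMN, which is what the unpacking needs
df["sum"], df["prod"], df["quot"] = zip(*per_row)

print(df)
```

Because zip(*...) always returns three sequences here (one per return value of the function), the unpacking no longer depends on the number of rows.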

student answered Nov 10 '25 10:11



The apply() method can be used for this problem when the result_type argument is provided:

df[["sum", "prod", "quot"]] = df.apply(
    lambda row: sum_prod_quot(row["a"], row["b"], row["c"]),
    axis=1,
    result_type="expand",
)

Alternatively (faster for larger dataframes), use apply() only on a selection of the dataframe and unpack the values of row:

df[["sum", "prod", "quot"]] = df[["a", "b", "c"]].apply(
    lambda row: sum_prod_quot(*row),
    axis=1,
    result_type="expand",
)

In both cases, the return value of sum_prod_quot() is added as new columns to df.

Explanation:

  • apply() with axis=1 applies the function (first argument) to each row separately.
  • The lambda unpacks row so that the call matches the function's signature (sum_prod_quot() requires 3 arguments).
  • result_type="expand" turns the list-like return value of sum_prod_quot() into columns.
  • We can then directly assign the data of these new columns to the (new) "sum", "prod", and "quot" columns of df (df[["sum", "prod", "quot"]]).
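
The steps above can be tried end to end with the four-row frame from the question. This is only a sketch: the names expanded and result are made up for illustration, and the intermediate frame is kept separate so you can see what result_type="expand" returns before it is attached to df.

```python
import pandas as pd

def sum_prod_quot(a, b, c):
    # Same calculation as in the question, condensed
    return (a + b + c, a * b * c, a / b / c)

df = pd.DataFrame({"a": [20, 100, 18, 40],
                   "b": [5, 10, 3, 10],
                   "c": [2, 10, 6, 4],
                   "d": [1, 2, 3, 4]})

# axis=1 calls the lambda once per row; "expand" spreads each returned
# tuple over columns, giving a frame with one row per input row
expanded = df[["a", "b", "c"]].apply(
    lambda row: sum_prod_quot(*row),
    axis=1,
    result_type="expand",
)

# The expanded columns are labelled 0, 1, 2, so name them before joining
expanded.columns = ["sum", "prod", "quot"]
result = df.join(expanded)

print(result)
```

The number of rows no longer matters here, since each row's tuple becomes one row of expanded.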

For further info see pandas docs.

lcnittl answered Nov 10 '25 10:11



