
pandas dataframes multiplication with or without broadcasting

I have 2 dataframes:

>>> type(c)
Out[118]: pandas.core.frame.DataFrame
>>> type(N)
Out[119]: pandas.core.frame.DataFrame

>>> c
Out[114]: 
                       t
2017-06-01 01:06:00 1.00
2017-06-01 01:13:00 1.00
2017-06-01 02:09:00 1.00
2017-06-26 22:47:00 1.00

>>> N
Out[115]: 
                       0    1
2017-06-01 01:06:00 1.00 1.00
2017-06-01 01:13:00 1.00 1.00
2017-06-01 02:09:00 1.00 1.00
2017-06-26 22:47:00 1.00 1.00

I need to multiply these together to get a (4, 2) DataFrame in which each column of N is multiplied elementwise by c. I tried the following four approaches with no luck:

>>> N.multiply(c, axis='index')
Out[116]: 
                      0   1   t
2017-06-01 01:06:00 nan nan nan
2017-06-01 01:13:00 nan nan nan
2017-06-01 02:09:00 nan nan nan
2017-06-26 22:47:00 nan nan nan

>>> c[:]*N
Out[98]: 
                      0   1   t
2017-06-01 01:06:00 nan nan nan
2017-06-01 01:13:00 nan nan nan
2017-06-01 02:09:00 nan nan nan
2017-06-26 22:47:00 nan nan nan

>>> c*N
Out[99]: 
                      0   1   t
2017-06-01 01:06:00 nan nan nan
2017-06-01 01:13:00 nan nan nan
2017-06-01 02:09:00 nan nan nan
2017-06-26 22:47:00 nan nan nan

>>> c[:, None]*N
Traceback (most recent call last):

  File "C:\...pandas\core\frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "C:\...core\frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "C:\...core\generic.py", line 1082, in _get_item_cache
    res = cache.get(item)
TypeError: unhashable type

Is there an easy way to do this, with or without broadcasting?

asked Oct 20 '25 14:10 by dayum


1 Answer

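The problem is that you are passing a DataFrame, so pandas aligns on the column labels as well as the index. The column labels of N (0 and 1) and of c (t) have nothing in common, so every cell of the result is NaN. If you select the column t, you get a Series, which is aligned on the index with axis=0 and broadcast across the columns of N: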

N.mul(c['t'], axis=0)
Out: 
                       0    1
2017-06-01 01:06:00  1.0  1.0
2017-06-01 01:13:00  1.0  1.0
2017-06-01 02:09:00  1.0  1.0
2017-06-26 22:47:00  1.0  1.0
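
To make the difference reproducible, here is a minimal sketch that builds two frames with the same index as in the question and compares the two calls. The numeric values are sample data chosen so the products are easy to check; the variable names aligned and result are only for illustration.

```python
import pandas as pd

# Same timestamps as in the question; the numbers are sample values.
idx = pd.to_datetime([
    "2017-06-01 01:06:00",
    "2017-06-01 01:13:00",
    "2017-06-01 02:09:00",
    "2017-06-26 22:47:00",
])
N = pd.DataFrame([[1.0, 2.0], [6.0, 5.0], [4.0, 3.0], [4.0, 7.0]], index=idx)
c = pd.DataFrame({"t": [6.0, 2.0, 8.0, 2.0]}, index=idx)

# DataFrame * DataFrame aligns on the index AND the columns. The column
# labels 0, 1 and "t" never match, so the result has three all-NaN columns.
aligned = N * c
print(aligned.isna().all().all())

# Selecting the column yields a Series; axis=0 matches it against the
# row index and repeats it across every column of N.
result = N.mul(c["t"], axis=0)
print(result)
```

The first print reports that every cell of the DataFrame-by-DataFrame product is missing, while the second keeps the original index and the columns 0 and 1.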

With NumPy arrays you don't need to specify anything. With shapes (4, 2) and (4, 1), NumPy matches the axes of equal length and stretches the length-1 axis, so the single column is multiplied into each column of the other array.
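
As a rough illustration of that rule, the sketch below uses plain arrays with sample values (the names a, col and flat are made up for this example). It also shows why the c[:, None] idiom from the question belongs to NumPy rather than pandas: on a one-dimensional array it adds the length-1 axis that broadcasting needs.

```python
import numpy as np

a = np.array([[1.0, 2.0], [6.0, 5.0], [4.0, 3.0], [4.0, 7.0]])  # shape (4, 2)
col = np.array([[6.0], [2.0], [8.0], [2.0]])                    # shape (4, 1)

# The length-1 axis of col is stretched to length 2.
product = a * col

# A flat array of shape (4,) does not broadcast against (4, 2): shapes are
# compared from the trailing axis, and 2 does not match 4.
flat = col.ravel()
try:
    a * flat
    flat_broadcasts = True
except ValueError:
    flat_broadcasts = False

# Adding a trailing axis turns (4,) into (4, 1), which works again.
product_from_flat = a * flat[:, None]

print(product.shape, flat_broadcasts)
```

Indexing with [:, None] is valid on an ndarray, which is why the same expression applied to a DataFrame in the question ended in a TypeError instead.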

Consider the following DataFrames:

N
Out: 
                       0    1
2017-06-01 01:06:00  1.0  2.0
2017-06-01 01:13:00  6.0  5.0
2017-06-01 02:09:00  4.0  3.0
2017-06-26 22:47:00  4.0  7.0


c
Out: 
                       t
2017-06-01 01:06:00  6.0
2017-06-01 01:13:00  2.0
2017-06-01 02:09:00  8.0
2017-06-26 22:47:00  2.0

You can access the underlying arrays with the .values attribute, so

N.values * c.values
Out: 
array([[  6.,  12.],
       [ 12.,  10.],
       [ 32.,  24.],
       [  8.,  14.]])

will give you the same result as

N.mul(c['t'], axis=0)
Out: 
                        0     1
2017-06-01 01:06:00   6.0  12.0
2017-06-01 01:13:00  12.0  10.0
2017-06-01 02:09:00  32.0  24.0
2017-06-26 22:47:00   8.0  14.0

But since the whole operation is done in NumPy, you lose the index and column labels.
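
If you do go through NumPy and still want a labelled result, one possible approach (a sketch, not the only way) is to wrap the array in a new DataFrame and reuse the index and columns of N. The frames are rebuilt here with the sample values from above so the block runs on its own; to_numpy() is used as an equivalent of .values, and relabeled is just an illustrative name.

```python
import pandas as pd

idx = pd.to_datetime([
    "2017-06-01 01:06:00",
    "2017-06-01 01:13:00",
    "2017-06-01 02:09:00",
    "2017-06-26 22:47:00",
])
N = pd.DataFrame([[1.0, 2.0], [6.0, 5.0], [4.0, 3.0], [4.0, 7.0]], index=idx)
c = pd.DataFrame({"t": [6.0, 2.0, 8.0, 2.0]}, index=idx)

# Multiply the raw arrays: (4, 2) * (4, 1) broadcasts, but the labels are gone.
raw = N.to_numpy() * c.to_numpy()

# Reattach the labels by reusing N's index and columns.
relabeled = pd.DataFrame(raw, index=N.index, columns=N.columns)
print(relabeled)
```

Note that this relies on the rows of N and c already being in the same order, because the array multiplication is positional and skips the index alignment that N.mul(c['t'], axis=0) performs.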