
Using 'apply' with arrays inside Pandas dataframe elements

I am working with pandas dataframes which contain arrays inside the dataframe elements. I'm trying to "apply" a function to these elements, and then return an array. But I have some very inconsistent behavior. The function runs okay the first few times, but then it fails. Here is my code:

import pandas as pd
import numpy as np

def g(x):  # Function fails if I omit the .tolist()
    return (np.concatenate([x['B'][1:], x['C'][1:]])).tolist()

df = pd.DataFrame({'A' : (1,2,3), \
                   'B': (np.array([0,1,2,3]),np.array([3,4,5,6]),np.array([6,7,8,9])), \
                   'C': (np.array([0,1,2,3]),np.array([2,9,6,9]),np.array([2,4,6,7]))})
# Before we start
print(df)
print("B is type:  ", type(df.loc[0,'B']))
# First time 
df['G'] = df.apply(g, axis=1)
print("G is type:  ", type(df.loc[0,'G']))
# Second time
df['H'] = df.apply(g, axis=1)
print("H is type:  ", type(df.loc[0,'H']))
# Third time 
df['I'] = df.apply(g, axis=1)
print("I is type:  ", type(df.loc[0,'I']))
# Fourth time - this one fails for me
df['J'] = df.apply(g, axis=1)
print("J is type:  ", type(df.loc[0,'J']))
# Fifth time 
df['K'] = df.apply(g, axis=1)
print("K is type:  ", type(df.loc[0,'K']))

The code runs fine for me, up to the line df['J'], where it fails. The output is like this:

   A             B             C
0  1  [0, 1, 2, 3]  [0, 1, 2, 3]
1  2  [3, 4, 5, 6]  [2, 9, 6, 9]
2  3  [6, 7, 8, 9]  [2, 4, 6, 7]
B is type:   <class 'numpy.ndarray'>
G is type:   <class 'list'>
H is type:   <class 'list'>
I is type:   <class 'list'>

Then there is a big long error message which finishes with "ValueError: Wrong number of items passed 6, placement implies 1", and there is also a "KeyError: 'J'" in there too.

The crazy thing is that the function runs fine the first few times. My questions are:

  • Why does my code fail when it gets to df['J']?
  • How can I get g(x) to return a numpy array rather than a list? If I leave out the .tolist() it gives me an error.
  • Is there an easier way to work with arrays inside dataframe elements?

Any help would be hugely appreciated! I've spent 2 days trying to understand what is going on here.

P.S. I haven't explained why I am using arrays inside dataframe elements, but I can explain if you think it would help.

Michael asked Oct 21 '25 03:10


1 Answer

Your dataframe changes between successive calls to g, because each call adds a new column, so it is not really a surprise that pandas does not react the same way every time. If you only need to apply g to columns B and C, I suggest you type:

df['J'] = df[['B','C']].apply(g, axis=1)
print("J is type:  ", type(df.loc[0,'J']))

This way it works fine (but, once again, it only takes columns B and C into account).
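If you want the cells to hold numpy arrays rather than lists, one option is to skip apply and build the column yourself. This is only a minimal sketch: the helper name concat_tails is made up for illustration, and it assumes B and C always hold 1-D arrays.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [np.array([0, 1, 2, 3]), np.array([3, 4, 5, 6]), np.array([6, 7, 8, 9])],
    "C": [np.array([0, 1, 2, 3]), np.array([2, 9, 6, 9]), np.array([2, 4, 6, 7])],
})

def concat_tails(b, c):
    """Hypothetical helper: same logic as g, but the result stays an ndarray."""
    return np.concatenate([b[1:], c[1:]])

for name in ["G", "H", "I", "J", "K"]:
    # Build the column outside apply, so the result never depends on
    # how many columns the frame has at that moment.
    df[name] = pd.Series(
        [concat_tails(b, c) for b, c in zip(df["B"], df["C"])],
        index=df.index,
    )

print("J is type:  ", type(df.loc[0, "J"]))
```

Because only B and C are read, adding more columns in between does not change what each iteration sees.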

As for the error, according to Ians it's because as soon as the list returned by g has as many items as the dataframe has columns (6 here, once G, H and I have been added), the output of apply turns into a DataFrame instead of a Series. Then it can't be assigned to df['J'].
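If you would rather keep apply on the whole frame, pandas 0.23 and later accept a result_type argument. The sketch below assumes such a version and passes result_type="reduce", which asks apply not to expand list-like results into columns. I have not checked it against the pandas version in the question, so treat it as something to try rather than a guaranteed fix.

```python
import numpy as np
import pandas as pd

def g(x):
    return np.concatenate([x["B"][1:], x["C"][1:]]).tolist()

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [np.array([0, 1, 2, 3]), np.array([3, 4, 5, 6]), np.array([6, 7, 8, 9])],
    "C": [np.array([0, 1, 2, 3]), np.array([2, 9, 6, 9]), np.array([2, 4, 6, 7])],
})

for name in ["G", "H", "I", "J", "K"]:
    # result_type="reduce" requests a Series of lists even when the list
    # length happens to match the number of columns in the frame.
    df[name] = df.apply(g, axis=1, result_type="reduce")

print("J is type:  ", type(df.loc[0, "J"]))
```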

ysearka answered Oct 23 '25 15:10


