 

How to delete nan/null values in lists in a list in Python?

Tags: python, pandas

So I have a DataFrame with NaN values, and I transform each row of that DataFrame into a list, which is then appended to another list.

Index   1   2   3   4   5   6   7   8   9   10  ... 71  72  73  74  75  76  77  78  79  80
orderid                                                                                 
20000765    624380  nan nan nan nan nan nan nan nan nan ... nan nan nan nan nan nan nan nan nan nan
20000766    624380  nan nan nan nan nan nan nan nan nan ... nan nan nan nan nan nan nan nan nan nan
20000768    1305984 1305985 1305983 1306021 nan nan nan nan nan nan ... nan nan nan nan nan nan nan nan nan nan

records = []
for i in range(0, 60550):
    records.append([str(dfpivot.values[i,j]) for j in range(0, 10)])

However, a lot of rows contain NaN values, which I want to remove from each row's list before I add it to the list of lists. Where do I need to insert that code, and how do I do this?

I thought this code would do the trick, but I guess it only looks at the direct elements of the list of lists:

records = [x for x in records if str(x) != 'nan']

I'm new to Python, so I'm still figuring out the basics.

Tim Hellegers asked Oct 16 '25 20:10



2 Answers

One way is to take advantage of the fact that stack removes NaNs to generate the nested list:

df.stack().groupby(level=0).apply(list).values.tolist()
# [[624380.0], [624380.0], [1305984.0, 1305985.0, 1305983.0, 1306021.0]]
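For anyone who wants to try this, here is a minimal, self-contained sketch. The small `dfpivot` frame is made up to resemble the data in the question. The explicit `.dropna()` is an addition that is not in the one-liner above: it is there on the assumption that newer pandas versions may not drop NaNs in `stack` by default, and it is harmless where they already do.

```python
import numpy as np
import pandas as pd

# Made-up sample resembling the question's pivoted frame (orderid is the index).
dfpivot = pd.DataFrame(
    {
        1: [624380, 624380, 1305984],
        2: [np.nan, np.nan, 1305985],
        3: [np.nan, np.nan, 1305983],
        4: [np.nan, np.nan, 1306021],
        5: [np.nan, np.nan, np.nan],
    },
    index=pd.Index([20000765, 20000766, 20000768], name="orderid"),
)

# stack() turns the frame into one long Series indexed by (orderid, column).
# The explicit dropna() removes NaNs regardless of stack()'s own default.
stacked = dfpivot.stack().dropna()

# Group by orderid (index level 0) and collect the remaining values per row.
records = stacked.groupby(level=0).apply(list).tolist()

print(records)
# [[624380.0], [624380.0], [1305984.0, 1305985.0, 1305983.0, 1306021.0]]
```

Note that a row consisting only of NaNs has nothing left after stacking, so it would not appear in `records` at all.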
yatu answered Oct 18 '25 10:10



If you want to keep every row and drop only the columns that are entirely NaN, you can do it like this:

In [5457]: df.T.dropna(how='all').T
Out[5457]: 
         Index           1           2           3           4
0 20000765.000  624380.000         nan         nan         nan
1 20000766.000  624380.000         nan         nan         nan
2 20000768.000 1305984.000 1305985.000 1305983.000 1306021.000

If you don't want any columns that contain NaNs, you can drop them like this:

In [5458]: df.T.dropna().T
Out[5458]: 
         Index           1
0 20000765.000  624380.000
1 20000766.000  624380.000
2 20000768.000 1305984.000
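The transpose round trip can probably be avoided, since `dropna` accepts an `axis` argument and can drop columns directly. A minimal sketch, assuming a small made-up `df` shaped like the frame above:

```python
import numpy as np
import pandas as pd

# Made-up sample shaped like the data above (column 5 is entirely NaN).
df = pd.DataFrame(
    {
        "Index": [20000765, 20000766, 20000768],
        1: [624380, 624380, 1305984],
        2: [np.nan, np.nan, 1305985],
        5: [np.nan, np.nan, np.nan],
    }
)

# Drop only the columns in which every value is NaN.
no_empty_cols = df.dropna(axis=1, how="all")

# Drop every column that contains at least one NaN.
complete_cols = df.dropna(axis=1, how="any")

print(list(no_empty_cols.columns))  # ['Index', 1, 2]
print(list(complete_cols.columns))  # ['Index', 1]
```

Working on the columns directly also avoids transposing the frame twice.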

To create the nested list:

In [5464]: df.T.apply(lambda x: x.dropna().tolist()).tolist()
Out[5464]: 
[[20000765.0, 624380.0],
 [20000766.0, 624380.0],
 [20000768.0, 1305984.0, 1305985.0, 1305983.0, 1306021.0]]

or

In [5471]: df.T[1:].apply(lambda x: x.dropna().tolist()).tolist()
Out[5471]: [[624380.0], [624380.0], [1305984.0, 1305985.0, 1305983.0, 1306021.0]]

Which one to use depends on whether you want the Index value included in each inner list.
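As for why the filter in the question removes nothing: each `x` there is an inner list, and `str(x)` of a list is never equal to `'nan'`. The check has to run on the values inside each row instead. A minimal sketch of the question's loop with that change, assuming a made-up `dfpivot`; the `int()` cast is also an assumption (that the IDs are whole numbers) and can be dropped to keep strings like `'624380.0'`:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the question's dfpivot (orderid is the index).
dfpivot = pd.DataFrame(
    {
        1: [624380, 624380, 1305984],
        2: [np.nan, np.nan, 1305985],
        3: [np.nan, np.nan, 1305983],
    },
    index=pd.Index([20000765, 20000766, 20000768], name="orderid"),
)

records = []
for row in dfpivot.to_numpy():
    # Filter the cells of this row before appending: keep non-NaN values only.
    # int() assumes whole-number IDs that became floats because of the NaNs.
    records.append([str(int(value)) for value in row if pd.notna(value)])

print(records)
# [['624380'], ['624380'], ['1305984', '1305985', '1305983']]
```

To limit this to the first ten columns as in the question, iterate over `dfpivot.iloc[:, :10].to_numpy()` instead.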

oppressionslayer answered Oct 18 '25 10:10



