
How to remove zeros when converting a pandas DataFrame to records

I am looking for an efficient way to remove zeros from a list of dictionaries created from a pd.DataFrame. Take the following example:

import pandas as pd

df = pd.DataFrame([[1, 2], [0, 4]], columns=['a', 'b'], index=['x', 'y'])
df.to_dict('records')

[{'a': 1, 'b': 2}, {'a': 0, 'b': 4}]

What I would like is:

[{'a': 1, 'b': 2}, {'b': 4}]

I have a very large, sparse DataFrame, and storing all of the zeros is inefficient. Because the DataFrame is large, I am looking for something faster than looping over the list of dictionaries and removing the zeros. For instance, the following works, but it is very slow and uses a large amount of memory:

new_records = []
for record in df.to_dict('records'):
    # Keep only the entries whose value is truthy (drops 0 and 0.0).
    new_records.append({k: v for k, v in record.items() if v})

Is there a more efficient method or approach to this?

johnchase asked Dec 15 '16 00:12


1 Answer

Use a list comprehension that filters each row before converting it to a dictionary:

[r[r != 0].to_dict() for _, r in df.iterrows()]

[{'a': 1, 'b': 2}, {'b': 4}]
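
If building a Series for every row with iterrows turns out to be the bottleneck, one variation that may help is to iterate over plain tuples and filter while building each dictionary. This is only a sketch: the helper name nonzero_records is hypothetical, it has not been benchmarked against the question's data, and it assumes that "zero" means a value comparing equal to 0, as in the comprehension above.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [0, 4]], columns=['a', 'b'], index=['x', 'y'])


def nonzero_records(frame):
    """Return one dict per row, leaving out entries whose value equals 0.

    Hypothetical helper for illustration; not part of pandas.
    """
    columns = list(frame.columns)
    return [
        # Pair each column label with the row's value and skip zeros.
        {key: value for key, value in zip(columns, row) if value != 0}
        # name=None makes itertuples yield plain tuples instead of namedtuples.
        for row in frame.itertuples(index=False, name=None)
    ]


records = nonzero_records(df)
print(records)
```

This avoids materializing the full list of dictionaries with the zeros still in it, so only the filtered records are held in memory.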
piRSquared answered Sep 26 '22 16:09