Don't drop unique value with dropna() pandas

Question

what's up?

I have a small problem: I need to use the pandas dropna function to remove rows from my dataframe, but I don't want it to delete rows whose id is unique.

Let me explain better. I have the following dataframe:

id	birthday
0102-2	09/03/2020
0103-2	14/03/2020
0104-2	NaN
0105-2	NaN
0105-2	25/03/2020
0108-2	07/04/2020

In the case above, I need to delete rows from my dataframe based on the NaN values in the birthday column. However, as you can see, the id "0104-2" is unique, unlike the id "0105-2", which has one row with NaN and another with a date. So I would like to keep every NaN row whose id is unique and drop a NaN row only when the same id also appears with a date.

Is it feasible to do this with dropna, or would I have to pre-process the information beforehand?

ScottC · Accepted Answer

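You could sort by the birthday column and then drop duplicates on id, keeping the first row for each id. Sorting puts NaN values last, so for a repeated id the row with a date comes first and is the one that is kept, while an id that appears only once keeps its NaN row.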

The complete code would look like this:

import pandas as pd
import numpy as np

data = {
    "id": ['102-2','103-2','104-2', '105-2', '105-2', '108-2'],
    "birthday":['09/03/2020', '14/03/2020', np.nan, np.nan, '25/03/2020', '07/04/2020']
}

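df = pd.DataFrame(data)
# Sort by birthday; NaN values are placed last by default
df.sort_values(['birthday'], inplace=True)
# For each id keep only the first row, i.e. the dated one when an id repeats
df.drop_duplicates(subset="id", keep='first', inplace=True)
# Restore the original ordering by id
df.sort_values(['id'], inplace=True)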
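
As a quick check of the complete code above, here is a minimal sketch that runs the same three steps as a method chain instead of with inplace=True. The names result, remaining_ids and missing_birthday are introduced here for illustration only; the data is the same as in the answer.

```python
import numpy as np
import pandas as pd

data = {
    "id": ['102-2', '103-2', '104-2', '105-2', '105-2', '108-2'],
    "birthday": ['09/03/2020', '14/03/2020', np.nan, np.nan, '25/03/2020', '07/04/2020'],
}
df = pd.DataFrame(data)

# Same steps as above, chained: sort (NaN last), keep first row per id, re-sort by id
result = (
    df.sort_values("birthday")
      .drop_duplicates(subset="id", keep="first")
      .sort_values("id")
)

remaining_ids = result["id"].tolist()
missing_birthday = result["birthday"].isna().tolist()
print(result)
```

The row for 104-2 survives with its NaN birthday, while for 105-2 only the dated row remains.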

CODE EXPLANATION: Here is the original dataframe:

import pandas as pd
import numpy as np

data = {
    "id": ['102-2','103-2','104-2', '105-2', '105-2', '108-2'],
    "birthday":['09/03/2020', '14/03/2020', np.nan, np.nan, '25/03/2020', '07/04/2020']
}

df = pd.DataFrame(data)

Now sort the dataframe:

df.sort_values(['birthday'], inplace=True)
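
The sort matters because sort_values places missing values last by default (na_position='last'), so a dated row always ends up ahead of a NaN row with the same id. The sketch below illustrates this on the same data. Note that birthday holds strings here, so the dated rows are ordered lexicographically rather than chronologically; that does not affect this approach, but the optional pd.to_datetime conversion shown at the end (an addition, not part of the original answer) gives a chronological order if one is needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": ['102-2', '103-2', '104-2', '105-2', '105-2', '108-2'],
    "birthday": ['09/03/2020', '14/03/2020', np.nan, np.nan, '25/03/2020', '07/04/2020'],
})

# na_position='last' is the default; it is spelled out here for clarity
sorted_df = df.sort_values("birthday", na_position="last")
nan_flags = sorted_df["birthday"].isna().tolist()

# Within the repeated id, the dated row now precedes the NaN row
flags_105 = sorted_df.loc[sorted_df["id"] == "105-2", "birthday"].isna().tolist()

# Optional: parse day/month/year strings to sort chronologically (NaN becomes NaT)
chrono = df.assign(
    birthday=pd.to_datetime(df["birthday"], format="%d/%m/%Y")
).sort_values("birthday")
chrono_ids = chrono["id"].tolist()
```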

Then drop the duplicates based on the id column, keeping only the first row for each id:

df.drop_duplicates(subset="id", keep='first', inplace=True)
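
As an alternative to sorting and de-duplicating, a boolean mask can drop a NaN row only when its id occurs more than once. This is a sketch, not part of the original answer, and the names is_repeated_id and result are hypothetical. It also behaves differently in edge cases: if every row for a repeated id is NaN, all of them are dropped, and if a repeated id has several dated rows, all of them are kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": ['102-2', '103-2', '104-2', '105-2', '105-2', '108-2'],
    "birthday": ['09/03/2020', '14/03/2020', np.nan, np.nan, '25/03/2020', '07/04/2020'],
})

# True for every row whose id appears more than once in the frame
is_repeated_id = df["id"].duplicated(keep=False)

# Drop a row only if its birthday is missing AND its id is repeated
result = df[~(df["birthday"].isna() & is_repeated_id)]

kept_index = result.index.tolist()
kept_ids = result["id"].tolist()
print(result)
```

Because no sorting is involved, the original row order and index are preserved.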

Tags:

python

pandas

dataframe

Douglas F

1 Answer

ScottC
