
Filtering out rows based on other rows using pandas

I have a dataframe that looks like this:

import pandas as pd

# Named `data` rather than `dict` to avoid shadowing the built-in
data = {'companyId': {0: 198236, 1: 198236, 2: 900814, 3: 153421, 4: 153421, 5: 337815},
 'region': {0: 'Europe', 1: 'Europe', 2: 'Asia-Pacific', 3: 'North America', 4: 'North America', 5: 'Africa'},
 'value': {0: 560, 1: 771, 2: 964, 3: 217, 4: 433, 5: 680},
 'type': {0: 'actual', 1: 'forecast', 2: 'actual', 3: 'forecast', 4: 'actual', 5: 'forecast'}}

df = pd.DataFrame(data)

    companyId     region          value  type
0   198236        Europe          560    actual
1   198236        Europe          771    forecast
2   900814        Asia-Pacific    964    actual
3   153421        North America   217    forecast
4   153421        North America   433    actual
5   337815        Africa          680    forecast

I can't seem to figure out a way to filter out certain rows based on the following condition:

If there are two entries under the same companyId, as is the case for 198236 and 153421, I want to keep only the entry where type is actual.

If there is only one entry under a companyId, as is the case for 337815 and 900814, I want to keep that row, irrespective of the value in column type.

Does anyone have an idea how to go about this?

A.N. asked Nov 21 '25 15:11


2 Answers

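You can use argsort to move the rows where type is actual to the front, then drop_duplicates to keep the first row for each companyId:

# ne('actual') is False for actual rows, so a stable argsort puts them first
out = df.iloc[df['type'].ne('actual').argsort(kind='stable')].drop_duplicates('companyId')
print(out)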
   companyId         region  value      type
0     198236         Europe    560    actual
2     900814   Asia-Pacific    964    actual
4     153421  North America    433    actual
5     337815         Africa    680  forecast
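
To make the steps of the one-liner above easier to follow, here is a minimal sketch that spells them out. It uses sort_values instead of argsort, which is a substitution on my part rather than what the answer wrote, and the intermediate names (is_not_actual, ordered) are only illustrative.

```python
import pandas as pd

data = {'companyId': {0: 198236, 1: 198236, 2: 900814, 3: 153421, 4: 153421, 5: 337815},
        'region': {0: 'Europe', 1: 'Europe', 2: 'Asia-Pacific', 3: 'North America',
                   4: 'North America', 5: 'Africa'},
        'value': {0: 560, 1: 771, 2: 964, 3: 217, 4: 433, 5: 680},
        'type': {0: 'actual', 1: 'forecast', 2: 'actual', 3: 'forecast',
                 4: 'actual', 5: 'forecast'}}
df = pd.DataFrame(data)

# Step 1: flag the rows that are NOT 'actual' (False for actual, True otherwise).
is_not_actual = df['type'].ne('actual')

# Step 2: sort the flags; False sorts before True, so 'actual' rows come first.
# A stable sort keeps rows with equal flags in their original order.
ordered = df.loc[is_not_actual.sort_values(kind='stable').index]

# Step 3: keep the first row seen for each companyId, then restore index order.
out = ordered.drop_duplicates('companyId').sort_index()
print(out)
```

Because drop_duplicates always keeps one row per companyId, a company that has only forecast rows still keeps its first row with this approach.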
BENY answered Nov 24 '25 05:11


You can use groupby and transform to build a boolean mask for indexing:

# Keep a row if its companyId occurs only once, or if it occurs more than
# once and the row's type is 'actual'. Let's express that as a lambda
# that receives the 'type' values of one group.
to_filter = lambda x: (len(x) == 1) | ((len(x) > 1) & (x == 'actual'))

# Then create a boolean mask aligned with df
m = df.groupby('companyId')['type'].transform(to_filter)

# Then filter your df with that mask
df[m]

   companyId         region  value      type
0     198236         Europe    560    actual
2     900814   Asia-Pacific    964    actual
4     153421  North America    433    actual
5     337815         Africa    680  forecast
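
The same mask can probably be built without a lambda by computing each group's size once. This is a sketch of a variant, not the code from the answer above, and the names group_size, mask and result are my own.

```python
import pandas as pd

data = {'companyId': {0: 198236, 1: 198236, 2: 900814, 3: 153421, 4: 153421, 5: 337815},
        'region': {0: 'Europe', 1: 'Europe', 2: 'Asia-Pacific', 3: 'North America',
                   4: 'North America', 5: 'Africa'},
        'value': {0: 560, 1: 771, 2: 964, 3: 217, 4: 433, 5: 680},
        'type': {0: 'actual', 1: 'forecast', 2: 'actual', 3: 'forecast',
                 4: 'actual', 5: 'forecast'}}
df = pd.DataFrame(data)

# Number of rows sharing each row's companyId, aligned with df's index.
group_size = df.groupby('companyId')['companyId'].transform('size')

# Keep single-row companies as they are; otherwise keep only 'actual' rows.
mask = (group_size == 1) | (df['type'] == 'actual')

result = df[mask]
print(result)
```

One thing to be aware of with a mask like this: if a companyId has several rows and none of them is actual, all of its rows are dropped. That case does not occur in the sample data, so decide which behaviour you want before picking an approach.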
SomeDude answered Nov 24 '25 05:11


