How to classify observations based on their covariates in dataframe and numpy?

Question

I have a dataset with n observations and, say, two variables X1 and X2. I am trying to classify each observation based on a set of conditions on its (X1, X2) values. For example, the dataset looks like this:

df:
Index     X1    X2
1         0.2   0.8
2         0.6   0.2
3         0.2   0.1
4         0.9   0.3

and the groups are defined by

Group 1: X1<0.5 & X2>=0.5
Group 2: X1>=0.5 & X2>=0.5
Group 3: X1<0.5 & X2<0.5
Group 4: X1>=0.5 & X2<0.5

I'd like to generate the following dataframe.

expected result:
Index     X1    X2    Group
1         0.2   0.8   1
2         0.6   0.2   4
3         0.2   0.1   3
4         0.9   0.3   4

Also, would it be better/faster to work with numpy arrays for this type of problem?

sacuL · Accepted Answer

In answer to your last question: I think pandas is a good tool for this. It could be done in plain numpy, but pandas is arguably more intuitive when working with dataframes, and it is fast enough for most applications. pandas and numpy also work well together. For instance, in your case, you can use numpy.select to build your pandas column:

import numpy as np
import pandas as pd
# Lay out your conditions, one boolean mask per group
conditions = [(df.X1 < 0.5) & (df.X2 >= 0.5),
              (df.X1 >= 0.5) & (df.X2 >= 0.5),
              (df.X1 < 0.5) & (df.X2 < 0.5),
              (df.X1 >= 0.5) & (df.X2 < 0.5)]

# Name the resulting groups (in the same order as the conditions)
choicelist = [1, 2, 3, 4]

# Each row gets the label of the first condition it satisfies
df['group'] = np.select(conditions, choicelist, default=-1)

# Above, I've set the default to -1, but change it as you see fit:
# if none of your conditions are met, that row is classified as -1

>>> df
   Index   X1   X2  group
0      1  0.2  0.8      1
1      2  0.6  0.2      4
2      3  0.2  0.1      3
3      4  0.9  0.3      4
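
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and wraps the conditions in a helper so the same logic can be applied to pandas columns or to plain numpy arrays. The helper name classify and the fifth row (with a missing X2) are my own additions for illustration, not part of the original question; the extra row is only there to show when the default label is used.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical extra row (Index 5)
# with a missing X2 value to show when the default label applies.
df = pd.DataFrame({
    "Index": [1, 2, 3, 4, 5],
    "X1": [0.2, 0.6, 0.2, 0.9, 0.7],
    "X2": [0.8, 0.2, 0.1, 0.3, np.nan],
})


def classify(x1, x2, default=-1):
    """Hypothetical helper: label each (x1, x2) pair with its group number."""
    conditions = [
        (x1 < 0.5) & (x2 >= 0.5),   # Group 1
        (x1 >= 0.5) & (x2 >= 0.5),  # Group 2
        (x1 < 0.5) & (x2 < 0.5),    # Group 3
        (x1 >= 0.5) & (x2 < 0.5),   # Group 4
    ]
    # Rows matching no condition (e.g. NaN inputs) receive `default`.
    return np.select(conditions, [1, 2, 3, 4], default=default)


# Works on pandas columns ...
df["group"] = classify(df["X1"], df["X2"])

# ... and the same way on plain numpy arrays.
groups_np = classify(df["X1"].to_numpy(), df["X2"].to_numpy())

print(df)
```

Comparisons against NaN evaluate to False, so the extra row matches none of the four conditions and falls through to -1. Whether the array version is noticeably faster will depend on your data size, so it is worth timing both on your real data before committing to one.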

Tags: python, pandas, dataframe, numpy

user9431976
