
How to check if my data is one-hot encoded

If I have a data matrix, how do I check whether the categorical variables have been one-hot encoded? I need to use LIME to explain my predictions, and I read that LIME works only with category labels, not with one-hot encoded columns. I found code to convert it, but it works only if the data has actually been one-hot encoded; otherwise the columns are turned into NaNs.

So I need a piece of code that looks at a NumPy array of data and tells me whether it has been one-hot encoded.

vishak bharadwaj asked Dec 07 '25 03:12


1 Answer

You can sum across each row and check whether you get an array of all 1s, as in the following example. This assumes the array holds only the columns of a single categorical variable.

Example:

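import numpy as np

X = np.array(
    [
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 1, 0],
        [1, 0, 0]
    ]
)
# One-hot means every entry is 0 or 1 and every row sums to exactly 1.
# Checking each row separately avoids row sums such as 0 and 2 cancelling out.
is_one_hot = np.isin(X, [0, 1]).all() and (X.sum(axis=1) == 1).all()
print(f'X is one-hot-encoded: {is_one_hot}')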

Result:

X is one-hot-encoded: True
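
If you want this as a reusable check, here is a minimal sketch. The helper names `is_one_hot` and `to_labels` are my own and not part of NumPy or LIME, and the sketch assumes you pass in only the columns that belong to one categorical variable. It tests each row individually, and `to_labels` uses `argmax` as one possible way to get integer category labels back.

```python
import numpy as np


def is_one_hot(block):
    """Return True if `block` (2-D, columns of ONE categorical variable) is one-hot."""
    block = np.asarray(block)
    if block.ndim != 2:
        return False
    # Every entry must be 0 or 1 (NaN or any other value fails this test).
    is_binary = np.isin(block, [0, 1]).all()
    # Every row must contain exactly one 1.
    one_per_row = (block.sum(axis=1) == 1).all()
    return bool(is_binary and one_per_row)


def to_labels(block):
    """Map each one-hot row to the index of its active column."""
    if not is_one_hot(block):
        raise ValueError("block is not one-hot encoded")
    return np.asarray(block).argmax(axis=1)


X = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])
print(is_one_hot(X))   # True
print(to_labels(X))    # [0 1 2 1 0]

# Row sums of 0 and 2 would cancel in a single total-sum test, but not here.
bad = np.array([[0, 0, 0], [1, 1, 0]])
print(is_one_hot(bad))  # False
```

If your matrix mixes several categorical variables and numeric columns, you would need to slice out each group of columns and run the check on each slice separately.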
Michael answered Dec 09 '25 20:12


