
Holding the same binning in the train and test vector data sets

Tags: julia

I have a numeric vector train that I use in the training data set for a model. Suppose I want to cut it into 5 bins. I know I can do this with cut(x, 5) from CategoricalArrays.jl. How do I apply the same binning to the vector test from the model's test data set?

Asked by Daniel Kaszyński, Oct 28 '25 04:10


1 Answer

Perhaps there is a better solution, but this would work: compute the break points from train once and reuse them for test.

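using CategoricalArrays, Statistics

nbins = 5
# Inner break points: the 1/nbins, ..., (nbins-1)/nbins quantiles of train
breaks = quantile(train, (1:nbins-1) / nbins)
labels = string.("BIN_", 1:nbins)

# Reuse the same breaks and labels for both vectors
cat_train = cut(train, breaks; extend=true, labels=labels)
cat_test = cut(test, breaks; extend=true, labels=labels)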
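One possible variant, offered as a sketch rather than a definitive fix: as far as I understand the cut documentation, extend=true adds outer breaks based on whichever vector is passed in, so a test vector that does not span all of the training quantiles could end up with fewer intervals than labels. Padding the training quantiles with -Inf and Inf gives both vectors exactly the same edges without relying on extend. The train and test values and the names inner and edges below are made up for illustration.

```julia
using CategoricalArrays, Statistics

# Hypothetical data, for illustration only
train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
test  = [-3.0, 4.5, 5.0, 12.0]   # includes values outside the training range

nbins = 5

# Inner break points come from the training data only
inner = quantile(train, (1:nbins-1) ./ nbins)

# Open-ended outer edges so every finite value falls into some bin
edges = [-Inf; inner; Inf]
labels = string.("BIN_", 1:nbins)

# Same edges and labels for both vectors, no extend needed
cat_train = cut(train, edges; labels=labels)
cat_test  = cut(test, edges; labels=labels)

# Both arrays should now share the same ordered levels
same_levels = levels(cat_train) == levels(cat_test)
```

With this layout, test values below the training minimum land in the first bin and values above the training maximum land in the last one, which may or may not be what the model should see; flagging them separately is an equally reasonable choice.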
Answered by Przemyslaw Szufel, Oct 30 '25 07:10


