
Custom sklearn Regressor: Cannot clone object... as the constructor does not seem to set parameter

I'm trying to implement my own kernel regression estimator that is compatible with the sklearn library. My implementation is the following:

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin, RegressorMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.utils.multiclass import unique_labels
from sklearn.metrics import euclidean_distances
import models.kernel as ker
        
        
class MyKerReg(BaseEstimator, RegressorMixin):
    
    def __init__ (self, kernel = "gaussian", bandwidth = 1.0):
        self.kernel = ker.kernel(kernel)
        self.bandwidth = bandwidth
  
        
    def fit(self, X, y):
        
        X, y = check_X_y(X, y, accept_sparse=True, ensure_2d=False)
        self.is_fitted_ = True
        self.X_ = X
        self.y_ = y
        
        return self
        
    def predict(self, X):
        
        X = check_array(X, accept_sparse=True, ensure_2d=False)
        check_is_fitted(self, 'is_fitted_')
        
        pred = []
        for x in X:
            tmp = [x - v for v in self.X_]
            ker_values = [(1/self.bandwidth)*self.kernel(v/self.bandwidth) for v in tmp]
            
            ker_values = np.array(ker_values)
            values = np.array(self.y_)
            
            num = np.dot(ker_values.T, values)
            denom = np.sum(ker_values)
            
            pred.append(num/denom)
        return pred

When I call predict on its own, everything works well. But when I use this object in cross_val_score like this ...


    y, x = misc.data_generating_process(1000)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 44)
    
    kr = ker_reg.MyKerReg(kernel = "gaussian", bandwidth = 0.5)
    
    print(cross_val_score(kr, x_train, y_train, scoring="neg_mean_squared_error", cv=5))

... I get the following error:

Exception has occurred: RuntimeError
Cannot clone object MyKerReg(bandwidth=0.5, kernel=<models.kernel.kernel object at 0x7fab359bc940>), as the constructor either does not set or modifies parameter kernel

During handling of the above exception, another exception occurred:

  File "/home/dragos/Projects/ML_Homework/kernel_regression/main.py", line 24, in main
    print(cross_val_score(kr, x_train, y_train, scoring="neg_mean_squared_error", cv=5))
  File "/home/dragos/Projects/ML_Homework/kernel_regression/main.py", line 85, in <module>
    main()

Does anyone have an idea how to fix this? I know there is a similar thread on this topic, but I still can't figure it out. Thank you all.

I've already read the documentation and articles on the topic, and it seems like I'm doing everything right.

Dragos Tanasa asked Nov 16 '25 21:11


1 Answer

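The __init__ method should set its parameters as attributes, with no name changes or validation. In your example, self.kernel = ker.kernel(kernel) is to blame: when cloning, sklearn rebuilds the estimator from get_params() and checks that each constructor argument comes back unchanged, but here the stored attribute is a newly built kernel object rather than the value that was passed in. You can probably move that into the beginning of fit instead: leave just self.kernel = kernel in __init__, set self.kernel_ = ker.kernel(self.kernel) in fit, and use self.kernel_ in predict.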

From the developer guide:

every keyword argument accepted by __init__ should correspond to an attribute on the instance. Scikit-learn relies on this to find the relevant attributes to set on an estimator when doing model selection.

[...]

There should be no logic, not even input validation, and the parameters should not be changed. The corresponding logic should be put where the parameters are used, typically in fit.
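As a rough sketch of the change described above, the estimator could look something like the following. The asker's models.kernel module isn't shown, so gaussian_kernel and the KERNELS lookup here are hypothetical stand-ins for ker.kernel(...). The sketch also assumes X is 2-D with shape (n_samples, n_features), and it lists RegressorMixin before BaseEstimator, which is the order the sklearn docs recommend for mixins.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


def gaussian_kernel(u):
    # Hypothetical stand-in for ker.kernel("gaussian"):
    # a Gaussian kernel evaluated on the last axis of u.
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / np.sqrt(2 * np.pi)


# Hypothetical name -> callable lookup replacing models.kernel
KERNELS = {"gaussian": gaussian_kernel}


class MyKerReg(RegressorMixin, BaseEstimator):

    def __init__(self, kernel="gaussian", bandwidth=1.0):
        # Store the arguments exactly as passed: no logic, no conversion.
        self.kernel = kernel
        self.bandwidth = bandwidth

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # Resolve the kernel here; the trailing underscore marks a fitted attribute.
        self.kernel_ = KERNELS[self.kernel]
        self.X_ = X
        self.y_ = y
        return self

    def predict(self, X):
        check_is_fitted(self, "kernel_")
        X = check_array(X)
        # Scaled pairwise differences, shape (n_test, n_train, n_features)
        diffs = (X[:, None, :] - self.X_[None, :, :]) / self.bandwidth
        weights = self.kernel_(diffs)  # shape (n_test, n_train)
        # Kernel-weighted average of the training targets
        return weights @ self.y_ / weights.sum(axis=1)


# Synthetic data, only to exercise the estimator
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(100, 1))
y_demo = np.sin(3 * X_demo[:, 0])

kr = MyKerReg(kernel="gaussian", bandwidth=0.5)
kr_copy = clone(kr)  # succeeds because __init__ stores its arguments untouched
scores = cross_val_score(kr, X_demo, y_demo,
                         scoring="neg_mean_squared_error", cv=5)
print(scores)
```

Because kernel stays a plain string, it also works as a searchable hyperparameter, for example in GridSearchCV with a param_grid over kernel names and bandwidths.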

Ben Reiniger answered Nov 18 '25 10:11


