I'm trying to get sklearn to select the best k variables (for example k=1) for a linear regression. This works and I can get the R-squared, but it doesn't tell me which variables were the best. How can I find that out?
I have code of the following form (real variable list is much longer):
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SelectKBest, f_regression

X = []
for i in range(len(df)):
    X.append([averageindegree[i], indeg3_sum[i], indeg5_sum[i], indeg10_sum[i]])
X = np.array(X)  # convert to an array so the boolean masks below work
training = []
actual = []
counter = 0
for fold in range(500):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    clf = LinearRegression()
    #clf = RidgeCV()
    #clf = LogisticRegression()
    #clf = ElasticNetCV()
    b = SelectKBest(f_regression, k=1)  # k is the number of features to keep
    b.fit(X_train, y_train)
    # print(b.get_params())
    X_train = X_train[:, b.get_support()]
    X_test = X_test[:, b.get_support()]
    clf.fit(X_train, y_train)
    sc = clf.score(X_train, y_train)
    training.append(sc)
    #print "The training R-Squared for fold " + str(1) + " is " + str(round(sc*100,1))+"%"
    sc = clf.score(X_test, y_test)
    actual.append(sc)
    #print "The actual R-Squared for fold " + str(1) + " is " + str(round(sc*100,1))+"%"
What you are looking for is the get_support method of feature_selection.SelectKBest. It returns an array of booleans indicating whether a given feature was selected (True) or not (False).
In this example there are 4 features (averageindegree, indeg3_sum, indeg5_sum, indeg10_sum). SelectKBest keeps the k most explanatory features out of the original set, so k should be a value greater than 0 and at most the total number of features.
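As a quick illustration of that bound (made-up data; the names and sizes here are placeholders, not from the question):

from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

X, y = make_regression(n_samples=100, n_features=4, random_state=0)
SelectKBest(f_regression, k=2).fit(X, y)  # fine: 0 < k <= 4
# SelectKBest(f_regression, k=5).fit(X, y)  # ValueError: k must be <= n_features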
(1) L1-based feature selection: a linear model with an L1 penalty drives some coefficients to exactly zero, eliminating those features, so it can act as a feature selection step before another model is fit to the data.
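A minimal sketch of that idea using SelectFromModel with LassoCV on synthetic data; the feature names are just the ones from the question, reused for illustration:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=4, n_informative=2, random_state=0)
feature_names = np.array(["averageindegree", "indeg3_sum", "indeg5_sum", "indeg10_sum"])

# the L1 penalty shrinks some coefficients to exactly zero; those features are dropped
selector = SelectFromModel(LassoCV()).fit(X, y)
print(feature_names[selector.get_support()])  # names of the surviving features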
(2) ANOVA F-test (f_classif): in statistics, ANOVA is used to determine whether there is a statistically significant difference between the means of two or more groups. This is particularly useful in a classification problem, where we want to know how well a continuous feature discriminates between the classes.
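A hedged sketch of the classification case, again on made-up data:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=4, n_informative=2, random_state=0)
selector = SelectKBest(f_classif, k=2).fit(X, y)
print(selector.get_support())  # boolean mask over the 4 features
print(selector.scores_)        # per-feature F statistics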
You need to use get_support, and the selector must be fitted before it returns a meaningful mask:

features_columns = [.......]
selector = SelectKBest(score_func=f_regression, k=5)
selector.fit(X_train, y_train)
print(list(zip(selector.get_support(), features_columns)))
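Putting it together with the loop in the question, here is a sketch (assuming the same X, y, and feature order as your code) that tallies which variable wins each of the 500 folds:

from collections import Counter
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import train_test_split

feature_names = ["averageindegree", "indeg3_sum", "indeg5_sum", "indeg10_sum"]
winners = Counter()
for fold in range(500):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    b = SelectKBest(f_regression, k=1).fit(X_train, y_train)
    # with k=1 the boolean mask has a single True; argmax finds its index
    winners[feature_names[b.get_support().argmax()]] += 1
print(winners.most_common())  # how often each variable was the "best" one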