Python machine learning, feature selection

Question

I am working on a classification task related to written text and I wonder how important it is to perform some kind of "feature selection" procedure in order to improve the classification results.

I am using a number of features (around 40) related to the subject, but I am not sure whether all of them are really relevant, or in which combinations. I am experimenting with SVM (scikits) and LDAC (mlpy).

If I have a mix of relevant and irrelevant features, I assume I will get poor classification results. Should I perform a "feature selection procedure" before classification?

Scikits has a tree-based RFE procedure that can rank the features. Is it meaningful to rank the features with a tree-based RFE, choose the most important ones, and then perform the actual classification with a (non-linear) SVM or LDAC? Or should I implement some kind of wrapper method that uses the same classifier to rank the features (trying to classify with different groups of features would be very time consuming)?

ogrisel · Accepted Answer

Just try it and see whether it improves the classification score as measured with cross-validation. Also, before trying RFE, I would try less CPU-intensive schemes such as univariate chi2 feature selection.
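A minimal sketch of that comparison, assuming a recent scikit-learn. The data here is synthetic (`make_classification` stands in for the roughly 40 real features), and `k=10` is an arbitrary illustrative choice, not a recommendation. Because chi2 requires non-negative inputs, the features are scaled to [0, 1] first. The selection step sits inside the pipeline so that it is refit on each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real text features (assumption, not the asker's data).
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=8, n_redundant=4,
    random_state=0,
)

# Baseline: non-linear SVM on all 40 features.
baseline = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))

# Same classifier, but keep only the 10 features with the highest chi2 score.
selected = make_pipeline(
    MinMaxScaler(),           # chi2 needs non-negative values
    SelectKBest(chi2, k=10),  # k=10 is an illustrative guess
    SVC(kernel="rbf"),
)

# 5-fold cross-validated accuracy for each variant.
scores = {
    "all features": cross_val_score(baseline, X, y, cv=5),
    "chi2 top 10": cross_val_score(selected, X, y, cv=5),
}
for name, s in scores.items():
    print(f"{name}: {s.mean():.3f} +/- {s.std():.3f}")
```

If the selected variant does not score better than the baseline across a few values of `k`, the feature selection step is probably not worth keeping for this classifier.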

Tags:

python

scikit-learn

scikits

andreSmol
