Scikit Learn - Identifying target from loading a CSV

Question

I'm loading a CSV with NumPy as a dataset to create a decision tree model in Python. The extract below places columns 0-7 in X and the last column in Y as the target.

import numpy as np
from sklearn import tree

# load and set data
data = np.loadtxt("data/tmp.csv", delimiter=",")
X = data[:, 0:8]  # columns 0-7 as features (the end index is exclusive)
Y = data[:, 8]  # last column as target

# create model
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

What I'd like to know is whether the target (the class label) can be in any column. For example, if it's in the fourth column, would the following code still fit the model correctly, or would it produce errors when it comes to predicting?

# load and set data
data = np.loadtxt("data/tmp.csv", delimiter=",")
X = data[:, 0:8]  # identify columns as data sets
Y = data[:, 3]  # identify fourth column as target

# create model
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

Ahmed Fasih · Accepted Answer

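If you have more than four columns, the fourth one (index 3) is the target, and the rest are features, here's one way (out of many) to load them:

# load data
data = np.loadtxt("data/tmp.csv", delimiter=",")
X = np.hstack([data[:, :3], data[:, 4:]])  # features: every column except index 3
Y = data[:, 3]  # target: fourth column (index 3)

# process X & Y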

(with belated thanks to @omerbp for reminding me hstack takes a tuple/list, not naked arguments!)
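As a possible alternative to hstack, np.delete can drop the target column in one call. The sketch below is only an illustration: the helper name split_features_target and the small made-up array are assumptions, not part of the original question or answer. It also bears on the question as asked: leaving the target column inside X should not raise an error when fitting, but the model would then be trained on the label itself and would expect that column again at prediction time, so it is normally removed.

```python
import numpy as np


def split_features_target(data, target_col):
    """Return (X, Y): Y is column `target_col`, X is every other column.

    Hypothetical helper, named here for illustration only.
    """
    Y = data[:, target_col]
    X = np.delete(data, target_col, axis=1)  # copy of data without that column
    return X, Y


# Made-up 3x5 array standing in for the array returned by np.loadtxt.
data = np.arange(15, dtype=float).reshape(3, 5)

# The fourth column (index 3) is the target, all others are features.
X, Y = split_features_target(data, target_col=3)
print(X.shape, Y.shape)
```

X and Y can then be passed to clf.fit(X, Y) exactly as in the question.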

Tags: python, csv, numpy, classification, scikit-learn

user2249567
