SKLearn ValueError: setting an array element with a sequence

As part of a project, I am trying to use the random forest classifier from Python's SKLearn library. I have been using this tutorial as a guide: https://chrisalbon.com/machine_learning/trees_and_forests/random_forest_classifier_example/.

My code follows this tutorial line by line; the only major difference is the structure of the data. In the tutorial, there are 4 features (4 columns in the data table), and each entry in a column is a number. In my code, I have 1 feature (1 column in the data table), and each entry in that column is a list of three numbers. When I call the fit() function, I get the following error: ValueError: setting an array element with a sequence.

Here is my code:

import pandas as pd
import numpy as np
import random
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

trainingData = [[[0, 0, 3], 0.77], [[24, 0, 5], 30], [[0, 0, 4], 0.77], [[0, 0, 0], 0.77]]
vectors_train = []
for i in range (0, len(trainingData)):
    vectors_train.append(trainingData[i][0])

testingData = [[[1, 0, 0], 0.77], [[30, 0, 5], 30], [[0, 0, 0], 0.77], [[0, 0, 0], 0.77]]
vectors_test = []
for i in range (0, len(testingData)):
    vectors_test.append(testingData[i][0])

dataframe_training = pd.DataFrame(trainingData)
dataframe_training['is_train'] = True
dataframe_testing = pd.DataFrame(testingData)
dataframe_testing['is_train'] = False
frames = [dataframe_training, dataframe_testing]
dataframe = pd.concat(frames)
dataframe.rename(index = str, columns = {0: 'Vector', 1: 'Label', 2: 'is_train'})  # result is not assigned, so the columns keep the names 0, 1 and 'is_train'

train, test = dataframe[dataframe['is_train']==True], dataframe[dataframe['is_train']==False]
features = dataframe.columns[:1]
labels, uniques = pd.factorize(train[1], sort = True)
clf = RandomForestClassifier()

clf.fit(train[features], labels)              # ValueError occurs here

I am confused by what the error actually means. What array element is being set to a sequence, and where is this sequence? I'm also aware that train[features] is a DataFrame object, and that the fit() function takes two parameters, both of which must be array-like. labels is an array, and the error specifically points to the first parameter being the problem, so is there a data type conversion I have to do?
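
To see what the message refers to, here is a minimal sketch that reproduces it without scikit-learn. It assumes that fit() ends up converting the DataFrame into a float NumPy array, which is roughly what scikit-learn's input validation does for an object column; the names frame and message are hypothetical.

```python
import numpy as np
import pandas as pd

# One feature column whose cells are 3-element lists, as in column 0 of the question
frame = pd.DataFrame({0: [[0, 0, 3], [24, 0, 5], [0, 0, 4], [0, 0, 0]]})

print(np.asarray(frame).shape)  # (4, 1): one object cell per sample

message = None
try:
    # Each cell of the (4, 1) array must become a single float, but holds a whole list
    np.asarray(frame, dtype=float)
except ValueError as err:
    message = str(err)

print(message)
```

The "array element" is one cell of that (4, 1) float array, and the "sequence" is the list stored in the corresponding DataFrame cell.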

When I replace the line clf.fit(train[features], labels) with clf.fit(vectors_train, labels), the error goes away. However, I want to know why it is not working when I use the same strategy as the tutorial and how to get it to work in a similar fashion.
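
The difference between the two calls shows up in the shapes NumPy produces from each input. This sketch rebuilds only the training data from the question; the names X_from_lists and X_from_frame are hypothetical.

```python
import numpy as np
import pandas as pd

trainingData = [[[0, 0, 3], 0.77], [[24, 0, 5], 30], [[0, 0, 4], 0.77], [[0, 0, 0], 0.77]]
vectors_train = [row[0] for row in trainingData]

# A list of equal-length lists converts cleanly: 4 samples x 3 numeric features
X_from_lists = np.asarray(vectors_train, dtype=float)
print(X_from_lists.shape)  # (4, 3)

# Selecting column 0 as a one-column DataFrame keeps each list inside a single cell
X_from_frame = np.asarray(pd.DataFrame(trainingData)[[0]])
print(X_from_frame.shape, X_from_frame.dtype)  # (4, 1) object
```

In other words, vectors_train already gives scikit-learn three numeric features per sample, whereas train[features] gives it one feature per sample whose value is a list.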

Any help would be much appreciated. Thanks!

pumpkin39 asked Sep 07 '25 01:09


1 Answer

Remove the features variable and make the last line:

clf.fit(train[0].tolist(), labels)

This runs without raising an error.
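
For reference, here is a self-contained sketch of the question's pipeline with this fix applied. It assumes pandas and scikit-learn are installed; random_state=0, the final predict call and the name preds are additions for illustration, not part of the original code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

trainingData = [[[0, 0, 3], 0.77], [[24, 0, 5], 30], [[0, 0, 4], 0.77], [[0, 0, 0], 0.77]]
testingData = [[[1, 0, 0], 0.77], [[30, 0, 5], 30], [[0, 0, 0], 0.77], [[0, 0, 0], 0.77]]

dataframe_training = pd.DataFrame(trainingData)
dataframe_training['is_train'] = True
dataframe_testing = pd.DataFrame(testingData)
dataframe_testing['is_train'] = False
dataframe = pd.concat([dataframe_training, dataframe_testing])

train = dataframe[dataframe['is_train']]
test = dataframe[~dataframe['is_train']]

# Encode the float labels (0.77 and 30) as integer class codes 0 and 1
labels, uniques = pd.factorize(train[1], sort=True)

clf = RandomForestClassifier(random_state=0)
# train[0].tolist() is a list of 3-element lists, i.e. a 4 x 3 array-like
clf.fit(train[0].tolist(), labels)

# Map the predicted class codes back to the original label values
preds = uniques[clf.predict(test[0].tolist())]
print(preds)
```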

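Your code doesn't work because dataframe.columns[:1] is a slice, so it returns an Index holding one column label, and train[features] is therefore a one-column DataFrame whose cells are lists. fit() converts that to a float array in which each cell must hold a single number, which is why it complains about setting an array element with a sequence. dataframe.columns[0], by contrast, returns the label itself (the int 0), so train[features] becomes a Series. Passing that Series straight to clf.fit still won't work, because its elements are still lists and fit needs a 2-D list or array, but train[features].tolist() (a list of lists) will also work.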
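
The slice-versus-label distinction can be checked directly. This is a sketch on the question's training data only; the names by_slice, by_label, X_tolist and X_vstack are hypothetical, and np.vstack is an alternative that the answer does not mention.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame([[[0, 0, 3], 0.77], [[24, 0, 5], 30], [[0, 0, 4], 0.77], [[0, 0, 0], 0.77]])

by_slice = train[train.columns[:1]]  # Index([0]) selects a one-column DataFrame
by_label = train[train.columns[0]]   # the label 0 selects a Series whose elements are lists

print(type(by_slice).__name__, type(by_label).__name__)  # DataFrame Series

# Both of these build a numeric 4 x 3 matrix that fit() can accept
X_tolist = np.asarray(by_label.tolist(), dtype=float)
X_vstack = np.vstack(by_label.to_numpy())
print(X_tolist.shape, X_vstack.shape)  # (4, 3) (4, 3)
```

np.vstack stacks each list as one row, so it gives the same matrix as tolist() while staying in NumPy.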

U12-Forward answered Sep 09 '25 04:09