I have a column in my dataframe which contains URL information. It has 1200+ unique values. I want to use text mining to generate features from these values. I have used TfidfVectorizer to generate vectors and then KMeans to identify clusters. I now want to assign these cluster labels back to my original dataframe, so that I can bin the URL information into these clusters.
Below is the code to generate the vectors and cluster labels:
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Vectorize the URL strings
vectorizer = TfidfVectorizer(min_df=1, lowercase=False, ngram_range=(1, 1), use_idf=True, stop_words='english')
X = vectorizer.fit_transform(sample['lead_lead_source_modified'])
X = X.toarray()

# Compute the distortion for each k (elbow method)
distortions = []
K = range(1, 10)
for k in K:
    kmeanModel = KMeans(n_clusters=k).fit(X)
    distortions.append(sum(np.min(cdist(X, kmeanModel.cluster_centers_, 'euclidean'), axis=1)) / X.shape[0])

# Fit the final model and collect the cluster labels
km = KMeans(n_clusters=4, random_state=0)
km.fit(X)
cluster_labels = km.labels_
cluster_labels = pd.DataFrame(cluster_labels, columns=['ClusterLabel_lead_lead_source'])
cluster_labels
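The distortions can then be plotted against k to locate the elbow. A minimal sketch (the matplotlib plotting step is an assumption, not shown in the original code):

# Minimal sketch: plot distortion against k to locate the elbow
import matplotlib.pyplot as plt

plt.plot(K, distortions, 'bx-')
plt.xlabel('k')
plt.ylabel('Distortion')
plt.title('Elbow method for optimal k')
plt.show()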
Through the elbow method, I decided on 4 clusters. I now have cluster labels, but I am not sure how to add them back to the dataframe at their respective indices. Concatenating along axis=1 is creating NaNs due to indexing issues. Below is the sample output after concatenation:
   lead_lead_source_modified                ClusterLabel_lead_lead_source
0  NaN                                      3.0
1  NaN                                      0.0
2  NaN                                      0.0
3  ['direct', 'salesline', 'website', '']   0.0
I want to know if this approach is the right way to go; if so, how do I solve this issue? If not, is there a better way to do this?
Passing the original index during the dataframe conversion solved the issue. But I still want to know if this is the right approach.
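For concreteness, a minimal sketch of that fix, assuming the dataframe is named sample as above (the name result is illustrative): building the label frame on the original index keeps the rows aligned during concatenation, and assigning the labels as a column is an equivalent shortcut.

# Build the label frame on the original index so concatenation aligns rows
cluster_labels = pd.DataFrame(km.labels_,
                              index=sample.index,
                              columns=['ClusterLabel_lead_lead_source'])
result = pd.concat([sample, cluster_labels], axis=1)

# Equivalent shortcut: assign the labels as a new column directly
sample['ClusterLabel_lead_lead_source'] = km.labels_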