I have a large npz file that I've loaded with NumPy's np.load. I want to convert it to a pandas DataFrame so I can apply machine learning algorithms (KNN, K-Means, DT) using scikit-learn. I am new to Python, so my experience with these libraries is very limited. Thank you for the help.
This is what I have so far:
dataset = np.load('./example.npz')
test_data = dataset['data']
test_labels = dataset['labels']
print(test_data.shape) gives (17000, 78400)
print(test_labels.shape) gives (17000, 1)
I'm not sure how you want to structure your DataFrame, but this will load the npz file with the array names (the keys stored in the file) as the index:
import pandas as pd
import numpy as np
npz = np.load('/path/to/npz.npz')
df = pd.DataFrame.from_dict({item: npz[item] for item in npz.files}, orient='index')
If you want to load the arrays into a single column, use:
pd.DataFrame.from_dict({item: [npz[item]] for item in npz.files}, orient='index')
Drop orient='index' if you want the array names as columns instead.
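For arrays shaped like the ones in the question (a 2-D feature array plus a column of labels), another option is to pass the feature array straight to pd.DataFrame and attach the labels as an extra column. The following is only a minimal sketch under assumptions: the temporary file, the small array sizes, and the column name 'label' are made up for illustration, and only the keys 'data' and 'labels' come from the question.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: 6 samples with 4 features each.
tmp_path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez(
    tmp_path,
    data=np.arange(24, dtype=float).reshape(6, 4),
    labels=np.array([[0], [1], [0], [1], [0], [1]]),
)

# Read both arrays into memory, then close the file.
with np.load(tmp_path) as npz:
    features = npz["data"]    # shape (6, 4)
    labels = npz["labels"]    # shape (6, 1)

# One row per sample, one column per feature.
df = pd.DataFrame(features)
# Flatten the (n, 1) label array to 1-D before adding it as a column.
df["label"] = labels.ravel()

# Split back into a feature matrix and a target vector for modelling.
X = df.drop(columns="label").to_numpy()
y = df["label"].to_numpy()
print(df.shape)
```

Note that scikit-learn estimators such as KNeighborsClassifier accept NumPy arrays directly, so X and y could also be taken from the npz arrays without building a DataFrame at all; the DataFrame is mainly useful for inspecting or filtering the data first.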