
Is there a way to convert npz files to a pandas DataFrame?

Tags:

python

pandas

I have a large npz file that I've loaded with NumPy's np.load. I want to convert it to a pandas DataFrame so I can apply machine learning algorithms (KNN, K-Means, DT) using scikit-learn. I am new to Python, so my experience with this library is very limited. Thank you for the help.

This is what I have so far:

dataset = np.load('./example.npz')

test_data = dataset['data']

test_labels = dataset['labels']

print(test_data.shape) gives (17000, 78400)

print(test_labels.shape) gives (17000, 1)

Artificial_Spark asked Nov 29 '25 18:11

1 Answer

I'm not sure how you want to structure your DataFrame, but this will load the npz file with the array names (the keys in npz.files) as the index:

import pandas as pd
import numpy as np

npz = np.load('/path/to/npz.npz')
df = pd.DataFrame.from_dict({item: npz[item] for item in npz.files}, orient='index')
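To see what this produces, here is a small sketch. The archive and its array names (a, b) are made up for illustration and built in memory; they are not the file from the question, and the arrays are deliberately 1-D and of equal length.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical archive with two equal-length 1-D arrays, written to memory
buf = io.BytesIO()
np.savez(buf, a=np.array([1, 2, 3]), b=np.array([4, 5, 6]))
buf.seek(0)
npz = np.load(buf)

# Each array becomes one row; the array names become the index
df_rows = pd.DataFrame.from_dict(
    {item: npz[item] for item in npz.files}, orient='index'
)

print(df_rows.index.tolist())  # ['a', 'b']
print(df_rows.shape)           # (2, 3)
```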

If you want to load each array into a single column, use:

pd.DataFrame.from_dict({item: [npz[item]] for item in npz.files}, orient='index')

Drop orient='index' if you want the array names as columns instead.
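Since the arrays in the question are 2-D (one row per sample), another option is to build the frame straight from the data array and attach the labels as an extra column. This is only a sketch under assumptions: the tiny in-memory archive stands in for example.npz, and the column names (f0, f1, ..., label) are invented for the example.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical stand-in for example.npz: 5 samples, 4 features, labels of shape (5, 1)
buf = io.BytesIO()
np.savez(
    buf,
    data=np.arange(20).reshape(5, 4),
    labels=np.array([[0], [1], [0], [1], [1]]),
)
buf.seek(0)

dataset = np.load(buf)
data = dataset['data']
labels = dataset['labels']

# One row per sample, one column per feature (column names are made up here)
df = pd.DataFrame(data, columns=[f'f{i}' for i in range(data.shape[1])])

# ravel() flattens the (n, 1) label array to shape (n,) so it fits in one column
df['label'] = labels.ravel()

print(df.shape)  # (5, 5)

# Split back into features and target for scikit-learn
X = df.drop(columns='label')
y = df['label']
```

scikit-learn estimators also accept NumPy arrays directly, so if the DataFrame is only a stepping stone, passing data and labels.ravel() to fit() may be enough.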

RJ Adriaansen answered Dec 02 '25 07:12