
Get dictionary of dtypes as read in by Pandas read_csv()

I have a large Pandas dataframe that I've imported from a SQL database. The whole process takes several hours. As I work on the data, inevitably the dataframe gets altered and I regularly want to go back to 'a known good dataset' and re-run various functions. Instead of importing the data from the database, I want to save the data at various points of the analysis process as CSV files which can then be used to restore the data as required; CSV is the format of choice because, for some reason, I haven't had much luck with pickling the dataframes. Simply importing the CSV data using pd.read_csv() alters the datatypes of the columns. As a result, I want to create a dictionary of dtypes which can be used to restore the data types when importing the CSV back into a dataframe.

As an example, a simple dataframe may be defined as shown below:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3,4,5],'B':['a','b','c','d','e'],'C':[1.2,3.4,5.6,7.8,9.0]},index=[0,2,4,6,8])

which looks like:

   A  B    C
0  1  a  1.2
2  2  b  3.4
4  3  c  5.6
6  4  d  7.8
8  5  e  9.0

A dictionary of dtypes can be created using:

dtypesDict = df.dtypes.to_dict()

which produces:

{'B': dtype('O'), 'C': dtype('float64'), 'A': dtype('int64')}

If I paste this output into my code as a hard-coded dictionary, so that it can be used to set the data types of the columns imported with pd.read_csv(), it fails as follows:

dtypesDict = {'B': dtype('O'), 'C': dtype('float64'), 'A': dtype('int64')}

NameError: name 'dtype' is not defined

However, defining the dictionary as:

dtypesDict = {'B': 'O', 'C': 'float64', 'A': 'int64'}

allows the CSV file to be imported with no problems.

I thought a dictionary comprehension was the way to go but I can't make that work:

dtypesDict = {k:bit_in_brackets_of_v for k,v in df.dtypes.to_dict().items()}

How can I automatically produce a dictionary in the correct format, one that can be hard-coded with a simple cut-and-paste and that allows the dtypes of columns imported from a CSV to be set correctly?

user1718097 asked Oct 15 '25 14:10



1 Answer

You can build the dictionary with dict and zip, using dtype.name to get each column's dtype name as a plain string:

dict(zip(list(df),[df[x].dtype.name for x in df]))

Out[6]: {'A': 'int64', 'B': 'object', 'C': 'float64'}
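
The same idea can also be written as the dictionary comprehension the question was reaching for, since each value in df.dtypes is a dtype object with a .name attribute. The following is a minimal sketch, not a definitive recipe: the names dtypes_dict, buffer and restored are illustrative, and an in-memory buffer stands in for the CSV file on disk. It assumes the columns hold only integers, floats and strings as in the example; other types may need different handling (datetime columns, for instance, are normally restored through parse_dates rather than dtype).

```python
import io

import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3, 4, 5],
     'B': ['a', 'b', 'c', 'd', 'e'],
     'C': [1.2, 3.4, 5.6, 7.8, 9.0]},
    index=[0, 2, 4, 6, 8],
)

# Each value in df.dtypes is a dtype object; .name gives its string form,
# so the resulting dictionary contains only plain strings.
dtypes_dict = {k: v.name for k, v in df.dtypes.items()}
print(dtypes_dict)

# Round trip through CSV text (the buffer is a stand-in for a real file).
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# Pass the dictionary back to read_csv to restore the column dtypes.
restored = pd.read_csv(buffer, index_col=0, dtype=dtypes_dict)
print(restored.dtypes)
```

Because every value is a plain string, the printed dictionary can be pasted straight into code and passed to the dtype argument of pd.read_csv().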
BENY answered Oct 18 '25 09:10



