I have an Excel file which I'm importing as a pandas DataFrame.
My dataframe df:
id name value
1 abc 22.3
2 asd 11.9
3 asw 2.4
I have a dictionary d in the format:
{ 'name' : 'str',
'value' : 'float64',
'id' : 'int64'}
I want to check whether the data types of the columns in my dataframe are the same as those defined in the dictionary.
The output can be just a string. If all the columns have their respective data types:
    print("Success")
else:
    print("Column id has a different data type. Please check your file.")
Call dtypes, convert the result to a dictionary, and compare.
d1 = df.dtypes.astype(str).to_dict()
d1
{'id': 'int64', 'name': 'object', 'value': 'float64'}
d1 == {'name' : 'str', 'value' : 'float64', 'id' : 'int64'}
False
Unfortunately, name is shown to be an object column, not str, hence the False. I'd recommend doing a quick iteration over your dict and changing all entries where str appears to object (this shouldn't hurt):
d2 = {k : 'object' if v == 'str' else v for k, v in d.items()}
d2
{'id': 'int64', 'name': 'object', 'value': 'float64'}
d1 == d2
True
To check which column(s) are incorrect, the solution becomes a little more involved, but is still quite easy with a list comprehension.
[k for k in d1 if d1[k] != d2.get(k)]
['name']
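The steps above can be wrapped into one small helper that also prints the message the question asks for. This is only a sketch: the function name find_dtype_mismatches and its aliases parameter are made up for illustration, and the sample actual dictionary is hypothetical (its value column is set to object on purpose so that a mismatch shows up). The helper works on plain dictionaries of dtype names, so with a real dataframe you would pass df.dtypes.astype(str).to_dict() as actual.

```python
def find_dtype_mismatches(actual, expected, aliases=None):
    """Return {column: (found, wanted)} for every column whose dtype name differs.

    `actual` and `expected` map column names to dtype names given as strings.
    `aliases` rewrites expected names before comparing; by default 'str' is
    treated as 'object', as described in the answer above.
    """
    if aliases is None:
        aliases = {"str": "object"}
    mismatches = {}
    for column, wanted in expected.items():
        wanted = aliases.get(wanted, wanted)
        found = actual.get(column)  # None when the column is missing entirely
        if found != wanted:
            mismatches[column] = (found, wanted)
    return mismatches


# Hypothetical sample data. With a real DataFrame, build `actual` with:
#     actual = df.dtypes.astype(str).to_dict()
actual = {"id": "int64", "name": "object", "value": "object"}
expected = {"name": "str", "value": "float64", "id": "int64"}

mismatches = find_dtype_mismatches(actual, expected)
if not mismatches:
    print("Success")
else:
    for column, (found, wanted) in mismatches.items():
        print(f"Column {column} has dtype {found}, expected {wanted}. "
              "Please check your file.")
```

Depending on the pandas version and how the file is read, a text column may be reported under a dedicated string dtype name rather than object, so the aliases mapping may need adjusting for your setup.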
Use
In [5759]: s = df.dtypes == pd.Series(d)
In [5760]: ss = s[~s]
In [5761]: if ss.empty:
      ...:     print('success')
      ...: else:
      ...:     print('columns %s have different data type' % ss.index.tolist())
      ...:
columns ['name'] have different data type
Details
In [5763]: df
Out[5763]:
id name value
0 1 abc 22.3
1 2 asd 11.9
2 3 asw 2.4
In [5764]: d
Out[5764]: {'id': 'int64', 'name': 'str', 'value': 'float64'}
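One thing to watch with the Series comparison: in the session above the keys of d happen to be in the same order as the columns of df. If they are not (as in the dictionary from the question), recent pandas versions may refuse to compare the two Series because their labels are not identical. A possible workaround, sketched below under that assumption, is to reindex the expected types to the column order first and compare dtype names as strings. The variable names actual, expected, matches and bad_columns are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["abc", "asd", "asw"],
    "value": [22.3, 11.9, 2.4],
})

# Expected types, deliberately listed in a different order than df.columns.
d = {"name": "str", "value": "float64", "id": "int64"}

# Dtype names as plain strings, indexed by column name.
actual = df.dtypes.astype(str)

# Put the expected names in the same order as the columns.
# Columns missing from d become NaN and therefore never match.
expected = pd.Series(d).reindex(actual.index)

matches = actual == expected
bad_columns = matches[~matches].index.tolist()

if not bad_columns:
    print("success")
else:
    print("columns %s have different data type" % bad_columns)
```

Whether name ends up in bad_columns depends on how your pandas version names the dtype of text columns (object or a string dtype), so check the contents of actual before deciding what to put in d.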