Why does Pandas.DataFrame.iloc convert to numpy.float64 and round?

Take this number as an example:

1.64847910404205

If I create a Pandas DataFrame with a row and this value:

import pandas as pd

df = pd.DataFrame([{'id': 77, 'data': 1.64847910404205}])

and then iterate over the rows (Okay... the 'row') and inspect:

for index, row in df.iterrows():
    # With a single row, index is 0, so an `index > 0` guard would skip this line
    previous_row = df.iloc[index]

Of course the above is weird: why would I iterate over the rows just to pull the same row from the DF? Forget that; I removed the -1 to illustrate.

Now, if I use SciView (part of IntelliJ) and the data tab to inspect the rows individually, I get this:

row
data: 1.64847910404205

previous_row
data: 1.64847910404

Notice that previous_row has been rounded. It seems to be because, for some reason, the two have different data types:

row: 
type(row) #float64

previous_row:
type(previous_row) #numpy.float64

I'm curious to know: why does iloc convert to a numpy.float64 and how can I prevent it from doing so?

I need the same level of precision as I will later be doing Peak Signal to Noise Ratio (PSNR) calculations. Of course, I could just convert the float to a numpy.float64, but I don't want to lose precision.

pookie asked Nov 01 '25 04:11


1 Answer

The 'data' column in your DataFrame already has the type numpy.float64; Pandas just reports it as float64. You can verify this yourself with the following:

import numpy

df['data'].dtype.type is numpy.float64

which will return True. An alternative form would be:

type(df['data'].values[0]) is numpy.float64

which will also return True.

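So iloc is not converting or rounding anything: both row and previous_row hold the same numpy.float64 value. Any difference in display is down to how SciView renders that value, not to a loss of precision in the data.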
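If you want to check this outside the IDE, here is a minimal sketch that pulls the value out three ways and compares each against the original literal. The variable names are just for illustration. The last line formats the value to 12 significant digits, which happens to reproduce the shortened string from the question; whether SciView formats values exactly this way is an assumption on my part.

```python
import numpy as np
import pandas as pd

value = 1.64847910404205
df = pd.DataFrame([{'id': 77, 'data': value}])

# Pull the same cell out three different ways.
via_iterrows = next(df.iterrows())[1]['data']
via_iloc = df.iloc[0]['data']
via_values = df['data'].values[0]
extracted = (via_iterrows, via_iloc, via_values)

# Every route yields a numpy.float64, which subclasses Python's float.
same_type = all(type(v) is np.float64 for v in extracted)
is_python_float = isinstance(via_iloc, float)

# Exact equality with the original literal: nothing was rounded.
unchanged = all(bool(v == value) for v in extracted)

print(same_type, is_python_float, unchanged)
print(repr(float(via_iloc)))        # full stored value
print(format(via_iloc, '.12g'))     # a shorter display of the same value
```

Since numpy.float64 and Python's float are both 64-bit IEEE 754 doubles, converting between them does not cost any precision for the PSNR calculations.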

tel answered Nov 02 '25 17:11