I am trying to load data into a Polars DataFrame using read_csv, but I keep getting this error:
RuntimeError: Any(ComputeError("Could not parse 0.5 as dtype Int64 at column 13.\n The total offset in the file is 11684833 bytes.\n\n Consider running the parser `with_ignore_parser_errors=true`\n or consider adding 0.5 to the `null_values` list."))
I used the converters argument as follows:
converters = {
'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
'Number': lambda x: float(x)
}
The error still persists. I also tried the argument suggested in the error message:
ignore_errors=True
The error is still there. What can I do? My issue is not with parsing dates, but rather with parsing numbers. This is what I have for now:
converters = {
'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
'Number': lambda x: float(x)
}
df_file = pl.read_csv(file_to_read, has_headers=True, converters=converters, ignore_errors=True)
Polars doesn't have a converters argument, so that won't work.
It looks like a floating-point column is being parsed as integers. You can set the dtype to pl.Float64 manually by passing the column name in schema_overrides:
pl.read_csv(..., schema_overrides = {"foo": pl.Float64})
Or you can increase infer_schema_length so that Polars detects the floats automatically (the first 100 rows probably contain only integers). The default is 100; try increasing it until schema inference correctly detects the floating-point column.