
Parsing errors when reading CSV files using Polars

I am trying to load data into a Polars DataFrame using read_csv, but I keep getting this error:

RuntimeError: Any(ComputeError("Could not parse 0.5 as dtype Int64 at column 13.\n                                            The total offset in the file is 11684833 bytes.\n\n                                            Consider running the parser `with_ignore_parser_errors=true`\n                                            or consider adding 0.5 to the `null_values` list."))

I tried using the converters argument as follows:

converters = {
    'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
    'Number': lambda x: float(x)
}

The error persists. I also tried the argument suggested in the error message:

ignore_errors=True

The error is still there. What can I do? My issue is not with parsing dates, but rather with parsing numbers. This is what I have for now:

converters = {
   'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
   'Number': lambda x: float(x)
}

df_file = pl.read_csv(file_to_read, has_headers=True, converters=converters, ignore_errors=True)
Rayen asked Oct 17 '25 17:10


1 Answer

Polars' read_csv doesn't have a converters argument, so that won't work.

It seems that a floating-point column is being parsed as integers. You can manually set the dtype to pl.Float64 by passing the column name in schema_overrides:

pl.read_csv(..., schema_overrides={"foo": pl.Float64})

Alternatively, increase infer_schema_length so that Polars detects the floats automatically (the first 100 rows probably contain only integers).

The default is 100, so try increasing it until schema inference correctly detects the floating-point column.

ritchie46 answered Oct 20 '25 07:10


