I have some duration-type data (lap times) as pl.String that fails to convert using strptime, whereas regular datetimes work as expected.
Minutes (before :) and Seconds (before .) are always padded to two digits, Milliseconds are always 3 digits.
Lap times are always < 2 min.
import polars as pl

df = pl.DataFrame({
    "lap_time": ["01:14.007", "00:53.040", "01:00.123"]
})
df = df.with_columns(
    # pl.col('release_date').str.to_date("%B %d, %Y"),  # works
    pl.col('lap_time').str.to_time("%M:%S.%3f").cast(pl.Duration),  # fails
)
So I used the chrono format specifier definitions from https://docs.rs/chrono/latest/chrono/format/strftime/index.html, which are used as per the polars docs of strptime.
The second conversion (for lap_time) always fails, no matter whether I use .%f, .%3f, or %.3f. Apparently, strptime doesn't allow creating a pl.Duration directly, so I tried pl.Time instead, but it fails with the error:
ComputeError: strict conversion to dates failed, maybe set strict=False
Setting strict=False, however, yields null values for the whole Series.
Am I missing something, or is this some weird behavior on chrono's or python-polars' part?
If your durations may exceed 24 hours, you can extract the components (minutes, seconds, and so on) from the string with a regex pattern. For example:
df = pl.DataFrame({
    "time": ["+01:14.007", "100:20.000", "-05:00.000"]
})
df.with_columns(
    # extract_all yields a list of length 3 per row: ["min", "sec", "ms"]
    pl.col("time").str.extract_all(r"([+-]?\d+)")
).with_columns(
    pl.duration(
        minutes=pl.col("time").list.get(0),
        seconds=pl.col("time").list.get(1),
        milliseconds=pl.col("time").list.get(2)
    ).alias("time")
)
┌──────────────┐
│ time         │
│ ---          │
│ duration[ns] │
╞══════════════╡
│ 1m 14s 7ms   │
│ 1h 40m 20s   │
│ -5m          │
└──────────────┘
pl.Time
To convert the data to pl.Time, you need to specify hours as well. If you prepend 00 hours to your time, the code works:
df = pl.DataFrame({"str_time": ["01:14.007", "01:18.880"]})
df.with_columns(
    duration=(pl.lit("00:") + pl.col("str_time"))
    .str.to_time("%T%.3f")
    .cast(pl.Duration)
)
┌───────────┬──────────────┐
│ str_time  ┆ duration     │
│ ---       ┆ ---          │
│ str       ┆ duration[μs] │
╞═══════════╪══════════════╡
│ 01:14.007 ┆ 1m 14s 7ms   │
│ 01:18.880 ┆ 1m 18s 880ms │
└───────────┴──────────────┘
Create your own parser: strptime works for datetime stamps only, not for time deltas. The accepted answer is bad practice, as it fails for reasonable inputs such as durations of 80 minutes or negative durations.
You can use pl.Series.str.extract() to write your own regex parser and extract the values you want before passing them into the Duration constructor.
As far as I'm aware, there is no "duration stamp" parser in Rust. It might be a good idea for a crate, if anyone is reading this. The syntax could be similar to strptime but handle cases like negative durations and non-wrapping of the most significant "digit"/subunit. In this case, a "minute duration stamp", you would wrap seconds at 60 but not minutes, making sure in particular that 61 minutes remains 61.
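To make those rules concrete, here is a rough sketch in plain Python using only the standard library. The function name and the exact stamp format ([sign]minutes:SS.mmm) are assumptions for illustration, not an existing API:

```python
import re
from datetime import timedelta

# Hypothetical "minute duration stamp": optional sign, minutes, SS, mmm.
_STAMP = re.compile(r"([+-]?)(\d+):(\d{2})\.(\d{3})")


def parse_minute_stamp(text: str) -> timedelta:
    """Parse e.g. '61:05.250' without wrapping the minutes field."""
    match = _STAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"not a minute duration stamp: {text!r}")
    sign, minutes, seconds, millis = match.groups()
    if int(seconds) >= 60:
        # Only the sub-unit wraps: seconds must stay below 60.
        raise ValueError(f"seconds out of range in {text!r}")
    magnitude = timedelta(
        minutes=int(minutes), seconds=int(seconds), milliseconds=int(millis)
    )
    # The sign applies to the whole duration.
    return -magnitude if sign == "-" else magnitude
```

With this, parse_minute_stamp("61:05.250") keeps its 61 minutes, and "01:75.000" is rejected. In polars it could be applied row by row with map_elements, though a native expression-based approach is likely faster.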