
Trouble with conversion of duration time strings

I have some duration type data (lap times) as pl.String that fails to convert using strptime, whereas regular datetimes work as expected.

Minutes (before :) and Seconds (before .) are always padded to two digits, Milliseconds are always 3 digits.

Lap times are always < 2 min.

import polars as pl

df = pl.DataFrame({
    "lap_time": ["01:14.007", "00:53.040", "01:00.123"]
})

df = df.with_columns(
    # pl.col('release_date').str.to_date("%B %d, %Y"), # works
    pl.col('lap_time').str.to_time("%M:%S.%3f").cast(pl.Duration), # fails
)

I took the format specifiers from chrono's strftime documentation (https://docs.rs/chrono/latest/chrono/format/strftime/index.html), which the polars docs for strptime refer to.

The second conversion (for lap_time) always fails, whether I use .%f, .%3f, or %.3f. Apparently strptime cannot produce a pl.Duration directly, so I tried pl.Time instead, but that fails with this error:

ComputeError: strict conversion to dates failed, maybe set strict=False

but setting strict=False yields all null values for the whole Series.

Am I missing something, or is this some odd behavior on chrono's or python-polars' part?

Dorian asked Sep 19 '25 08:09


2 Answers

General case

If your durations may exceed 24 hours, you can extract the components (minutes, seconds, and so on) from the string with a regex pattern. For example:

df = pl.DataFrame({
    "time": ["+01:14.007", "100:20.000", "-05:00.000"]
})

df.with_columns(
    pl.col("time").str.extract_all(r"([+-]?\d+)")
    #                                /
    #                 you will get array of length 3
    #                 ["min", "sec", "ms"]
).with_columns(
    pl.duration(
        minutes=pl.col("time").list.get(0),
        seconds=pl.col("time").list.get(1),
        milliseconds=pl.col("time").list.get(2)
    ).alias("time")
)
┌──────────────┐
│ time         │
│ ---          │
│ duration[ns] │
╞══════════════╡
│ 1m 14s 7ms   │
│ 1h 40m 20s   │
│ -5m          │
└──────────────┘

About pl.Time

To convert the data to pl.Time, you need to specify the hours as well. If you prepend 00 hours to your time strings, the code works:

df = pl.DataFrame({"str_time": ["01:14.007", "01:18.880"]})

df.with_columns(
    duration = (pl.lit("00:") + pl.col("str_time"))
        .str.to_time("%T%.3f")
        .cast(pl.Duration)
)
┌───────────┬──────────────┐
│ str_time  ┆ duration     │
│ ---       ┆ ---          │
│ str       ┆ duration[μs] │
╞═══════════╪══════════════╡
│ 01:14.007 ┆ 1m 14s 7ms   │
│ 01:18.880 ┆ 1m 18s 880ms │
└───────────┴──────────────┘
glebcom answered Sep 21 '25 20:09


Create your own parser: strptime works for datetime stamps only, not for time deltas. The accepted answer is bad practice, as it fails for reasonable inputs such as durations of 80 minutes or negative durations.

You can use pl.Series.str.extract() to make your own regex parser and extract the values you want before passing them into the Duration constructor.

As far as I'm aware, there is no "duration stamp" parser in Rust. It might be a good idea for a crate, if anyone is reading this. The syntax could be similar to strptime but handle cases such as negative durations and non-wrapping of the most significant "digit"/subunit: for a "minute duration stamp" like this one, you would wrap seconds at 60 but not minutes, making sure in particular that 61 minutes remains 61.
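
To make those rules concrete, here is a rough pure-Python sketch of a "minute duration stamp" parser. It is only an illustration of the idea, not an existing library; the function name parse_minute_stamp and the exact pattern are hypothetical. Minutes are unbounded, seconds must be below 60, and a leading sign negates the whole duration:

```python
import re
from datetime import timedelta

# Hypothetical stamp format: optional sign, any number of minute digits,
# seconds restricted to 00-59, exactly three millisecond digits.
_STAMP = re.compile(r"([+-]?)(\d+):([0-5]\d)\.(\d{3})")


def parse_minute_stamp(text: str) -> timedelta:
    """Parse a stamp such as '61:05.250' or '-05:30.500' into a timedelta."""
    match = _STAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid minute duration stamp: {text!r}")
    sign, minutes, seconds, millis = match.groups()
    magnitude = timedelta(
        minutes=int(minutes),  # most significant unit, never wrapped
        seconds=int(seconds),
        milliseconds=int(millis),
    )
    # The sign applies to the whole duration, not just the minutes.
    return -magnitude if sign == "-" else magnitude
```

For example, parse_minute_stamp("61:00.000") returns a timedelta of 61 minutes, while "01:60.000" is rejected with a ValueError because the seconds field is out of range.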

Cornelius Roemer answered Sep 21 '25 22:09