How to find first occurrence for each id based on datetime column with pandas?

Question

I have seen a lot of similar questions but didn't quite find an answer to my specific problem. Let's say I have a df:

    sample_id     tested_at   test_value
            1    2020-07-21            5
            1    2020-07-22            4
            1    2020-07-23            6
            2    2020-07-26            6
            2    2020-07-28            5
            3    2020-07-22            4
            3    2020-07-27            4
            3    2020-07-30            6

The df is already sorted in ascending order by the tested_at column. I now need to add another column, first_test, which holds the first test value for each sample_id on every row, regardless of whether it is the highest or not. The output should be:

    sample_id     tested_at   test_value   first_test
            1    2020-07-21            5            5
            1    2020-07-22            4            5
            1    2020-07-23            6            5
            2    2020-07-26            6            6
            2    2020-07-28            5            6
            3    2020-07-22            4            4
            3    2020-07-27            4            4
            3    2020-07-30            6            4

The df is also quite big, so a faster way would be very much appreciated.

Swier · Accepted Answer

You can use pandas' groupby to group by sample_id, and then use transform("first") to broadcast each group's first test_value to every row of that group. Note that this takes the first value by row order, not by date, so make sure the rows are sorted by tested_at before grouping.

import pandas as pd

df = pd.DataFrame(
    [
        [1, "2020-07-21", 5],
        [1, "2020-07-22", 4],
        [1, "2020-07-23", 6],
        [2, "2020-07-26", 6],
        [2, "2020-07-28", 5],
        [3, "2020-07-22", 4],
        [3, "2020-07-27", 4],
        [3, "2020-07-30", 6],
    ],
    columns=["sample_id", "tested_at", "test_value"],
)

df["first_test"] = df.groupby("sample_id")["test_value"].transform("first")

Which results in:

   sample_id   tested_at  test_value  first_test
0          1  2020-07-21           5           5
1          1  2020-07-22           4           5
2          1  2020-07-23           6           5
3          2  2020-07-26           6           6
4          2  2020-07-28           5           6
5          3  2020-07-22           4           4
6          3  2020-07-27           4           4
7          3  2020-07-30           6           4
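
If the rows are not guaranteed to be in date order, one option is to sort a copy by tested_at and let pandas align the result back to the original rows by index. The following is a minimal sketch under that assumption; the shuffled data and the name `ordered` are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Same eight rows as in the question, deliberately shuffled out of date order.
df = pd.DataFrame(
    {
        "sample_id": [3, 1, 2, 1, 3, 2, 1, 3],
        "tested_at": [
            "2020-07-30", "2020-07-22", "2020-07-28", "2020-07-21",
            "2020-07-22", "2020-07-26", "2020-07-23", "2020-07-27",
        ],
        "test_value": [6, 4, 5, 5, 4, 6, 6, 4],
    }
)

# Parse the strings so the sort compares real timestamps.
df["tested_at"] = pd.to_datetime(df["tested_at"])

# Sort a copy by date; the original index labels travel with each row.
ordered = df.sort_values("tested_at")

# transform returns a Series indexed like `ordered`; assigning it to df
# aligns on those index labels, so df keeps its original row order.
df["first_test"] = ordered.groupby("sample_id")["test_value"].transform("first")

print(df)
```

Because the assignment aligns on the index, there is no need to re-sort df afterwards. This relies on the index labels being unique.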

Tags:

python

pandas

Geormy White

1 Answers

Swier