Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add random number of microseconds to each index in a dataframe's datetime index

I have a dataframe that uses a DateTime index. Due to the nature of the data, there are duplicate rows in the raw data that produce duplicate indices. I would like to add a random number of microseconds to each index to make them unique.

So, my index is currently:

2021-06-01 08:58:47+00:00

and I want it to be:

2021-06-01 08:58:47.011356+00:00

I have a column named TimeUTC which I used to create the index. I tried adding a random number of microseconds to it:

    df.index = df.index + datetime.timedelta(microseconds= random.randrange(0,1000000,1))

but that added the same number of microseconds to all rows. I had the same result when trying to use apply + lambda.

  1. How can I add a random number of microseconds to each row's TimeUTC?
  2. Can I do this directly to the index instead of updating TimeUTC and then reindexing?

Thank you.

like image 371
DKovar Avatar asked Oct 26 '25 08:10

DKovar


1 Answers

you need to use to_timedelta with unit='us' for microseconds, and an array of random number of the same length than the dataframe

# dummy data
df = pd.DataFrame({
    'a':range(10), 
    'TimeUTC':pd.to_datetime(['2021-10-01 00:00:00']*3+['2021-10-01 00:00:01']*5
                             +['2021-10-01 00:00:02']*2, utc=True)}
)

# set seed for reproducibility
np.random.seed(10)

# set the index from TimeUTC and add microseconds with to_timedelta
df.index = df['TimeUTC']+pd.to_timedelta(np.random.randint(1,1000000,len(df)),unit='us')

print(df)
                                  a                   TimeUTC
2021-10-01 00:00:00.345354+00:00  0 2021-10-01 00:00:00+00:00
2021-10-01 00:00:00.760958+00:00  1 2021-10-01 00:00:00+00:00
2021-10-01 00:00:00.881168+00:00  2 2021-10-01 00:00:00+00:00
2021-10-01 00:00:01.443713+00:00  3 2021-10-01 00:00:01+00:00
2021-10-01 00:00:01.617842+00:00  4 2021-10-01 00:00:01+00:00
2021-10-01 00:00:01.105596+00:00  5 2021-10-01 00:00:01+00:00 
2021-10-01 00:00:01.533661+00:00  6 2021-10-01 00:00:01+00:00
2021-10-01 00:00:01.927706+00:00  7 2021-10-01 00:00:01+00:00
2021-10-01 00:00:02.299742+00:00  8 2021-10-01 00:00:02+00:00
2021-10-01 00:00:02.804337+00:00  9 2021-10-01 00:00:02+00:00

Note that adding random number may not conserve the time order of your dataframe, see in this case for the value a=5, the datetime index is lower than the previous two rows

like image 178
Ben.T Avatar answered Oct 27 '25 22:10

Ben.T



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!