Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to quickly fillna with a sequence

Tags:

python

pandas

I have a question about how to quickly fillna with a sequence in Python(pandas).I have a dataset like following(the true dataset is longer),

Time Number
t0 NA
t1 NA
t2 NA
t3 0
t4 NA
t5 NA
t6 NA
t7 NA
t8 0
t9 NA

My requirement is to add numbers to N lines before and after non-blank lines, and the sequence range is range(-N,N+1).The interval between any two non-empty rows in the dataset is greater than C(constant), our N will be less than C, so there is no need to consider the coverage problem for the time being. Assuming N=2, the result I need is as follows :

Time Number
t0 NA
t1 -2
t2 -1
t3 0
t4 1
t5 2
t6 -2
t7 -1
t8 0
t9 1

At present, the only way I can think of is to use a loop, but the efficiency is low. Does pandas have such a method to do it quickly?

like image 362
Lucio Avatar asked Oct 19 '25 20:10

Lucio


1 Answers

There are still some unknowns in your question, like what happens if the intervals overlap. Here I will consider that a further interval overwrites the previous one (you can do the other way around with a change of code, see second part).

Using rolling, groupby.cumcount, and a mask:

s = df['Number'].notna().shift(-N, fill_value=False)
m = s.rolling(2*N+1, min_periods=1).max().astype(bool)

df['Number2'] = df.groupby(s.cumsum()).cumcount().sub(N).where(m)

NB. I used a slightly different example to show the overlap.

output:

   Time  Number  Number2
 0   t0     NaN      NaN
 1   t1     NaN     -2.0
 2   t2     NaN     -1.0
 3   t3     0.0      0.0
 4   t4     NaN      1.0
 5   t5     NaN     -2.0  # here we have an overlap, use latter value
 6   t6     NaN     -1.0
 7   t7     0.0      0.0
 8   t8     NaN      1.0
 9   t9     NaN      2.0
10  t10     NaN      NaN

priority on first group

s = df['Number'].notna().shift(N, fill_value=False)[::-1]
m = s.rolling(2*N+1, min_periods=1).max().astype(bool)

df['Number3'] = df.groupby(s.cumsum()).cumcount(ascending=False).rsub(N).where(m)

output:

   Time  Number  Number2  Number3
0    t0     NaN      NaN      NaN
1    t1     NaN     -2.0     -2.0
2    t2     NaN     -1.0     -1.0
3    t3     0.0      0.0      0.0
4    t4     NaN      1.0      1.0
5    t5     NaN     -2.0      2.0  # difference in behavior
6    t6     NaN     -1.0     -1.0
7    t7     0.0      0.0      0.0
8    t8     NaN      1.0      1.0
9    t9     NaN      2.0      2.0
10  t10     NaN      NaN      NaN
like image 59
mozway Avatar answered Oct 21 '25 10:10

mozway