
Good ways to wrap around indices when slicing a pandas DataFrame

I want to slice a DataFrame by rows or columns using iloc, with out-of-bounds indices wrapping around to the start. Here is an example:

import pandas as pd
df = pd.DataFrame([[1,2,3], [4,5,6], [7,8,9]],columns=['a', 'b', 'c'])
# Slice rows 2 to 4, although the DataFrame has only 3 rows
print(df.iloc[2:4,:])

Data frame:

    a   b   c  
0   1   2   3  
1   4   5   6  
2   7   8   9  

The output will be:

    a   b   c
2   7   8   9

But I want the out-of-bounds index to wrap around, like this:

    a   b   c
2   7   8   9
0   1   2   3

In NumPy, numpy.take with mode='wrap' wraps out-of-bounds indices when slicing:

import numpy as np
array = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(array.take(range(2, 4), axis=0, mode='wrap'))

The output is:

[[7 8 9]
 [1 2 3]]

One possible way to get the wrapping behavior in pandas is to use numpy.take on the row positions:

import pandas as pd
import numpy as np
df = pd.DataFrame([[1,2,3], [4,5,6], [7,8,9]],columns=['a', 'b', 'c'])
# Get the integer indices of the dataframe
row_indices = np.arange(df.shape[0])
# Wrap the slice explicitly
wrap_slice = row_indices.take(range(2, 4), axis=0, mode='wrap')
print(df.iloc[wrap_slice, :])

This produces the output I want:

   a  b  c
2  7  8  9
0  1  2  3

I looked into pandas.DataFrame.take, and it has no "wrap" mode. What is a good and easy way to solve this problem? Thank you very much!

Echan asked Dec 21 '25 00:12


2 Answers

Let's try using np.roll:

df.reindex(np.roll(df.index, shift=-2)[0:2])

Output:

   a  b  c
2  7  8  9
0  1  2  3

And, to make it a little more generic:

startidx = 2
endidx = 4

df.iloc[np.roll(df.index, shift=-1*startidx)[0:endidx-startidx]]
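One caveat worth illustrating: rolling df.index produces labels, which suits reindex or loc, whereas iloc expects integer positions. The two coincide only for a default integer index. A minimal sketch, assuming a hypothetical frame with string labels (not part of the question), that rolls positions instead:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with non-integer labels, for illustration only
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  columns=['a', 'b', 'c'], index=['x', 'y', 'z'])

startidx = 2
endidx = 4

# Roll integer positions rather than labels, so iloc works with any index
positions = np.roll(np.arange(len(df)), shift=-startidx)[:endidx - startidx]
rolled = df.iloc[positions]
print(rolled)
```

Because the rolled array has exactly len(df) entries, this variant can return at most len(df) rows; it does not wrap around more than once.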
Scott Boston answered Dec 22 '25 16:12


You could use the modulo (remainder) operator:

import numpy as np

start_id = 2
end_id = 4
idx = np.arange(start_id, end_id, 1)%len(df)

df.iloc[idx]
#   a  b  c
#2  7  8  9
#0  1  2  3

This method also lets you wrap around multiple times:

start_id = 2
end_id = 10
idx = np.arange(start_id, end_id, 1)%len(df)

df.iloc[idx]
#   a  b  c
#2  7  8  9
#0  1  2  3
#1  4  5  6
#2  7  8  9
#0  1  2  3
#1  4  5  6
#2  7  8  9
#0  1  2  3
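Since the question also mentions columns, the same modulo idea could be wrapped in a small helper. This is only a sketch: wrap_iloc is a hypothetical name, not a pandas function, and it assumes a non-empty axis.

```python
import numpy as np
import pandas as pd

def wrap_iloc(df, start, stop, axis=0):
    """Select positions start..stop-1 along `axis`, wrapping past the end."""
    n = df.shape[axis]                 # assumed non-zero
    idx = np.arange(start, stop) % n   # map every position into 0..n-1
    return df.iloc[idx] if axis == 0 else df.iloc[:, idx]

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'c'])

rows = wrap_iloc(df, 2, 4)           # rows at positions 2, 0
cols = wrap_iloc(df, 1, 4, axis=1)   # columns at positions 1, 2, 0
print(rows)
print(cols)
```

Negative starts also work here, because the modulo of a negative number by a positive length is non-negative in NumPy; for example, wrap_iloc(df, -1, 2) selects positions 2, 0, 1.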
ALollz answered Dec 22 '25 17:12