
Grabbing selection between specific dates in a DataFrame

I have a large pandas DataFrame that contains about two months of data, with one row per second. That is far too much to deal with at once, so I want to grab specific timeframes. The following code grabs everything before February 5th, 2012:

sunflower[sunflower['time'] < '2012-02-05']

I want to do the equivalent of this:

sunflower['2012-02-01' < sunflower['time'] < '2012-02-05']

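but that is not allowed: Python expands the chained comparison into two comparisons joined by "and", which asks for the truth value of a whole Series, and pandas raises a ValueError. I could do this with these two lines instead: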

step1 = sunflower[sunflower['time'] < '2012-02-05']
data = step1[step1['time'] > '2012-02-01']

but I have to do this for 20 different DataFrames and many different time ranges, so being able to do it easily would be nice. I know pandas is capable of this, because it would be easy if my dates were the index rather than a column. They can't be the index, though, because dates are repeated, and that produces this error:

Exception: Reindexing only valid with uniquely valued Index objects

So how would I go about doing this?

Ryan Saxe asked Oct 22 '25 18:10


2 Answers

You could define a mask separately:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a': np.random.randn(100), 'b': np.random.randn(100)})
mask = (df.b > -.5) & (df.b < .5)
df_masked = df[mask]

Or in one line:

df_masked = df[(df.b > -.5) & (df.b < .5)]
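Applied to the question's case, the same mask idea can be wrapped in a small helper so it is reusable across many DataFrames and date ranges. This is only a sketch: the helper name select_between, the sample data, and the 'value' column are made up for illustration, and it assumes the 'time' column already holds datetimes rather than strings.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: one row every 6 hours,
# with each timestamp repeated so the 'time' column is not unique.
times = pd.date_range("2012-01-30", "2012-02-07", freq=pd.Timedelta(hours=6))
sunflower = pd.DataFrame({"time": times.repeat(2), "value": range(2 * len(times))})


def select_between(df, start, end, column="time"):
    """Return rows where start < df[column] < end (both bounds exclusive)."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    mask = (df[column] > start) & (df[column] < end)
    return df[mask]


data = select_between(sunflower, "2012-02-01", "2012-02-05")
print(data["time"].min(), data["time"].max())
```

Because the selection is a boolean mask on a column, repeated timestamps are not a problem; no unique index is needed.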
qua answered Oct 24 '25 08:10


You can use query for a more concise option:

df.query("'2012-02-01' < time < '2012-02-05'")
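As a quick check, the chained comparison inside query should select the same rows as an explicit mask, with both bounds exclusive. The sample frame below is invented for illustration and assumes 'time' is a datetime column.

```python
import pandas as pd

# Hypothetical sample: a 'time' column of datetimes, one row per day.
df = pd.DataFrame({
    "time": pd.date_range("2012-01-30", periods=10, freq="D"),
    "value": range(10),
})

# Chained comparison inside query; both bounds are exclusive.
via_query = df.query("'2012-02-01' < time < '2012-02-05'")

# The same selection written as an explicit boolean mask.
mask = (df["time"] > "2012-02-01") & (df["time"] < "2012-02-05")
via_mask = df[mask]

print(via_query.equals(via_mask))
```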
rachwa answered Oct 24 '25 08:10