Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filtering csv file based on user input with python

I have a large data base that looks like this:

id, Start Time, End Time
0, 2017-01-01 00:00:21, 2017-01-01 00:11:41
1, 2017-01-01 00:00:45, 2017-01-01 00:11:46
2, 2017-02-01 00:00:57, 2017-02-01 00:22:08
3, 2017-03-01 00:01:10, 2017-03-01 00:11:42
4, 2017-01-01 00:01:51, 2017-01-01 00:12:57

Using pandas would probably be easier to do this, but I don't have much experience with it. I have researched modules such as arrow and datetime and would like to filter the data based on the user's input. Using that input, the user is returned filtered data. For example:

def get_month('data.csv'):
    month = input('\nWhich month? January, February, March, April, May, or June?\n')
    date = '1 ' + month + ', 2017'
    with open(city_data, 'r') as fin, open('userdata.csv', 'w') as fout:
         writer = csv.writer(fout, delimiter=' ')
         for row in csv.reader(fin, delimiter=' '):
             if row[0] == arrow.get(date,'D MMMM, YYYY').format('YYYY-MM-DD'):
                 return writer.writerow(row)

Am I approaching this correctly? I think I may be going in the wrong direction in the date = '1 ' + month + ', 2017' part. Is there a way I could filter the data with just an input like January?

like image 655
louielouielouie Avatar asked Jan 31 '26 04:01

louielouielouie


1 Answers

For structured data, pandas provides an efficient solution:

from datetime import datetime
import pandas as pd

# read data from file
df = pd.read_csv('data.csv')

# this creates a dataframe as below:
#    id           Start Time             End Time
# 0   0  2017-01-01 00:00:21  2017-01-01 00:11:41
# 1   1  2017-01-01 00:00:45  2017-01-01 00:11:46
# 2   2  2017-02-01 00:00:57  2017-02-01 00:22:08
# 3   3  2017-03-01 00:01:10  2017-03-01 00:11:42
# 4   4  2017-01-01 00:01:51  2017-01-01 00:12:57

# cast string columns to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])

def get_month(df):
    month = input('\nWhich month? January, February, March, April, May, or June?\n')
    return df[df['Start Time'].dt.month == datetime.strptime(month, '%B').month]

get_month(df)
like image 193
jpp Avatar answered Feb 02 '26 17:02

jpp