I'm learning pandas so that I can use it for some data analysis. The data is supplied as a CSV file with several columns, of which I only need four (date, time, o, c). I'd like to create a new DataFrame indexed by a datetime64 value, created by merging the first two columns and applying pd.to_datetime to the merged string.
My loader code works fine:
st = pd.read_csv("C:/Data/stockname.txt", names=["date","time","o","h","l","c","vol"])
The challenge is converting the loaded DataFrame into a new one with the right format. The code below works but is very slow. Moreover, it only creates a column with the new datetime64 values and doesn't make it the index.
My code:
st_new = pd.concat([pd.to_datetime(st.date + " " + st.time), (st.o + st.c) / 2, st.vol], 
     axis = 1, ignore_index=True)
What would be a more pythonic way to merge two columns and apply a function to the result? How can I make the new column the index of the DataFrame?
You can do everything in the read_csv function:
pd.read_csv('test.csv',
            parse_dates={'timestamp': ['date','time']},
            index_col='timestamp',
            usecols=['date', 'time', 'o', 'c'])
parse_dates tells read_csv to combine the date and time columns into a single timestamp column and parse it as a datetime. (pandas can infer many common date formats automatically.)
index_col sets the timestamp column to be the index.
usecols tells read_csv to read only the listed subset of columns.
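As far as I know, the dict form of parse_dates that combines several columns was deprecated in pandas 2.2, so on newer versions it may be safer to combine the columns after loading. Here is a minimal sketch of that alternative. The sample rows and the format string are made up for illustration, and names= is passed because the file in the question has no header row. Adjust the format to match the real data.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the headerless file from the question.
sample = io.StringIO(
    "2020-01-02,09:30:00,10.0,10.75,9.5,10.5,1000\n"
    "2020-01-02,09:31:00,10.5,11.25,10.25,11.0,1500\n"
)

# Read only the four needed columns; names= supplies the missing header.
st = pd.read_csv(
    sample,
    names=["date", "time", "o", "h", "l", "c", "vol"],
    usecols=["date", "time", "o", "c"],
)

# Join the two string columns and parse them in one vectorized call.
# The format below is an assumption about the data; an explicit format
# avoids having pandas guess it.
timestamp = pd.to_datetime(
    st["date"] + " " + st["time"], format="%Y-%m-%d %H:%M:%S"
)

# Use the parsed timestamps as the index and drop the raw string columns.
st_new = st.drop(columns=["date", "time"]).set_index(timestamp.rename("timestamp"))

# Midpoint of open and close, as computed in the question.
st_new["mid"] = (st_new["o"] + st_new["c"]) / 2
```

The result has a DatetimeIndex named timestamp and the columns o, c and mid. If the original concat version was slow, the likely cause is date parsing without a known format, which the explicit format argument is meant to avoid.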