I'm downloading stock prices from Yahoo for the S&P 500, whose volume is too big for a 32-bit integer.
import numpy as np

def yahoo_prices(ticker, start_date=None, end_date=None, data='d'):
    # yahoo_historical_data is my own helper (not shown here)
    csv = yahoo_historical_data(ticker, start_date, end_date, data)
    d = [('date', np.datetime64),
         ('open', np.float64),
         ('high', np.float64),
         ('low', np.float64),
         ('close', np.float64),
         ('volume', np.int64),
         ('adj_close', np.float64)]
    return np.recfromcsv(csv, dtype=d)
Here's the error:
>>> sp500 = yahoo_prices('^GSPC')
Traceback (most recent call last):
  File "<stdin>", line 108, in <module>
  File "<stdin>", line 74, in yahoo_prices
  File "/usr/local/lib/python2.6/dist-packages/numpy/lib/npyio.py", line 1812, in recfromcsv
    output = genfromtxt(fname, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/numpy/lib/npyio.py", line 1646, in genfromtxt
    output = np.array(data, dtype=ddtype)
OverflowError: long int too large to convert to int
Why would I still be getting this error if I declared the dtype to use int64? Does this indicate that the I/O function isn't really using my dtype sequence d?
===Edit ... example csv added===
Date,Open,High,Low,Close,Volume,Adj Close
2012-06-15,1329.19,1343.32,1329.19,1342.84,4401570000,1342.84
2012-06-14,1314.88,1333.68,1314.14,1329.10,3687720000,1329.10
2012-06-13,1324.02,1327.28,1310.51,1314.88,3506510000,1314.88
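As a quick sanity check on the premise, the volumes in this sample do exceed the signed 32-bit range but fit comfortably in 64 bits. A minimal sketch in plain Python, with the limits written out by hand rather than taken from NumPy:

```python
# Signed integer limits, computed directly (assumption: two's-complement widths).
INT32_MAX = 2**31 - 1   # 2147483647
INT64_MAX = 2**63 - 1   # 9223372036854775807

# Volume values copied from the sample CSV above.
volumes = [4401570000, 3687720000, 3506510000]

fits_int32 = [v <= INT32_MAX for v in volumes]
fits_int64 = [v <= INT64_MAX for v in volumes]

print(fits_int32)  # [False, False, False]
print(fits_int64)  # [True, True, True]
```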
I'm not sure, but I think you found a bug in numpy. I filed it here.
As I said there, if you open npyio.py and edit this line within recfromcsv:
kwargs.update(dtype=kwargs.get('update', None),
to this:
kwargs.update(dtype=kwargs.get('dtype', None),
Then the long integer loads for me with no problem. I didn't check that the datetimes are correct, as Joe wrote in his answer, and as the output below shows, the dates are still not being converted properly. Here is the specific code that works. The contents of "test.csv" are copied and pasted from your example CSV data.
import numpy as np

d = [('date', np.datetime64),
     ('open', np.float64),
     ('high', np.float64),
     ('low', np.float64),
     ('close', np.float64),
     ('volume', np.int64),
     ('adj_close', np.float64)]
a = np.recfromcsv("test.csv", dtype=d)
print(a)
[ (datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 1329.19, 1343.32, 1329.19, 1342.84, 4401570000, 1342.84)
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 1314.88, 1333.68, 1314.14, 1329.1, 3687720000, 1329.1)
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 1324.02, 1327.28, 1310.51, 1314.88, 3506510000, 1314.88)]
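To see why the one-word change matters, here is a minimal sketch of the lookup in plain Python. The function names `forward_buggy` and `forward_fixed` are hypothetical stand-ins, not NumPy code; they only mimic the `kwargs.get(...)` call quoted above.

```python
def forward_buggy(**kwargs):
    # Mimics the quoted line: looks up the key 'update', which callers never
    # pass, so the result is always None and the caller's dtype is discarded.
    return kwargs.get('update', None)

def forward_fixed(**kwargs):
    # Looks up the key that callers actually pass.
    return kwargs.get('dtype', None)

d = [('volume', 'i8')]  # illustrative dtype spec, not the full one from the question

buggy_result = forward_buggy(dtype=d)
fixed_result = forward_fixed(dtype=d)

print(buggy_result)  # None
print(fixed_result)  # [('volume', 'i8')]
```

With `dtype=None`, `genfromtxt` infers each column's type from its contents, which would explain why the declared int64 never took effect.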
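I've also "fixed" the datetime issue by using a native Python object in the datetime field. The code below repeats the steps recfromcsv performs, but passes the dtype through to genfromtxt directly. I don't know if that will work for you.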
import datetime
import numpy as np

d = [('date', datetime.datetime),
     ('open', np.float64),
     ('high', np.float64),
     ('low', np.float64),
     ('close', np.float64),
     ('volume', np.int64),
     ('adj_close', np.float64)]

# a = np.recfromcsv("test.csv", dtype=d)
# Same keyword setup as recfromcsv, with dtype kept as given.
kwargs = {"dtype": d}
case_sensitive = kwargs.get('case_sensitive', "lower") or "lower"
names = kwargs.get('names', True)
kwargs.update(
    delimiter=kwargs.get('delimiter', ",") or ",",
    names=names,
    case_sensitive=case_sensitive)
output = np.genfromtxt("test.csv", **kwargs)
# View the structured array as a record array, as recfromcsv would return.
output = output.view(np.recarray)
print(output)
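If patching NumPy is not an option, one alternative is to parse the rows yourself before building an array. The following is only a sketch that swaps in the standard library's `csv` and `datetime` modules in place of `recfromcsv`; the function name `parse_prices` is hypothetical, and the sample text is the CSV from the question.

```python
import csv
import io
from datetime import datetime

SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2012-06-15,1329.19,1343.32,1329.19,1342.84,4401570000,1342.84
2012-06-14,1314.88,1333.68,1314.14,1329.10,3687720000,1329.10
2012-06-13,1324.02,1327.28,1310.51,1314.88,3506510000,1314.88
"""

def parse_prices(text):
    """Parse Yahoo-style price CSV text into a list of dicts (hypothetical helper)."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            # Dates are parsed explicitly instead of relying on dtype conversion.
            'date': datetime.strptime(rec['Date'], '%Y-%m-%d').date(),
            'open': float(rec['Open']),
            'high': float(rec['High']),
            'low': float(rec['Low']),
            'close': float(rec['Close']),
            # Python integers have arbitrary precision, so this cannot overflow.
            'volume': int(rec['Volume']),
            'adj_close': float(rec['Adj Close']),
        })
    return rows

rows = parse_prices(SAMPLE)
print(rows[0]['date'], rows[0]['volume'])  # 2012-06-15 4401570000
```

Because the parsing happens in plain Python, neither the date column nor the volume column depends on how the dtype is forwarded.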