I am trying to plot data stored in this CSV format: timestamp, value. The values, however, are not plain numbers but abbreviations of large values (k = 1000, M = 1000000, etc.):
2012-02-24 09:07:01, 8.1M
2012-02-24 09:07:02, 64.8M
2012-02-24 09:07:03, 84.8M
2012-02-24 09:07:04, 84.8M
2012-02-24 09:07:05, 84.8M
2012-02-24 09:07:07, 84.8M
2012-02-24 09:07:08, 84.8M
2012-02-24 09:07:09, 84.8M
2012-02-24 09:07:10, 84.8M
I usually load the CSV into a NumPy record array with matplotlib.mlab.csv2rec(infile), but that only works if the values are not abbreviated. Is there an easy way to handle this without my program reading each value, looking for 'M', and converting 84.8M to 84800000 by hand?
Another possibility is the following conversion function:
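conv = dict(zip('kMGT', (3, 6, 9, 12)))  # suffix -> power of ten

def parse_number(value):
    value = value.strip()  # tolerate whitespace around the CSV field
    if value[-1] in conv:
        # e.g. '8.1M' -> '8.1e6', which float() parses directly
        value = '{}e{}'.format(value[:-1], conv[value[-1]])
    return float(value)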
Example:
>>> parse_number('1337')
1337.0
>>> parse_number('8.1k')
8100.0
>>> parse_number('8.1M')
8100000.0
>>> parse_number('64.367G')
64367000000.0
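If the column might contain stray whitespace or an unexpected suffix, a stricter variant can validate the format before converting. This is a minimal sketch under my own assumptions: the name parse_suffixed and the regular expression are illustrative, not part of either answer, and it accepts the same k/M/G/T suffixes as conv above.

```python
import re

# Power of ten for each suffix, mirroring the answer's conv dict.
EXPONENTS = {'k': 3, 'M': 6, 'G': 9, 'T': 12}

# Optional sign, a decimal number, then an optional single suffix letter,
# with surrounding whitespace allowed (hypothetical pattern).
PATTERN = re.compile(r'^\s*([-+]?\d+(?:\.\d*)?|[-+]?\.\d+)\s*([kMGT]?)\s*$')


def parse_suffixed(value):
    """Convert strings such as '84.8M' or ' 1337' to float, rejecting anything else."""
    match = PATTERN.match(value)
    if match is None:
        raise ValueError('unrecognised value: {!r}'.format(value))
    number, suffix = match.groups()
    if suffix:
        # Same trick as parse_number: build '84.8e6' so float() does the
        # decimal scaling in a single, correctly rounded step.
        number = '{}e{}'.format(number, EXPONENTS[suffix])
    return float(number)
```

Keeping the string-exponent approach, rather than multiplying by 10**6, avoids a second floating-point rounding step. An input like '8.1X' raises ValueError instead of failing later inside float().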
You could pass the function by Niklas B. through the converterd argument of csv2rec:
>>> from matplotlib import mlab
>>> data = mlab.csv2rec(infile, names=['datetime', 'values'],
...                     converterd={'values': parse_number})
>>> data
rec.array([(datetime.datetime(2012, 2, 24, 9, 7, 1), 8100000.0),
(datetime.datetime(2012, 2, 24, 9, 7, 2), 64800000.0),
(datetime.datetime(2012, 2, 24, 9, 7, 3), 84800000.0),
(datetime.datetime(2012, 2, 24, 9, 7, 4), 84800000.0),
(datetime.datetime(2012, 2, 24, 9, 7, 5), 84800000.0),
(datetime.datetime(2012, 2, 24, 9, 7, 7), 84800000.0),
(datetime.datetime(2012, 2, 24, 9, 7, 8), 84800000.0),
(datetime.datetime(2012, 2, 24, 9, 7, 9), 84800000.0),
(datetime.datetime(2012, 2, 24, 9, 7, 10), 84800000.0)],
dtype=[('datetime', '|O8'), ('values', '<f8')])
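Recent Matplotlib releases no longer ship mlab.csv2rec (as far as I know it was deprecated in 2.2 and later removed), so here is a rough standard-library sketch of the same loading step. The helper read_series is hypothetical, and parse_number is redefined here (with whitespace stripping) so the snippet runs on its own.

```python
import csv
import io
from datetime import datetime

EXPONENTS = {'k': 3, 'M': 6, 'G': 9, 'T': 12}


def parse_number(value):
    """Same idea as the answer above: '84.8M' -> float('84.8e6')."""
    value = value.strip()
    if value and value[-1] in EXPONENTS:
        value = '{}e{}'.format(value[:-1], EXPONENTS[value[-1]])
    return float(value)


def read_series(lines):
    """Return parallel lists of timestamps and values from 'timestamp, value' rows."""
    times, values = [], []
    # skipinitialspace drops the blank after each comma, e.g. ' 8.1M' -> '8.1M'
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        stamp, raw = row
        times.append(datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S'))
        values.append(parse_number(raw))
    return times, values


# Two rows from the question, wrapped in a file-like object for the demo.
sample = io.StringIO(
    "2012-02-24 09:07:01, 8.1M\n"
    "2012-02-24 09:07:02, 64.8M\n"
)
times, values = read_series(sample)
```

With a real file you would open it with newline='' and pass the file object to read_series. The two lists can then go straight to matplotlib.pyplot.plot(times, values), which accepts datetime objects on the x-axis.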