
Formatting "Kilo", "Mega", "Gig" data in numpy record array

I am trying to plot data stored in a CSV file with two columns: timestamp, value. The values are not plain numbers but are abbreviated with magnitude suffixes (k = 1000, M = 1000000, etc.):

2012-02-24 09:07:01, 8.1M
2012-02-24 09:07:02, 64.8M
2012-02-24 09:07:03, 84.8M
2012-02-24 09:07:04, 84.8M
2012-02-24 09:07:05, 84.8M
2012-02-24 09:07:07, 84.8M
2012-02-24 09:07:08, 84.8M
2012-02-24 09:07:09, 84.8M
2012-02-24 09:07:10, 84.8M

I usually use a numpy record array to store the CSV data, loaded with matplotlib.mlab.csv2rec(infile). But this works only if the values are not in abbreviated form. Is there an easy way to do this without my program reading each value itself and looking for 'M' to convert 84.8M to 84800000?

Ritesh asked Sep 14 '25 17:09


2 Answers

Another possibility is the following conversion function, which rewrites the suffix as a power-of-ten exponent and lets float() parse the result:

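# Map each suffix to its power of ten: k = 10^3, M = 10^6, G = 10^9, T = 10^12.
conv = dict(zip('kMGT', (3, 6, 9, 12)))

def parse_number(value):
    value = value.strip()  # tolerate whitespace around the field
    if value and value[-1] in conv:
        # Rewrite e.g. '8.1M' as '8.1e6' so that float() can parse it.
        value = '{}e{}'.format(value[:-1], conv[value[-1]])
    return float(value)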

Example:

>>> parse_number('1337')
1337.0
>>> parse_number('8.1k')
8100.0
>>> parse_number('8.1M')
8100000.0
>>> parse_number('64.367G')
64367000000.0
Niklas B. answered Sep 16 '25 08:09


You could pass the parse_number function by Niklas B. in the convertd argument of csv2rec, which maps a column name to a converter function:

>>> from matplotlib import mlab
>>> data = mlab.csv2rec(infile, names=['datetime', 'values'],
...                     convertd={'values': parse_number})
>>> data
rec.array([(datetime.datetime(2012, 2, 24, 9, 7, 1), 8100000.0),
   (datetime.datetime(2012, 2, 24, 9, 7, 2), 64800000.0),
   (datetime.datetime(2012, 2, 24, 9, 7, 3), 84800000.0),
   (datetime.datetime(2012, 2, 24, 9, 7, 4), 84800000.0),
   (datetime.datetime(2012, 2, 24, 9, 7, 5), 84800000.0),
   (datetime.datetime(2012, 2, 24, 9, 7, 7), 84800000.0),
   (datetime.datetime(2012, 2, 24, 9, 7, 8), 84800000.0),
   (datetime.datetime(2012, 2, 24, 9, 7, 9), 84800000.0),
   (datetime.datetime(2012, 2, 24, 9, 7, 10), 84800000.0)], 
  dtype=[('datetime', '|O8'), ('values', '<f8')])
bmu answered Sep 16 '25 06:09
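
For setups where csv2rec is not available (it was deprecated and later removed from matplotlib.mlab), a similar record array can be built by hand. The following is only a minimal sketch under some assumptions: the helper name load_records, the in-memory sample, and the choice of an object column for the timestamps are illustrative and not part of either answer. It reuses the parse_number idea with the standard csv module and a NumPy structured dtype.

```python
import csv
import io
from datetime import datetime

import numpy as np

# Suffix -> power of ten, as in the conversion function from the first answer.
conv = dict(zip('kMGT', (3, 6, 9, 12)))


def parse_number(value):
    """Convert strings such as '8.1M' to a float (8100000.0)."""
    value = value.strip()
    if value and value[-1] in conv:
        value = '{}e{}'.format(value[:-1], conv[value[-1]])
    return float(value)


def load_records(fileobj):
    """Read 'timestamp, value' rows into a NumPy record array.

    Hypothetical helper: assumes two columns and no header line.
    """
    rows = []
    # skipinitialspace drops the blank that follows each comma.
    for row in csv.reader(fileobj, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        stamp, value = row
        rows.append((datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S'),
                     parse_number(value)))
    # Object column for the datetimes, float64 column for the values.
    dtype = [('datetime', 'O'), ('values', 'f8')]
    return np.array(rows, dtype=dtype).view(np.recarray)


# In-memory stand-in for the CSV file from the question.
sample = io.StringIO(
    "2012-02-24 09:07:01, 8.1M\n"
    "2012-02-24 09:07:02, 64.8M\n"
    "2012-02-24 09:07:03, 84.8M\n"
)
data = load_records(sample)
print(data['values'])
```

The fields are accessed by name here (data['datetime'], data['values']), so the two columns can be handed directly to a plotting call. With a real file, an open file object would take the place of the io.StringIO sample.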