
Sorting numbers with SI scale factors

Tags:

python

numbers

I have a CSV file that contains a column whose values are given with SI scale factors. I need to do a numeric sort on that column. Specifically, the CSV file contains a list of famous astronomical objects (the Messier objects) and I need to sort them by distance. The kicker is that the distances are written with SI unit prefixes, so a simple sort won't work. Is there a simple way of doing this?

Here is a very abbreviated version of the file:

"Messier Number","Distance"
"M1","6.5 kly"
"M2","33 kly"
"M7","980 ly"
"M16","7 kly"
"M19","29 kly"
"M31","2.5 Mly"
"M49","56 Mly"

Here is what I have so far:

from csv import DictReader

with open('m.csv') as f:
    messier = sorted(DictReader(f), key=lambda e: e['Distance'])

for entry in messier:
    print('{Messier Number:>5s} {Distance}'.format(**entry))

But this does an alphabetic sort rather than a numeric sort:

 M31 2.5 Mly
 M19 29 kly
  M2 33 kly
 M49 56 Mly
  M1 6.5 kly
 M16 7 kly
  M7 980 ly

I could try to split up the distance and interpret the k and M myself, but that seems like the wrong approach. After all, use of metric prefixes is very common. There must be some support for this already. Any pointers would be greatly appreciated.

Autumn McClellan asked Dec 20 '25 16:12


1 Answer

The easiest way to do this is to use QuantiPhy, a package that reads and writes numbers with SI scale factors and units. QuantiPhy provides Quantity, which subclasses float. It converts your string into an object that behaves like a float, which lets you do a numeric sort. The string may include an SI scale factor and units. The scale factor is interpreted properly; the units are not needed in this case, so they are effectively ignored.

Modifying your code to the following should work.

from csv import DictReader
from quantiphy import Quantity

with open('m.csv') as f:
    messier = sorted(DictReader(f), key=lambda e: Quantity(e['Distance']))

for entry in messier:
    print('{Messier Number:>5s} {Distance}'.format(**entry))

With this code the sort comes out right:

  M7 980 ly
  M1 6.5 kly
 M16 7 kly
 M19 29 kly
  M2 33 kly
 M31 2.5 Mly
 M49 56 Mly
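
If adding a dependency is not an option, the same ordering can be had from a small hand-written key function. The following is only a sketch using the standard library: `SI_PREFIXES`, `parse_distance`, and the inline `SAMPLE` data are names made up for this illustration, and it assumes every distance has the form "number, space, optional prefix, ly" with only the prefixes listed in the table.

```python
import csv
import io

# Multipliers for the SI prefixes this sketch understands.
# Assumption: extend the table if your data uses other prefixes.
SI_PREFIXES = {'': 1.0, 'k': 1e3, 'M': 1e6, 'G': 1e9, 'T': 1e12}


def parse_distance(text, unit='ly'):
    """Convert a string such as '6.5 kly' to a float number of light-years."""
    number, _, suffix = text.strip().partition(' ')
    if not suffix.endswith(unit):
        raise ValueError(f'unexpected unit in {text!r}')
    prefix = suffix[:-len(unit)]  # whatever precedes the unit, possibly ''
    if prefix not in SI_PREFIXES:
        raise ValueError(f'unknown SI prefix in {text!r}')
    return float(number) * SI_PREFIXES[prefix]


# Inline copy of the abbreviated file from the question, so the
# example runs without m.csv on disk.
SAMPLE = '''"Messier Number","Distance"
"M1","6.5 kly"
"M2","33 kly"
"M7","980 ly"
"M16","7 kly"
"M19","29 kly"
"M31","2.5 Mly"
"M49","56 Mly"
'''

messier = sorted(
    csv.DictReader(io.StringIO(SAMPLE)),
    key=lambda e: parse_distance(e['Distance']),
)
order = [entry['Messier Number'] for entry in messier]

for entry in messier:
    print('{Messier Number:>5s} {Distance}'.format(**entry))
```

This covers far fewer input formats than QuantiPhy, so it is only reasonable when the column is as regular as the sample above.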

August West answered Dec 22 '25 04:12


