Map multiple functions to CSV rows

Tags: python

I'm trying to do some type conversions on CSV data that consists entirely of strings. My idea is to keep a dictionary that maps header names to conversion functions and apply those functions to each CSV row. I'm a little stuck on how to efficiently map multiple functions over a row. I was thinking of enumerating the headers and building a new dictionary that maps column indices to functions:

header_map = {'Foo':str,
              'Bar':str,
              'FooBar':float}

csv_data = [('Foo', 'Bar', 'FooBar'),
            #lots of data...
           ]

index_map = {}

#enumerate the headers and create a dictionary of index:function
for i, header in enumerate(csv_data[0]):
    index_map[i] = header_map[header]

#retrieve the function for each index and call it on the value
new_csv = [[index_map[i](value) for i, value in enumerate(row)] 
           for row in csv_data[1:]]

Does anyone know of a simpler, more efficient way to accomplish this type of operation?

donopj2 asked Mar 25 '26 03:03



1 Answer

Here is a method, using_converter, which is slightly faster:

import itertools as IT

header_map = {'Foo':str,
              'Bar':str,
              'FooBar':float}

N = 20000
csv_data = [('Foo', 'Bar', 'FooBar')] + [('Foo', 'Bar', 1123.451)]*N

def original(csv_data):
    index_map = {}
    #enumerate the headers and create a dictionary of index:function
    for i, header in enumerate(csv_data[0]):
        index_map[i] = header_map[header]

    #retrieve the appropriate function for each index and call it on the value
    new_csv = [[index_map[i](value) for i, value in enumerate(row)]
               for row in csv_data[1:]]
    return new_csv

def using_converter(csv_data):
    converters = IT.cycle([header_map[header] for header in csv_data[0]])
    conv = converters.__next__  # Python 3 spelling; on Python 2 this was converters.next
    new_csv = [[conv()(item) for item in row] for row in csv_data[1:]]
    return new_csv

def using_header_map(csv_data):
    heads = csv_data[0]
    new_csv = [
        tuple(header_map[head](item) for head, item in zip(heads, row))
        for row in csv_data[1:]]
    return new_csv

# print(original(csv_data))
# print(using_converter(csv_data))
# print(using_header_map(csv_data))
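
One caveat about using_converter: the cycled converters stay aligned with the columns only if every row has exactly as many fields as the header row. A minimal Python 3 sketch of the same idea with an explicit length check follows. The name convert_with_cycle and the sample rows are made up for illustration and are not part of the benchmark.

```python
import itertools

header_map = {'Foo': str,
              'Bar': str,
              'FooBar': float}

# Hypothetical sample data: every value starts out as a string.
rows = [('Foo', 'Bar', 'FooBar'),
        ('a', 'b', '1.5'),
        ('c', 'd', '2.25')]

def convert_with_cycle(csv_data):
    headers = csv_data[0]
    # One converter per column, repeated endlessly in column order.
    converters = itertools.cycle([header_map[h] for h in headers])
    result = []
    for row in csv_data[1:]:
        # A short or long row would shift the cycle for all later rows,
        # so reject it instead of converting with the wrong functions.
        if len(row) != len(headers):
            raise ValueError('row length %d != header length %d'
                             % (len(row), len(headers)))
        result.append([next(converters)(item) for item in row])
    return result

converted = convert_with_cycle(rows)
print(converted)
```

The length check costs a little time per row, so it may be worth dropping if the input is already known to be rectangular.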

Benchmarking with timeit:

The original code:

% python -mtimeit -s'import test' 'test.original(test.csv_data)'
100 loops, best of 3: 17.3 msec per loop

A slightly faster version (using itertools):

% python -mtimeit -s'import test' 'test.using_converter(test.csv_data)'
100 loops, best of 3: 15.5 msec per loop

Lev Levitsky's version (using_header_map):

% python -mtimeit -s'import test' 'test.using_header_map(test.csv_data)'
10 loops, best of 3: 36.2 msec per loop
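
Another variant that may be worth timing on your own data looks up the converters once and then pairs them with each row using zip, so there is no dictionary lookup per item. This is only a sketch: the name using_zip is hypothetical, the sample values are strings (as in the question) rather than the float used in the benchmark data, and it was not part of the timings reported here. It also shows how the timeit module can be called from Python instead of the shell.

```python
import timeit

header_map = {'Foo': str,
              'Bar': str,
              'FooBar': float}

# Assumed sample data: a header row followed by identical string rows.
csv_data = [('Foo', 'Bar', 'FooBar')] + [('Foo', 'Bar', '1123.451')] * 1000

def using_zip(csv_data):
    # Resolve each column's converter once, in header order.
    converters = [header_map[header] for header in csv_data[0]]
    # Pair every item with its column's converter; zip stops at the
    # shorter of the two, so extra fields in a row are silently dropped.
    return [[conv(item) for conv, item in zip(converters, row)]
            for row in csv_data[1:]]

result = using_zip(csv_data)

# Total time in seconds for 10 calls; the figure depends on the machine.
elapsed = timeit.timeit(lambda: using_zip(csv_data), number=10)
print('10 calls took %.4f seconds' % elapsed)
```

Unlike the cycle-based version, a ragged row here cannot misalign later rows, because the pairing starts over for every row.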

unutbu answered Mar 27 '26 17:03
