 

python mapreduce - Skipping the first line of the .csv in mapper

I am trying to do MapReduce in Python with mrjob, and my CSV file looks like this:

    trip_id taxi_id pickup_time dropoff_time ... total
0   20117   2455.0  2013-05-05   09:45:00         50.44
1   44691   1779.0  2013-06-24   11:30:00         66.78

and my code is:

from mrjob.job import MRJob


class MRCount(MRJob):

    def mapper(self, _, line):
        datarow = line.replace(' ','').replace('N/A','').split(',')
        trip_id = datarow[0]
        total = datarow[14]
        total = float(total)  # built-in float; np.float was removed in NumPy 1.24
        yield trip_id, total


if __name__ == '__main__':
    MRCount.run()

Since my code passes every line to the mapper, the first line it receives is the header row (the column names) rather than data. I want to work with total as a float, so when I run the file I get an error:

TypeError: float() argument must be a string or a number, not 'generator'

How can I skip the first line of the CSV file in the mapper function?

asked Dec 14 '25 at 04:12 by TTaa



1 Answer

I'm not sure exactly what 'line' contains, but a simple answer to your problem is to wrap the float conversion in try/except and skip any line that fails to convert.

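import sys

def mapper(self, _, line):
    datarow = line.replace(' ','').replace('N/A','').split(',')
    trip_id = datarow[0]
    total = datarow[14]
    try:
        # Built-in float(); np.float was removed in NumPy 1.24.
        # Header text such as "total" makes float() raise ValueError.
        total = float(total)
    except (TypeError, ValueError):
        # Log to stderr: under Hadoop Streaming, stdout carries the job's output
        print("skipping line with value", datarow[14], file=sys.stderr)
    else:
        yield trip_id, total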
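To check this logic without running mrjob, the same try/except idea can be pulled into a plain function. This is a sketch: parse_row is a hypothetical helper name, and it assumes, like the mapper above, comma-separated rows with trip_id at index 0 and total at index 14. Note that float() raises ValueError (not TypeError) for text such as the header's "total", which is why ValueError is the exception caught here.

```python
def parse_row(line):
    """Return (trip_id, total) for a data row, or None if the row should be skipped.

    Hypothetical helper: assumes comma-separated fields with trip_id at
    index 0 and total at index 14, as in the question's mapper.
    """
    datarow = line.replace(' ', '').replace('N/A', '').split(',')
    if len(datarow) < 15:
        return None  # too few fields to contain a total column
    try:
        # The header's "total" text, or an emptied "N/A", raises ValueError here
        return datarow[0], float(datarow[14])
    except ValueError:
        return None
```

Inside the mapper, call parsed = parse_row(line) and yield parsed only when it is not None.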
answered Dec 16 '25 at 18:12 by PencilBow

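Another option, sketched here under assumptions, is to recognise the header by its content rather than its position. Skipping "the first line" with a counter set in mapper_init would probably misbehave on Hadoop, because the input can be split across several map tasks and each task starts at its own first line, only one of which is the header. The sketch assumes the header's first column is literally trip_id, as in the sample data; HEADER_FIRST_FIELD and is_header are hypothetical names, and mapper is written as a standalone function only so it can be exercised outside mrjob.

```python
# Assumption: the header row's first column is literally "trip_id",
# as in the sample data shown in the question.
HEADER_FIRST_FIELD = "trip_id"


def is_header(line):
    """Return True if this raw input line is the CSV header row."""
    return line.split(",", 1)[0].strip() == HEADER_FIRST_FIELD


def mapper(self, _, line):
    # Sketch of the question's mapper with an explicit header check;
    # in a real job this would be a method of the MRJob subclass.
    if is_header(line):
        return  # a generator that returns early simply yields nothing
    datarow = line.replace(' ', '').replace('N/A', '').split(',')
    yield datarow[0], float(datarow[14])
```

Because data rows start with a numeric trip_id, they should never match the header check, so every task can safely apply it to every line.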


