I am trying to run a MapReduce job in Python. My CSV file looks like this:
trip_id taxi_id pickup_time dropoff_time ... total
0 20117 2455.0 2013-05-05 09:45:00 50.44
1 44691 1779.0 2013-06-24 11:30:00 66.78
and my code is:
import pandas as pd
import numpy as np
from mrjob.job import MRJob

class MRCount(MRJob):
    def mapper(self, _, line):
        datarow = line.replace(' ','').replace('N/A','').split(',')
        trip_id = datarow[0]
        total = datarow[14]
        total = np.float(total)
        yield ((trip_id), (total))
Because every line of the file is passed to the mapper, the first line it receives is the header row of column names. I want to work with total as a float, so when I run the job I get this error:
TypeError: float() argument must be a string or a number, not 'generator'
How can I skip the first line of the CSV file in the mapper function?
I'm not sure exactly what 'line' contains. A simple fix is to wrap the float conversion in try/except and skip any line whose total cannot be converted:
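# Add "import sys" next to the other imports at the top of the file.
def mapper(self, _, line):
    datarow = line.replace(' ','').replace('N/A','').split(',')
    trip_id = datarow[0]
    total = datarow[14]
    try:
        # Built-in float; np.float was removed in NumPy 1.24.
        total = float(total)
    except (TypeError, ValueError):
        # Non-numeric text such as the header raises ValueError, not TypeError.
        # Log to stderr, because mrjob reads the task's output from stdout.
        print("skipping line with value", datarow[14], file=sys.stderr)
    else:
        yield trip_id, total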
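If you would rather skip the header explicitly than rely on the conversion failing, one option is to move the parsing into a small helper. This is only a sketch under assumptions: parse_trip_line is a hypothetical name, the header row is assumed to contain the literal column name trip_id, and the column index 14 is taken from the question's code. It uses the standard csv module instead of split(',') so quoted fields are handled.

```python
import csv


def parse_trip_line(line, total_index=14, header_marker="trip_id"):
    """Yield (trip_id, total) for a data line; yield nothing for a header or bad row.

    Hypothetical helper: total_index and header_marker are assumptions taken
    from the question, not something mrjob provides.
    """
    # csv.reader copes with quoted fields that a plain split(',') would break.
    row = [field.strip() for field in next(csv.reader([line]), [])]
    # Assumed header detection: the header row contains the column name itself.
    if header_marker in row:
        return
    try:
        total = float(row[total_index])
    except (IndexError, ValueError):
        # Short row, empty field, or non-numeric text such as 'N/A'.
        return
    yield row[0], total
```

Inside the job, the mapper body could then be reduced to "yield from parse_trip_line(line)", which keeps the parsing logic testable without running mrjob.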