python mapreduce - Skipping the first line of the .csv in mapper

Question

I am trying to run a MapReduce job in Python, and my CSV file looks like this:

    trip_id taxi_id pickup_time dropoff_time ... total
0   20117   2455.0  2013-05-05   09:45:00         50.44
1   44691   1779.0  2013-06-24   11:30:00         66.78

and my code is:

import pandas as pd
import numpy as np
from mrjob.job import MRJob

class MRCount(MRJob):

    def mapper(self, _, line):
        datarow = line.replace(' ','').replace('N/A','').split(',')
        trip_id = datarow[0]
        total = datarow[14]
        total = np.float(total)
        yield ((trip_id), (total))

Because my code passes every line to the mapper, the first line it receives is the header row of column names. I need total as a float, so when I run the file I get this error:

TypeError: float() argument must be a string or a number, not 'generator'

How can I skip the first line of the CSV file in the mapper function?

PencilBow · Accepted Answer

I'm not sure exactly what 'line' contains. A simple fix is to wrap the float conversion in try/except and skip any line whose total cannot be converted, such as the header.

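def mapper(self, _, line):
    datarow = line.replace(' ','').replace('N/A','').split(',')
    trip_id = datarow[0]
    total = datarow[14]
    try:
        # Built-in float() does the same job; np.float was only an alias
        # for it and was removed in NumPy 1.24.
        total = float(total)
    except (TypeError, ValueError):
        # A non-numeric string such as the header text raises ValueError.
        print("skipping line with value", datarow[14])
    else:
        yield ((trip_id), (total))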
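
If you would rather keep the parsing separate from the job class, here is a minimal sketch of the same idea as a plain function. The name parse_trip_line and the total_index parameter are made up for illustration, and it assumes, as in the question, that the total sits in column 14 and that the header row holds column names rather than numbers.

```python
def parse_trip_line(line, total_index=14):
    """Return (trip_id, total) for a data row, or None for a row to skip.

    Hypothetical helper: total_index=14 mirrors datarow[14] in the question.
    """
    fields = line.replace(' ', '').replace('N/A', '').split(',')
    if len(fields) <= total_index:
        # Too few columns (for example a blank line): nothing to parse.
        return None
    try:
        total = float(fields[total_index])
    except ValueError:
        # Header text such as "total", or an empty field left after
        # stripping 'N/A', cannot be converted, so the row is skipped.
        return None
    return fields[0], total
```

Inside the mapper you could then call result = parse_trip_line(line) and yield result only when it is not None. Note that this skips every row with a non-numeric total, not just the first line of the file.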

Tags: python, csv, hadoop, mapreduce, mrjob
