I have an 8-node cluster and I am parsing a 20 GB text file with MapReduce. The mapper reads each line and emits it with a key taken from one of the columns of that row. When the reducer receives it, the record is written to a different output file depending on the key value. For example, given this input file:
test;1234;A;24;49;100
test2;222;B;29;22;22
test2;0099;C;29;22;22
these rows are keyed on the third column and will be written like this:
/output/A-r-0001
/output/B-r-0001
/output/C-r-0001
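To make the routing concrete, here is a minimal plain-Java sketch of it, with no Hadoop classes involved. It assumes the key is the third `;`-separated column, as in the sample rows; the class and method names (`KeyRoutingSketch`, `extractKey`, `groupByKey`) are made up for illustration and are not part of the actual job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of the routing described above (hypothetical names). */
final class KeyRoutingSketch {
    // Assumption: the routing key is the third ';'-separated column (index 2).
    static final int KEY_COLUMN = 2;

    /**
     * Returns the value a mapper would emit as the key for this line.
     * Throws ArrayIndexOutOfBoundsException if the line has fewer than three columns.
     */
    static String extractKey(String line) {
        // Limit -1 keeps trailing empty fields instead of silently dropping them.
        String[] fields = line.split(";", -1);
        return fields[KEY_COLUMN];
    }

    /** Groups lines by key, mimicking what each reduce call would receive. */
    static Map<String, List<String>> groupByKey(List<String> lines) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            groups.computeIfAbsent(extractKey(line), k -> new ArrayList<>()).add(line);
        }
        return groups;
    }
}
```

For the three sample rows, `groupByKey` yields one group each for `A`, `B` and `C`, which corresponds to the three output files listed above.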
I am using a MultipleOutputs object in the reducer, and with a small file everything is fine. But with the 20 GB file, 152 mappers and 8 reducers are started. Everything finishes really fast on the mapper side, but one reducer keeps running. 7 of the reducers finish within 18 minutes at most, but the last one takes 3 hours. At first I suspected that the input of that reducer was bigger than the others', but that is not the case. One reducer has three times more input than the slow one, and it finishes in 17 minutes.
I've also tried increasing the number of reducers to 14, but that resulted in 2 more slow reduce tasks.
I've checked lots of documentation and could not figure out why this is happening. Could you guys help me with it?
EDITED
The problem was due to some corrupt data in my dataset. I've added some strict checks on the input data on the mapper side and it is working fine now.
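For readers hitting the same problem, a mapper-side check of this kind might look like the sketch below. The exact rules used in the original job are not given, so everything here is an assumption: six columns per row, the key in the third column, and keys limited to short alphanumeric tokens. The names (`RowValidatorSketch`, `validKey`) are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical strict row check; the real job's rules are not known. */
final class RowValidatorSketch {
    // Assumptions: six columns per row, key at index 2, short alphanumeric keys only.
    static final int EXPECTED_COLUMNS = 6;
    static final int KEY_COLUMN = 2;
    static final Pattern VALID_KEY = Pattern.compile("[A-Za-z0-9]{1,16}");

    /** Returns the key if the row passes every check, otherwise an empty Optional. */
    static Optional<String> validKey(String line) {
        if (line == null) {
            return Optional.empty();
        }
        // Limit -1 keeps trailing empty fields so the column count is exact.
        String[] fields = line.split(";", -1);
        if (fields.length != EXPECTED_COLUMNS) {
            return Optional.empty();
        }
        String key = fields[KEY_COLUMN].trim();
        return VALID_KEY.matcher(key).matches() ? Optional.of(key) : Optional.empty();
    }
}
```

In a mapper, a row for which `validKey` comes back empty would simply be skipped (and ideally counted), so that a garbage key never reaches a reducer or becomes an output file name.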
Thanks guys.
I've seen this happen often with skewed data, so my best guess is that your dataset is skewed: your mapper emits lots of records with the same key, they all go to the same reducer, and that reducer is overloaded because it has so many values to go through.
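One way to test that guess is to count records per key and see how the keys would spread over the reducers. The sketch below is plain Java with hypothetical names (`SkewCheckSketch` and its methods). `partitionFor` uses the same formula as Hadoop's default `HashPartitioner`, but applied to `String.hashCode()`; a job keyed on `Text` hashes differently, so treat the resulting reducer numbers as illustrative only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Rough skew check on a sample of keys (hypothetical helper, not Hadoop code). */
final class SkewCheckSketch {
    /** Counts how many records carry each key. */
    static Map<String, Long> countPerKey(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    /** HashPartitioner-style formula, here applied to String.hashCode() as a stand-in. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Sums the per-key record counts that would land on each reducer. */
    static long[] loadPerReducer(Map<String, Long> counts, int numReducers) {
        long[] load = new long[numReducers];
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            load[partitionFor(entry.getKey(), numReducers)] += entry.getValue();
        }
        return load;
    }
}
```

If one entry of `loadPerReducer` is far larger than the rest, the data is skewed. If the loads look even and one reducer is still slow, as in the question, the cause is more likely in what that reducer does with particular keys or values.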
There is no easy solution for this, and it really depends on the business logic of your job. You could maybe add a check in your reducer so that, once a key has more than N values, all values after the first N are ignored.
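The "ignore everything after N" idea can be sketched like this in plain Java. `CappedReduceSketch` and `takeAtMost` are hypothetical names, and whether dropping values is acceptable depends entirely on the job.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Caps the number of values handled per key (hypothetical helper). */
final class CappedReduceSketch {
    /** Returns at most maxValues elements from values, in iteration order. */
    static <T> List<T> takeAtMost(Iterable<T> values, int maxValues) {
        List<T> kept = new ArrayList<>();
        Iterator<T> it = values.iterator();
        // Stop pulling from the iterator as soon as the cap is reached.
        while (kept.size() < maxValues && it.hasNext()) {
            kept.add(it.next());
        }
        return kept;
    }
}
```

In a real Hadoop reducer the framework reuses the value object between iterations, so there you would write out or copy each value as you go and just keep a counter, rather than storing references in a list as this sketch does.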
I've also found some documentation about SkewReduce, which is supposed to make it easier to manage skewed data in a Hadoop environment, as described in their paper, but I haven't tried it myself.