Hadoop can't finish job because "No space left on device"

I am trying to run a very simple Hadoop job. It is a modification of the classic wordCount which, instead of counting words, counts the lines in a file. I want to use it to clean up a bunch of big log files (around 70GB each) that I know contain duplicates. Each line is a "record", so I am only interested in getting each record once.

I know my code works, because it does what it should when I run it on small test files. With the big files, however, Hadoop behaves erratically. The MAP phase starts correctly and normally reaches 100% without problems. The REDUCE phase, on the other hand, never gets past 50%: it reaches maybe 40% and then drops back to 0% after throwing several "No space left on device" exceptions:

FSError: java.io.IOException: No space left on device

Then it retries the REDUCE, climbs back to around 40%, drops to 0% again, and so on. It does this 2 or 3 times before giving up, without success of course.

The strange thing about this exception is that it does not seem to be related to the actual space on the disks. The disks never get full: neither the total (global) space on HDFS nor the individual disks in each node. I check the fs status with:

$ hadoop dfsadmin -report > report

This report never shows a node reaching 100%. In fact, no node even comes close.

I have around 60GB of disk available on each node, and I run this on a cluster of 60 data nodes, which gives me more than 3TB of total space. The file I am trying to process is only 70GB.

Looking around on the internet, I found that this can be related to Hadoop creating too many intermediate files while processing large amounts of data. The original wordCount code reduces the data substantially (since words repeat a lot): a 70GB file can shrink to an output of just 7MB. In my case, however, I expect only about a 1/3 reduction, i.e. an output of around 20-30GB.

Unix-like systems come with a default limit of 1024 open files per process:

$ ulimit -n
1024

If Hadoop is opening more files than that, it could be a problem. I asked the system admin to increase the limit to 65K, so this is the limit now:

$ ulimit -n
65000
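(For reference, such a limit is typically made persistent on Linux through PAM limits. A sketch, assuming the task trackers run as a dedicated "hadoop" user and that the standard /etc/security/limits.conf file is used; the admin's actual method may differ:)

# /etc/security/limits.conf (editing it requires root);
# "hadoop" is a placeholder for the user that runs the Hadoop daemons
hadoop   soft   nofile   65000
hadoop   hard   nofile   65000

The new value only applies to sessions started after the change, so the daemons have to be restarted to pick it up.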

The problem continues. Could it be that I need to increase this limit even further? Is there something else going on here?

Thanks a lot for your help!

Code here:

package ...;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class LineCountMR {

  public static class MapperClass 
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    private String token;

    @Override
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {

        // Emit the whole line (spaces replaced with underscores) as the key,
        // so identical lines are grouped together in the reduce phase.
        token = value.toString().replace(' ', '_');
        word.set(token);
        context.write(word, one);
    }
  }

  public static class ReducerClass 
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, 
                       Context context
                       ) throws IOException, InterruptedException {
      // Sum the occurrences; each unique line is written out exactly once.
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
 }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    if (args.length != 2) {
      System.err.println("Parameters: <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "line count MR");
    job.setJarByClass(LineCountMR.class);
    job.setMapperClass(MapperClass.class);
    // The reducer also runs as a combiner, so duplicate lines emitted by a
    // single mapper are collapsed before the shuffle.
    job.setCombinerClass(ReducerClass.class);
    job.setReducerClass(ReducerClass.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
asked Oct 30 '25 by ekorso


1 Answer

I have seen this issue on a cluster while processing 10TB of data. It is not related to space availability on HDFS, but to the space available on the local file system (check it with df -h), which is used to store the intermediate data generated during the map-reduce operation. That intermediate data is written to local disk, not to HDFS.
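To confirm this, it helps to watch the local (non-HDFS) disks on the worker nodes while the reduce phase runs, and to check where the intermediate spill data is written. A rough sketch, assuming MRv1-style property names (mapred.local.dir, hadoop.tmp.dir) and that the site configuration lives under /etc/hadoop/conf; adjust paths and property names for your version:

$ df -h     # run on a worker node during the job; watch which partition fills up
$ grep -B1 -A2 'mapred.local.dir\|hadoop.tmp.dir' /etc/hadoop/conf/*-site.xml

If mapred.local.dir is unset, it defaults to ${hadoop.tmp.dir}/mapred/local, and hadoop.tmp.dir itself defaults to a directory under /tmp, which often sits on a small root partition. Pointing mapred.local.dir at one or more larger local disks (a comma-separated list of directories is accepted) gives the shuffle the room it needs even though HDFS has plenty of free space.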

answered Nov 01 '25 by Romit S


