
What is "HDFS write pipeline"?

Tags:

hadoop

hdfs

While I was going through Hadoop: The Definitive Guide, I got stuck on the following sentence:

writing the reduce output does consume network bandwidth, but only as much as a normal HDFS write pipeline consumes.

Questions:

  1. Can someone help me understand the sentence above in more detail?
  2. What does "HDFS write pipeline" mean?

rakesh kumar asked Jan 25 '26 08:01



1 Answer

When files are written to HDFS, a number of things happen behind the scenes related to block consistency and replication. By far the largest I/O component of this process is replication. There is also bidirectional communication with the NameNode to register the block's existence and state.
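To make the bandwidth point from the question concrete, here is a back-of-the-envelope sketch. It assumes HDFS's default replica placement, where the first replica is stored on the writer's own node when the writer (for example, a reduce task) runs on a DataNode, so only the remaining replicas cross the network. The helper name and its parameters are hypothetical, for illustration only, and the estimate ignores protocol overhead such as packet headers, checksums and acknowledgements.

```python
def network_bytes_for_write(output_bytes: int, replication: int = 3,
                            writer_is_datanode: bool = True) -> int:
    """Estimate the bytes sent over the network to write data to HDFS.

    Hypothetical helper. Assumes the default placement policy: if the writer
    runs on a DataNode, the first replica is stored locally (no network hop),
    and every other replica costs one network transfer of the full data.
    Ignores packet headers, checksums and acks.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    remote_replicas = replication - 1 if writer_is_datanode else replication
    return output_bytes * remote_replicas


if __name__ == "__main__":
    gib = 1024 ** 3
    # A reducer on a DataNode writing 1 GiB with replication factor 3:
    print(network_bytes_for_write(gib) // gib)  # 2
    # The same write from a client outside the cluster:
    print(network_bytes_for_write(gib, writer_is_datanode=False) // gib)  # 3
```

Under these assumptions, writing reduce output costs the same network traffic as any other HDFS write of that size from a DataNode: one transfer per non-local replica.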

I think that when it says "write pipeline", it just means the process of:

  1. Creating the blocks
  2. Registering them with the NameNode (NN)
  3. Performing replication
  4. Flushing writes to disk
  5. Maintaining block state across the cluster (location, is-locked, last-updated, checksums, etc.)
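In HDFS terminology, the "pipeline" usually refers to the chain of DataNodes chosen to hold a block's replicas: the client streams data to the first DataNode, which forwards it to the second, and so on, with acknowledgements flowing back up the chain. The toy, single-process sketch below models the steps above under that reading. The classes `DataNode` and `NameNode` and the function `write_block` are hypothetical stand-ins, not Hadoop's API. Target selection is simplified to "the first N nodes", whereas real HDFS uses a rack-aware placement policy.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical stand-in for a DataNode: stores packets, forwards downstream."""
    name: str
    stored: list[bytes] = field(default_factory=list)

    def receive(self, packet: bytes, downstream: list["DataNode"]) -> list[str]:
        # Store the packet locally (a real DataNode writes it and flushes to disk).
        self.stored.append(packet)
        # Forward to the next node in the pipeline, then pass the downstream
        # acknowledgements back upstream with this node's own ack appended.
        acks = downstream[0].receive(packet, downstream[1:]) if downstream else []
        return acks + [self.name]


@dataclass
class NameNode:
    """Hypothetical stand-in for the NameNode: allocates blocks, records locations."""
    datanodes: list[DataNode]
    replication: int = 3
    block_locations: dict[int, list[str]] = field(default_factory=dict)
    next_block_id: int = 0

    def allocate_block(self) -> tuple[int, list[DataNode]]:
        block_id = self.next_block_id
        self.next_block_id += 1
        # Simplification: real HDFS picks targets with a rack-aware policy.
        targets = self.datanodes[: self.replication]
        self.block_locations[block_id] = [dn.name for dn in targets]
        return block_id, targets


def write_block(nn: NameNode, data: bytes, packet_size: int = 4) -> int:
    """Create a block, register it, and push its data through the pipeline."""
    block_id, pipeline = nn.allocate_block()
    for offset in range(0, len(data), packet_size):
        packet = data[offset : offset + packet_size]
        acks = pipeline[0].receive(packet, pipeline[1:])
        # A packet counts as written only once every node in the pipeline acked it.
        if len(acks) != len(pipeline):
            raise IOError(f"pipeline failure on block {block_id}")
    return block_id


if __name__ == "__main__":
    nodes = [DataNode(f"dn{i}") for i in range(1, 5)]
    nn = NameNode(nodes)
    bid = write_block(nn, b"hello hdfs!")
    print(nn.block_locations[bid])    # ['dn1', 'dn2', 'dn3']
    print(b"".join(nodes[2].stored))  # b'hello hdfs!'
```

Each packet is stored once per node in the pipeline, and only the hops between nodes (plus the first hop when the client is not itself a DataNode) cost network bandwidth.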
Andrew White answered Jan 29 '26 08:01
