While going through Hadoop: The Definitive Guide, I got stuck on the sentence below:
writing the reduce output does consume network bandwidth, but only as much as a normal HDFS write pipeline consumes.
Questions:
1. Can someone help me understand the above sentence in more detail?
2. What does "HDFS write pipeline" mean?
When a file is written to HDFS, several things happen behind the scenes to keep blocks consistent and replicated. By far the largest I/O cost in this process is replication, since each block is copied to multiple DataNodes. There is also bidirectional communication with the NameNode to register each block's existence and state.
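To put a rough number on "only as much as a normal HDFS write pipeline consumes", here is a back-of-the-envelope sketch in Python. It assumes the default replication factor of 3 and that the writer (the reducer) runs on a DataNode, so the first replica is stored locally and only the remaining copies cross the network. The function name `pipeline_network_bytes` is made up for illustration, and the model ignores acknowledgements, checksums, and NameNode RPC traffic.

```python
def pipeline_network_bytes(output_bytes: int,
                           replication: int = 3,
                           first_replica_local: bool = True) -> int:
    """Estimate bytes sent over the network for one replicated HDFS write.

    Simplified model (an assumption, not HDFS code): the data passes
    through a chain of `replication` DataNodes, and each hop that leaves
    the local machine costs one full copy of the data.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    if output_bytes < 0:
        raise ValueError("output_bytes must be non-negative")
    # If the writer sits on a DataNode, the first copy is a local disk write.
    network_hops = replication - 1 if first_replica_local else replication
    return output_bytes * network_hops


one_gib = 1024 ** 3
# Reducer on a DataNode, replication 3: two copies travel over the network.
print(pipeline_network_bytes(one_gib) / one_gib)  # 2.0
```

Under these assumptions, a reducer that writes 1 GiB of output sends about 2 GiB over the network, which is the same cost any other client on that node would pay for writing a 1 GiB file to HDFS.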
I think when it says "write pipeline" it just means the process of: