I have gone through a few Hadoop books and papers.
A slot is a map/reduce computation unit at a node; it may be a map slot or a reduce slot. As far as I know, a split is a group of blocks of files in HDFS, with some length and the locations of the nodes where they are stored. Mapper is a class, but when the code is instantiated it is called a map task. Am I right? I am not clear about the difference and relationship between map tasks, data splits and Mapper.
Regarding scheduling, I understand that when a map slot of a node is free, a map task is chosen from the non-running map tasks and launched if the data to be processed by that map task is on the node. Can anyone explain this clearly in terms of the above concepts: slots, Mapper, map task, etc.?
Thanks, Arun
As far as I know, a split is a group of blocks of files in HDFS which have the same length and location of nodes where they are stored.
An InputSplit is the unit of data that a particular mapper will process. It need not be just a group of HDFS blocks. It can be a single line, 100 rows from a DB, a 50 MB file, etc.
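To make the "logical unit" point concrete, here is a minimal sketch in plain Java. It is a toy model, not the Hadoop API: `SplitSketch`, `Split` and `computeSplits` are hypothetical names, and tagging each split with the hosts of the block that holds its first byte is a simplifying assumption. It only shows that a split is a description of a byte range plus preferred hosts, and that the split size does not have to equal the block size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only; none of these names come from the Hadoop API.
class SplitSketch {

    // Hypothetical stand-in for an input split: a byte range plus preferred hosts.
    static final class Split {
        final long start;
        final long length;
        final List<String> hosts;

        Split(long start, long length, List<String> hosts) {
            this.start = start;
            this.length = length;
            this.hosts = hosts;
        }

        @Override
        public String toString() {
            return "Split[start=" + start + ", length=" + length + ", hosts=" + hosts + "]";
        }
    }

    // Cuts a file into ranges of at most splitSize bytes. Each split is tagged with
    // the hosts of the block that contains its first byte (a simplifying assumption).
    static List<Split> computeSplits(long fileLength, long splitSize,
                                     long blockSize, List<List<String>> blockHosts) {
        if (splitSize <= 0 || blockSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long length = Math.min(splitSize, fileLength - start);
            int blockIndex = (int) (start / blockSize);
            splits.add(new Split(start, length, blockHosts.get(blockIndex)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hosts holding each of the three blocks of the file (made-up names).
        List<List<String>> blockHosts = Arrays.asList(
                Arrays.asList("node1", "node2"),
                Arrays.asList("node2", "node3"),
                Arrays.asList("node1", "node3"));
        // A 300-byte file stored in 128-byte blocks, split at the block size.
        for (Split s : computeSplits(300, 128, 128, blockHosts)) {
            System.out.println(s);
        }
    }
}
```

Calling `computeSplits(300, 64, 128, blockHosts)` instead gives several splits per block, which is the sense in which a split is not tied to HDFS blocks.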
I am not clear about the difference and relationship between map tasks, data splits and Mapper.
Each InputSplit is processed by one map task, and each map task runs its own instance of the Mapper class.
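The relationship can be sketched in plain Java as well. Again this is a toy model rather than Hadoop code: `ToySplit`, `ToyMapper`, `ToyMapTask`, `createTasks` and `pickTask` are all hypothetical names. It assumes one task per split, a fresh mapper instance per task, and a scheduler that, when a map slot frees up on a host, prefers a pending task whose split lives on that host and otherwise falls back to any pending task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Toy model only; none of these names come from the Hadoop API.
class MapTaskSketch {

    // Hypothetical stand-in for the user's map logic.
    interface ToyMapper {
        String map(String record);
    }

    static final class UpperCaseMapper implements ToyMapper {
        @Override
        public String map(String record) {
            return record.toUpperCase();
        }
    }

    // A split here is just named data plus the single host that stores it.
    static final class ToySplit {
        final String name;
        final String host;
        final List<String> records;

        ToySplit(String name, String host, List<String> records) {
            this.name = name;
            this.host = host;
            this.records = records;
        }
    }

    // A map task binds one split to one mapper instance.
    static final class ToyMapTask {
        final ToySplit split;
        final ToyMapper mapper;

        ToyMapTask(ToySplit split, ToyMapper mapper) {
            this.split = split;
            this.mapper = mapper;
        }

        List<String> run() {
            List<String> out = new ArrayList<>();
            for (String record : split.records) {
                out.add(mapper.map(record));
            }
            return out;
        }
    }

    // One task per split, each with its own freshly created mapper.
    static List<ToyMapTask> createTasks(List<ToySplit> splits, Supplier<ToyMapper> mapperFactory) {
        List<ToyMapTask> tasks = new ArrayList<>();
        for (ToySplit split : splits) {
            tasks.add(new ToyMapTask(split, mapperFactory.get()));
        }
        return tasks;
    }

    // Called when a map slot on freeHost becomes free. Removes and returns a pending
    // task, preferring one whose data is on freeHost; returns null if none is pending.
    static ToyMapTask pickTask(List<ToyMapTask> pending, String freeHost) {
        if (pending.isEmpty()) {
            return null;
        }
        for (int i = 0; i < pending.size(); i++) {
            if (pending.get(i).split.host.equals(freeHost)) {
                return pending.remove(i);
            }
        }
        return pending.remove(0); // no local data: fall back to any pending task
    }

    public static void main(String[] args) {
        List<ToySplit> splits = Arrays.asList(
                new ToySplit("split-0", "node1", Arrays.asList("a", "b")),
                new ToySplit("split-1", "node2", Arrays.asList("c")));
        List<ToyMapTask> pending = createTasks(splits, UpperCaseMapper::new);
        ToyMapTask task = pickTask(pending, "node2"); // a slot on node2 is free
        System.out.println(task.split.name + " -> " + task.run());
    }
}
```

In this model the Mapper is only the code, the split is only the data description, and the map task is the running pairing of the two that occupies a slot.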