Overview #

Examples #

Execution #

Steps #

  1. The master picks idle workers and assigns them one of the Mmap or R reduce jobs
  2. The worker executes the map job and stores the immediate results in memory
  3. Periodically, the workers write the buffered M output into R regions in memory
  4. The location of these R regions on disk are forwarded to the master who will forward these to the reduce workers
  5. Reduce workers load this data through RPC and then sort the keys so they see all the output keys with the same value
  6. The reduce function then operates on the keys and appends the output to a file for that reduce partition
  7. When all map and reduce tasks have been completed, the master wakes up the user program to go back to user code
  8. There are now R output files (one per reduce task). Typically these get passed to another MapReduce call or use them in another distributed application

Fault Tolerance #