Generally, MapReduce can be split into three stages, with an optional final stage.

```mermaid
graph LR;
E((Data)) --> A[Map] --> B[Shuffle] --> C[Reduce] --> D(["(Summation)"]) --> F((Output))
```

Given a large amount of data $D$ and an operation $f(\cdot)$ to be performed on the data, MapReduce tries to exploit the full power of distributed servers.

The Map Stage

The system splits the whole dataset $D$ into several parts. The goal of this stage is to assign each worker a subset of the whole dataset to speed up execution.

Suppose we have $n$ workers. Then the whole dataset is split into $n$ subsets $d_i$, and each worker $w_i$ performs the operation on its assigned subset to generate an intermediate result, say $(k_i, v_i)$.
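As a concrete illustration, here is a minimal sketch of a map function, using word count as the classic example task (the function name and input are hypothetical, not from any specific framework). Each worker $w_i$ runs it on its subset $d_i$ and emits $(k_i, v_i)$ pairs:

```python
def map_word_count(chunk: str):
    """Map stage: emit one (key, value) pair per word in this worker's subset."""
    for word in chunk.split():
        yield (word, 1)  # intermediate key-value pair

pairs = list(map_word_count("to be or not to be"))
# pairs == [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]
```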

Halt Prevention

To prevent result loss due to a worker shutting down, all intermediate results are written to the workers' disks, so that a restarted worker can resume and continue sending.

The important thing is that the generated intermediate results are in key-value form: later, the keys are used to route intermediate results to "reducers" that aggregate them, and the keys are crucial for load balancing.

The Shuffle Stage

The shuffle procedure does not route results through an independent server that then distributes them to reducers. Instead, it is a stub implemented directly on the worker/reducer side.

In short, after workers have computed the results $(k, v)$, they send each $(k, v)$ directly to the corresponding reducer, chosen by a deterministic algorithm so that the same key maps to the same reducer even across different workers.
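One common deterministic mapping is hash partitioning; a sketch under that assumption (the function name is hypothetical):

```python
import hashlib

def reducer_for(key: str, n_reducers: int) -> int:
    """Deterministic key -> reducer mapping: the same key always lands on
    the same reducer, regardless of which worker computed it."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_reducers
```

A stable hash such as MD5 is used here rather than Python's built-in `hash()`, which is salted per process and would send the same key to different reducers on different workers.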

The Reduce Stage

After the shuffle process, each reducer receives results in the form $(k, v[])$: a single key and a list of values. Note that a single reducer may process several keys, each with its own list of values.

The reduce procedure is straightforward: it accepts a list of values as input and produces one or more final results.
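Continuing the word-count illustration, a reduce function collapsing a $(k, v[])$ pair into a final result might look like this (a sketch, not any framework's actual API):

```python
def reduce_word_count(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce stage: collapse the value list for one key into a final result."""
    return (key, sum(values))

result = reduce_word_count("be", [1, 1])
# result == ("be", 2)
```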

Halt Prevention

Just like map workers, reducers also write their final results to their own disks.

The Summation Stage

This stage is optional, because only some operations require a global result. To obtain it, we simply iterate over the distributed reducers and aggregate their computed final results.