What is MapReduce?
This is an approach, algorithm or pattern of the parallel processing of large volumes of unprocessed data, for example, the results of crawlers or the logs of web queries. According to statistics 80% tasks could be mapped on is mapped on MapReduce, it just drives NoSQL. There are different implementations of MapReduce. Well known and patented implementation of this algorithm and the approach of Google, for example: MySpace Qizmt - MySpace's Open Source Mapreduce Framework, is also used in Hadoop, MongoDb and there are many different examples that we can give. More details can be found in the article MapReduce: Simplified Data Processing on Large Clusters
The algorithm receives at the input 3 arguments: the source collection, Map function, and Reduce function, and it returns a new collection of data.
Collection MapReduce (Collection source, Function map, Function reduce)
The algorithm is composed by few steps; the first one consists to execute the Map function to each item within the source collection. The Map will return zero or may instances of Key/Value objects