Code Chronicle: Hadoop MapReduce

MapReduce is a core component of the Apache Hadoop software framework. As the name implies, it does two things: it maps and it reduces. The map step (MAP) distributes work to different nodes within a cluster, and the reduce step (REDUCE) organizes the returned results into the answer to the query being made.
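
To make the two steps concrete, here is a minimal word-count sketch in plain Python. It runs in a single process and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the grouping step in the middle stands in for what the framework does between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, so each key is reduced once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # MAP: every line is processed independently of the others.
    pairs = [pair for line in lines for pair in map_phase(line)]
    # REDUCE: one call per distinct word.
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the quick brown fox", "the lazy dog", "The fox"])
print(counts)
```

Because each call to map_phase looks at only one line, the map work could be handed to different machines without changing the result.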

There are three main components of MapReduce:

JobTracker: The node that manages all jobs in a cluster, also known as the master node. It divides each job into tasks and assigns them to individual machines in the cluster.
TaskTracker: A component that tracks every task assigned to an individual machine.
JobHistoryServer: This component tracks completed jobs.

MapReduce distributes input data and collates results. It does so by operating in parallel across massive clusters, and a job can be split across any number of servers. MapReduce libraries are available in several languages. They hide what happens under the hood, so programmers can create tasks without having to worry about the intricacies of the distributed computing paradigm.
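
The split-and-collate idea can be sketched on one machine, with a thread pool standing in for the servers of a cluster. This is only an illustration under that assumption, not how Hadoop schedules work; split_input, map_task and run_job are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_input(records, num_tasks):
    """Divide the input into num_tasks slices, one per task."""
    return [records[i::num_tasks] for i in range(num_tasks)]


def map_task(records):
    """One task: count the words in its own slice only."""
    counts = Counter()
    for line in records:
        counts.update(line.lower().split())
    return counts


def run_job(records, num_workers=3):
    tasks = split_input(records, num_workers)
    # Each slice is processed by a separate worker thread.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_task, tasks))
    # Collate the partial results into one answer.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


result = run_job(["a b a", "b c", "a", "c c"], num_workers=3)
print(result)
```

The caller only supplies the input and the number of workers; the splitting and merging stay inside run_job, which is the kind of detail a MapReduce library keeps out of the programmer's way.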

Each node reports back to the master node. If a worker node doesn't report back, the master node can re-assign its task to any other node. This makes MapReduce highly fault-tolerant, with the master node as the only single point of failure.
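
A toy simulation of that re-assignment logic might look like the following. Everything here is an assumption made for illustration: the node names, the failed_workers set that marks nodes which never report back, and the upper-casing that stands in for real work. Hadoop's actual heartbeat and scheduling mechanism is considerably more involved.

```python
def run_with_reassignment(tasks, workers, failed_workers):
    """Assign each task to a worker; if that worker never reports back,
    hand the task to the next worker in the list.

    failed_workers is a hypothetical set of names simulating dead nodes.
    """
    results = {}
    assignments = {}
    for index, task in enumerate(tasks):
        # Try the workers in round-robin order, starting at a different
        # worker for each task so the load is spread out.
        for attempt in range(len(workers)):
            worker = workers[(index + attempt) % len(workers)]
            if worker in failed_workers:
                continue  # no report came back, so try another node
            results[task] = task.upper()  # stand-in for the real work
            assignments[task] = worker
            break
        else:
            raise RuntimeError(f"no live worker left for task {task!r}")
    return results, assignments


results, assignments = run_with_reassignment(
    tasks=["t0", "t1", "t2"],
    workers=["node-a", "node-b", "node-c"],
    failed_workers={"node-b"},
)
print(assignments)
```

In this run the task first given to node-b ends up on node-c, and the job still completes. The function itself plays the role of the master node, so nothing in the sketch can recover if it fails.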
