What is YARN?
Stands for Yet Another Resource Negotiator.
JobTracker Functions:
YARN splits the JobTracker's two major functions into separate daemons:
Global Resource Manager: Manages cluster resources.
Application Master: Handles job scheduling and monitoring (one for each application).
The Application Master negotiates resource containers from the Scheduler and monitors their status.
The Application Master itself runs as a normal container.
NodeManager (NM):
New per-node slave.
Responsible for launching application containers, monitoring resource usage (CPU, memory, disk, network), and reporting to the Resource Manager.
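The division of labour described above can be sketched as a toy model. This is only an illustration: the class and method names (NodeManager.launch, ResourceManager.allocate, ApplicationMaster.request) are hypothetical and are not the real YARN API, and memory is the only resource tracked.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy per-node agent: launches containers and tracks free memory."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)

    def launch(self, app_id: str) -> str:
        # Hypothetical container id format, for illustration only.
        container_id = f"{app_id}@{self.name}#{len(self.containers)}"
        self.containers.append(container_id)
        return container_id


class ResourceManager:
    """Toy global manager: reserves memory on the first node that fits."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, mb: int):
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb  # reserve the memory for the grant
                return node
        return None  # no node has enough free memory right now


class ApplicationMaster:
    """Toy per-application master: negotiates containers for its tasks."""

    def __init__(self, app_id: str, rm: ResourceManager):
        self.app_id = app_id
        self.rm = rm
        self.granted = []

    def request(self, count: int, mb: int) -> int:
        for _ in range(count):
            node = self.rm.allocate(mb)
            if node is None:
                break  # cluster is full; stop asking
            # The granted container is launched by that node's NodeManager.
            self.granted.append(node.launch(self.app_id))
        return len(self.granted)


nodes = [NodeManager("node1", 4096), NodeManager("node2", 2048)]
rm = ResourceManager(nodes)
am = ApplicationMaster("app_1", rm)
granted = am.request(count=4, mb=1536)
print(granted, am.granted)
```

With these made-up sizes, only three of the four requested containers fit: two on node1 and one on node2.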
Components in YARN:
Resource Manager:
Cluster-level Resource Manager.
Long-lived; typically runs on high-quality hardware.
Node Manager:
One per Data Node.
Monitors resources on the Data Node.
Application Master:
Short-lived, one per application.
Manages tasks/scheduling.
In the MapReduce paradigm, an application comprises Map and Reduce tasks.
Map and Reduce tasks align well with YARN tasks.
On the YARN side:
ResourceManager, NodeManager, and ApplicationMaster manage cluster resources.
In a MapReduce application:
Multiple map/reduce tasks are executed.
Each task runs in a container on a worker host in the cluster.
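As a rough illustration of how an application breaks down into map and reduce tasks, here is a minimal word-count sketch in plain Python. It is not Hadoop code: map_task and reduce_task are hypothetical names, each call stands in for one task that would run in its own container, and the shuffle step that real MapReduce performs between the two phases is collapsed into a simple merge.

```python
from collections import Counter
from functools import reduce


def map_task(split: str) -> Counter:
    """One map task: count the words in a single input split."""
    return Counter(split.split())


def reduce_task(left: Counter, right: Counter) -> Counter:
    """One reduce step: merge two partial word counts."""
    return left + right


# Three hypothetical input splits, one per map task.
splits = [
    "yarn runs containers",
    "containers run tasks",
    "yarn schedules tasks",
]

partials = [map_task(s) for s in splits]
totals = reduce(reduce_task, partials, Counter())
print(totals["yarn"], totals["containers"], totals["tasks"])
```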
Schedulers in MapReduce:
FIFO (First In First Out) Scheduler
Capacity Scheduler
Fair Scheduler
FIFO Scheduler:
Default in MapReduce 1. In MapReduce 2 (YARN), Apache Hadoop ships with the Capacity Scheduler as the default.
Jobs are run based on submission order.
Example Queue:
Job 1, Job 2, ..., Job 6.
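The submission-order behaviour can be sketched with a simple queue. The job durations below are invented for illustration, and the sketch assumes one job occupies the whole cluster at a time.

```python
from collections import deque


def run_fifo(jobs):
    """Run jobs strictly in submission order; return (name, finish_time) pairs."""
    queue = deque(jobs)
    clock = 0
    finished = []
    while queue:
        name, duration = queue.popleft()  # always take the oldest job
        clock += duration
        finished.append((name, clock))
    return finished


# Hypothetical durations: Job 2 is short but must wait behind Job 1.
jobs = [("Job 1", 5), ("Job 2", 1), ("Job 3", 2)]
schedule = run_fifo(jobs)
print(schedule)
```

The short Job 2 finishes at time 6 even though it needs only one time unit, which is the main drawback of FIFO on a shared cluster.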
Capacity Scheduler:
Allocates resources to fully utilize the cluster.
Multiple job queues allow sharing of large Hadoop clusters.
Each queue has assigned slots/resources for job operation.
Tasks can access free slots in other queues if available.
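The idea of guaranteed per-queue slots plus borrowing of idle slots can be sketched as follows. The queue names and numbers are made up, and the function is a simplification: the real Capacity Scheduler also supports maximum capacities, queue hierarchies, and preemption, none of which are modelled here.

```python
def allocate_slots(capacity, demand):
    """Grant each queue its guaranteed slots, then lend out idle ones.

    capacity: guaranteed slots per queue
    demand:   slots each queue currently wants
    """
    # Step 1: every queue gets up to its guaranteed capacity.
    granted = {q: min(capacity[q], demand.get(q, 0)) for q in capacity}

    # Step 2: slots left idle are lent to queues that still want more.
    free = sum(capacity.values()) - sum(granted.values())
    for q in capacity:
        extra = min(free, demand.get(q, 0) - granted[q])
        granted[q] += extra
        free -= extra
    return granted


# Hypothetical queues: "prod" wants more than its share, "dev" is mostly idle.
capacity = {"prod": 6, "dev": 4}
demand = {"prod": 9, "dev": 1}
result = allocate_slots(capacity, demand)
print(result)
```

Here "prod" borrows the three slots that "dev" is not using, so the cluster stays fully utilized.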
Fair Scheduler:
Goal: ensure each job receives a fair share of resources over time.
Priorities allow high-priority jobs to be scheduled first when capacity is available.
Fair-share example (server with 8 CPUs):
Projects: A (1 share), B (2 shares), C (3 shares).
Project A runs on Processor Set 1, B on 1 and 2, C on 1, 2, and 3.
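The share arithmetic in this example can be worked through in a few lines. The sketch assumes that, within each processor set, CPU time is divided among the projects running there in proportion to their shares. The source does not say how the 8 CPUs are divided among the three sets, so only fractions per set are computed.

```python
from fractions import Fraction

# Shares and processor-set membership taken from the example above.
shares = {"A": 1, "B": 2, "C": 3}
processor_sets = {1: ["A", "B", "C"], 2: ["B", "C"], 3: ["C"]}


def fair_fractions(shares, members):
    """Fraction of one processor set that each competing project receives."""
    total = sum(shares[p] for p in members)
    return {p: Fraction(shares[p], total) for p in members}


per_set = {s: fair_fractions(shares, m) for s, m in processor_sets.items()}
for s, fractions in per_set.items():
    print(s, {p: str(f) for p, f in fractions.items()})
```

Under that assumption, Project A gets 1/6 of set 1, Project B gets 1/3 of set 1 and 2/5 of set 2, and Project C gets 1/2 of set 1, 3/5 of set 2, and all of set 3.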
Tasks are scheduled according to the FIFO, Capacity, or Fair Scheduler policy in use.