YARN Overview

  • What is YARN?

    • Stands for Yet Another Resource Negotiator.

YARN Architectural Overview

  • JobTracker Functions:

    • YARN splits the JobTracker's two major functions between separate components:

      • Global Resource Manager: Manages cluster resources.

      • Application Master: Handles job scheduling and monitoring (one for each application).

      • The Application Master negotiates resource containers from the Scheduler, monitors their status, and itself runs as a normal container.

  • NodeManager (NM):

    • New per-node slave.

    • Responsible for launching application containers, monitoring resource usage (CPU, memory, disk, network), and reporting to the Resource Manager.
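The per-node resources that a NodeManager offers to the Resource Manager are typically declared in yarn-site.xml. A minimal sketch using the standard property names (the memory and vcore values are illustrative, not from the lecture):

```xml
<!-- yarn-site.xml: resources this NodeManager advertises
     to the ResourceManager (values are illustrative) -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
</configuration>
```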

YARN Flow

  • Components in YARN:

    • Resource Manager:

      • Cluster-level Resource Manager.

      • Long-lived; runs on dedicated, high-quality hardware.

    • Node Manager:

      • One per Data Node.

      • Monitors resources on the Data Node.

    • Application Master:

      • Short-lived; one per application.

      • Manages tasks/scheduling.

YARN and MapReduce Interaction

  • In the MapReduce paradigm, an application comprises Map and Reduce tasks.

  • Map and Reduce tasks map naturally onto YARN containers.

  • On YARN side:

    • ResourceManager, NodeManager, and ApplicationMaster manage cluster resources.

  • In a MapReduce application:

    • Multiple map/reduce tasks are executed.

    • Each task runs in a container on a worker host in the cluster.

Job Scheduling Mechanism

  • Schedulers in MapReduce:

    • FIFO (First In First Out) Scheduler — default in MapReduce 1.

    • Capacity Scheduler — default in MapReduce 2 (YARN).

    • Fair Scheduler

FIFO Scheduler

  • Jobs are run based on submission order.

  • Example Queue:

    • Job 1, Job 2, ..., Job 6.
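The FIFO policy above can be sketched in a few lines of Python (the job names are illustrative):

```python
from collections import deque

# Minimal sketch of FIFO scheduling: jobs run strictly in
# submission order; a later job waits until every earlier
# job in the queue has finished.
jobs = deque(f"Job {i}" for i in range(1, 7))  # Job 1 .. Job 6

order = []
while jobs:
    job = jobs.popleft()  # the earliest-submitted job runs next
    order.append(job)

print(order)  # execution order equals submission order
```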

Capacity Scheduler

  • Allocates resources to fully utilize the cluster.

  • Multiple job queues allow sharing of large Hadoop clusters.

  • Each queue has assigned slots/resources for job operation.

  • Tasks can access free slots in other queues if available.
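In practice, these queues and their shares are declared in capacity-scheduler.xml. A minimal sketch with two hypothetical queues (queue names and percentages are illustrative); the maximum-capacity property is what lets one queue borrow idle resources from another, as the last bullet describes:

```xml
<!-- capacity-scheduler.xml: two queues splitting the cluster -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <!-- guaranteed share for the prod queue -->
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>30</value>
  </property>
  <property>
    <!-- elasticity: dev may borrow idle capacity up to this limit -->
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>50</value>
  </property>
</configuration>
```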

Fair Scheduler

  • Goal:

    • Ensure each job receives a fair share of resources over time.

  • Priorities allow high-priority jobs to be scheduled first when capacity is available.
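Fair-share weights between queues are defined in the scheduler's allocation file (commonly named fair-scheduler.xml). A minimal sketch with hypothetical queue names:

```xml
<!-- Fair Scheduler allocation file: resources are divided
     between queues in proportion to their weights -->
<allocations>
  <queue name="projectA">
    <weight>1.0</weight>
  </queue>
  <queue name="projectB">
    <weight>2.0</weight>
  </queue>
</allocations>
```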

Assignment Scenario

  • Server with 8 CPUs:

    • Projects: A (1 share), B (2 shares), C (3 shares).

  • Project A runs on Processor Set 1, B on 1 and 2, C on 1, 2, and 3.

  • Tasks are scheduled according to the FIFO, Capacity, or Fair Scheduler policy in effect.
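Under a fair policy, the CPU share each project is entitled to follows from the standard weighted fair-share formula, using the share counts from the scenario above. A short Python check:

```python
# Scenario from the notes: an 8-CPU server shared by projects
# A, B, C with 1, 2, and 3 shares respectively.
total_cpus = 8
shares = {"A": 1, "B": 2, "C": 3}

# Weighted fair share: each project gets cpus * (its shares / total shares).
total_shares = sum(shares.values())  # 6
fair_cpus = {p: total_cpus * s / total_shares for p, s in shares.items()}

for project, cpus in fair_cpus.items():
    print(f"Project {project}: {cpus:.2f} CPUs")
# A: 8*1/6 ≈ 1.33, B: 8*2/6 ≈ 2.67, C: 8*3/6 = 4.00
```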