[Big Data] Hadoop YARN Schedulers and Algorithms

Hadoop has three main job schedulers: FIFO (First In First Out), the Capacity Scheduler, and the Fair Scheduler. In Apache Hadoop 3.1.3 the default resource scheduler is the Capacity Scheduler.

Hadoop: First In First Out Scheduler

FIFO scheduler: a single queue. Jobs are served strictly in the order they were submitted (first come, first served).

Advantage: simple and easy to understand.

Disadvantage: no support for multiple queues, so it is rarely used in production.
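The first-come, first-served behaviour described above can be illustrated with a small simulation. This is only a sketch: `fifo_schedule` and the job names are hypothetical, and the model assumes one job runs at a time on the whole cluster, which is a simplification of the real scheduler.

```python
from collections import deque


def fifo_schedule(jobs):
    """Run jobs strictly in submission order on a single queue.

    `jobs` is a list of (name, duration) tuples in submission order.
    Returns a list of (name, start_time, finish_time) tuples.
    """
    queue = deque(jobs)
    clock = 0
    timeline = []
    while queue:
        name, duration = queue.popleft()  # always the oldest submission
        timeline.append((name, clock, clock + duration))
        clock += duration
    return timeline


# A short job submitted after a long one has to wait for it to finish.
for entry in fifo_schedule([("long_etl", 100), ("short_query", 2)]):
    print(entry)
```

In this toy run the two-unit job does not start until time 100, which is the kind of head-of-line blocking that multi-queue schedulers are meant to avoid.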

Hadoop: Capacity Scheduler

The CapacityScheduler is designed to run Hadoop applications as a shared, multi-tenant cluster in an operator-friendly manner while maximizing the throughput and the utilization of the cluster.

Traditionally each organization has its own private set of compute resources that have sufficient capacity to meet the organization’s SLA under peak or near-peak conditions. This generally leads to poor average utilization and overhead of managing multiple independent clusters, one per organization. Sharing clusters between organizations is a cost-effective manner of running large Hadoop installations since this allows them to reap benefits of economies of scale without creating private clusters. However, organizations are concerned about sharing a cluster because they are worried about others using the resources that are critical for their SLAs.

The CapacityScheduler is designed to allow sharing a large cluster while giving each organization capacity guarantees. The central idea is that the available resources in the Hadoop cluster are shared among multiple organizations who collectively fund the cluster based on their computing needs. There is an added benefit that an organization can access any excess capacity not being used by others. This provides elasticity for the organizations in a cost-effective manner.

Sharing clusters across organizations necessitates strong support for multi-tenancy since each organization must be guaranteed capacity and safeguards to ensure the shared cluster is impervious to a single rogue application or user, or sets thereof. The CapacityScheduler provides a stringent set of limits to ensure that a single application or user or queue cannot consume a disproportionate amount of resources in the cluster. Also, the CapacityScheduler provides limits on initialized and pending applications from a single user and queue to ensure fairness and stability of the cluster.

The primary abstraction provided by the CapacityScheduler is the concept of queues. These queues are typically set up by administrators to reflect the economics of the shared cluster.

To provide further control and predictability on sharing of resources, the CapacityScheduler supports hierarchical queues to ensure resources are shared among the sub-queues of an organization before other queues are allowed to use free resources, thereby providing affinity for sharing free resources among applications of a given organization.
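The combination of guaranteed capacity and elasticity described above can be sketched in a few lines. This is a simplified model, not the scheduler's actual algorithm: `capacity_allocate` is a hypothetical helper, the queue names are made up, percentages are integers, and spare capacity is lent out greedily in dictionary order. The field names `capacity` and `maximum_capacity` only echo the per-queue `capacity` and `maximum-capacity` settings of the CapacityScheduler.

```python
def capacity_allocate(total, queues):
    """Toy model of guaranteed capacity plus elastic borrowing.

    `queues` maps a queue name to a dict with:
      capacity          guaranteed share of the cluster, in percent
      maximum_capacity  hard ceiling for the queue, in percent
      demand            resource units the queue currently wants
    Returns a dict mapping queue name to allocated units.
    """
    # Step 1: every queue gets what it asks for, up to its guarantee.
    alloc = {}
    for name, q in queues.items():
        guaranteed = total * q["capacity"] // 100
        alloc[name] = min(q["demand"], guaranteed)

    # Step 2: capacity left unused by some queues is lent to queues
    # that still have demand, never beyond their maximum capacity.
    spare = total - sum(alloc.values())
    for name, q in queues.items():
        ceiling = min(q["demand"], total * q["maximum_capacity"] // 100)
        extra = max(0, min(spare, ceiling - alloc[name]))
        alloc[name] += extra
        spare -= extra
    return alloc


queues = {
    "prod": {"capacity": 60, "maximum_capacity": 100, "demand": 20},
    "dev": {"capacity": 40, "maximum_capacity": 80, "demand": 90},
}
print(capacity_allocate(100, queues))  # {'prod': 20, 'dev': 80}
```

Here `dev` borrows the 40 units that `prod` is not using, but stops at its 80 percent ceiling even though it asked for 90 units.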

Capacity Scheduler Resource Allocation Algorithm

Hadoop: Fair Scheduler

Fair scheduling is a method of assigning resources to applications such that all apps get, on average, an equal share of resources over time. Hadoop NextGen is capable of scheduling multiple resource types. By default, the Fair Scheduler bases scheduling fairness decisions only on memory. It can be configured to schedule with both memory and CPU, using the notion of Dominant Resource Fairness developed by Ghodsi et al. When there is a single app running, that app uses the entire cluster. When other apps are submitted, resources that free up are assigned to the new apps, so that each app eventually gets roughly the same amount of resources. Unlike the default Hadoop scheduler, which forms a queue of apps, this lets short apps finish in reasonable time while not starving long-lived apps. It is also a reasonable way to share a cluster between a number of users. Finally, fair sharing can also work with app priorities - the priorities are used as weights to determine the fraction of total resources that each app should get.
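The idea of weights deciding each app's fraction of the cluster can be sketched as a weighted max-min ("water-filling") computation. This is an illustration under assumptions, not the Fair Scheduler's code: `weighted_fair_shares` is a hypothetical function, a single resource type (for example memory) is modelled, and each app has a fixed demand.

```python
def weighted_fair_shares(total, apps):
    """Split `total` resource units among apps by weight, capped by demand.

    `apps` maps an app name to a (weight, demand) tuple; weights must be
    positive. Returns a dict mapping app name to its share.
    """
    shares = {name: 0.0 for name in apps}
    active = set(apps)  # apps whose demand is not yet fully met
    remaining = float(total)
    while active and remaining > 0:
        weight_sum = sum(apps[a][0] for a in active)
        # Apps whose unmet demand fits inside their weighted offer.
        satisfied = {
            a for a in active
            if apps[a][1] - shares[a] <= remaining * apps[a][0] / weight_sum
        }
        if not satisfied:
            # Everyone wants more than their offer: split by weight, done.
            for a in active:
                shares[a] += remaining * apps[a][0] / weight_sum
            break
        # Fully serve the satisfied apps, then re-split what is left.
        for a in satisfied:
            remaining -= apps[a][1] - shares[a]
            shares[a] = float(apps[a][1])
        active -= satisfied
    return shares


# Weights 1 : 1 : 2, but app A only needs 2 units.
print(weighted_fair_shares(14, {"A": (1, 2), "B": (1, 10), "C": (2, 10)}))
```

In this example A is capped at its demand of 2, and the remaining 12 units are split 1 : 2 between B and C, giving 4 and 8.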

The scheduler organizes apps further into “queues”, and shares resources fairly between these queues. By default, all users share a single queue, named “default”. If an app specifically lists a queue in a container resource request, the request is submitted to that queue. It is also possible to assign queues based on the user name included with the request through configuration. Within each queue, a scheduling policy is used to share resources between the running apps. The default is memory-based fair sharing, but FIFO and multi-resource with Dominant Resource Fairness can also be configured. Queues can be arranged in a hierarchy to divide resources and configured with weights to share the cluster in specific proportions.
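For the multi-resource policy mentioned above, Dominant Resource Fairness can be sketched as follows: at each step the next task goes to the app whose dominant share (its largest fractional use of any single resource) is currently the smallest. The function `drf_allocate`, the app names, and the cluster size are illustrative assumptions. The sketch also assumes every task of an app has the same fixed size, which real workloads do not guarantee.

```python
def drf_allocate(capacity, demands):
    """Toy Dominant Resource Fairness allocation.

    `capacity` maps resource name -> total units in the cluster.
    `demands` maps app name -> per-task demand for each resource.
    Returns a dict mapping app name -> number of tasks launched.
    """
    used = {r: 0 for r in capacity}
    tasks = {app: 0 for app in demands}
    blocked = set()  # apps whose next task no longer fits

    def dominant_share(app):
        # Largest fraction of any one resource held by this app.
        return max(tasks[app] * demands[app][r] / capacity[r] for r in capacity)

    while len(blocked) < len(demands):
        candidates = [a for a in demands if a not in blocked]
        app = min(candidates, key=dominant_share)
        need = demands[app]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
            tasks[app] += 1
        else:
            blocked.add(app)
    return tasks


cluster = {"cpu": 9, "mem_gb": 18}
apps = {
    "A": {"cpu": 1, "mem_gb": 4},  # memory-heavy tasks
    "B": {"cpu": 3, "mem_gb": 1},  # CPU-heavy tasks
}
print(drf_allocate(cluster, apps))  # {'A': 3, 'B': 2}
```

With these numbers A ends up holding 12 of 18 GB and B holds 6 of 9 CPUs, so both dominant shares are two thirds even though the apps use very different resource mixes.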

In addition to providing fair sharing, the Fair Scheduler allows assigning guaranteed minimum shares to queues, which is useful for ensuring that certain users, groups or production applications always get sufficient resources. When a queue contains apps, it gets at least its minimum share, but when the queue does not need its full guaranteed share, the excess is split between other running apps. This lets the scheduler guarantee capacity for queues while utilizing resources efficiently when these queues don’t contain applications.
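The interaction between guaranteed minimum shares and fair sharing of the excess can be sketched like this. It is a simplified, single-resource model with a hypothetical function name and hypothetical queue names. It assumes the minimum shares together do not exceed the cluster size, and it hands out spare units one at a time to whichever unsatisfied queue currently holds the least.

```python
def shares_with_minimums(total, queues):
    """Toy model of guaranteed minimum shares plus fair use of the excess.

    `queues` maps a queue name to a dict with integer `min_share` and
    `demand`. Assumes the minimum shares sum to at most `total`.
    """
    # A queue receives its minimum share only as far as it has demand,
    # so an idle or lightly loaded queue frees its guarantee for others.
    alloc = {n: min(q["min_share"], q["demand"]) for n, q in queues.items()}
    spare = total - sum(alloc.values())

    # Give each spare unit to the unsatisfied queue holding the least.
    while spare > 0:
        hungry = [n for n, q in queues.items() if alloc[n] < q["demand"]]
        if not hungry:
            break
        poorest = min(hungry, key=lambda n: alloc[n])
        alloc[poorest] += 1
        spare -= 1
    return alloc


queues = {
    "prod": {"min_share": 6, "demand": 2},   # needs less than its guarantee
    "adhoc": {"min_share": 0, "demand": 20},
    "batch": {"min_share": 2, "demand": 5},
}
print(shares_with_minimums(10, queues))  # {'prod': 2, 'adhoc': 4, 'batch': 4}
```

Because `prod` only needs 2 of its 6 guaranteed units, the other 4 are shared out instead of sitting idle.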

The Fair Scheduler lets all apps run by default, but it is also possible to limit the number of running apps per user and per queue through the config file. This can be useful when a user must submit hundreds of apps at once, or in general to improve performance if running too many apps at once would cause too much intermediate data to be created or too much context-switching. Limiting the apps does not cause any subsequently submitted apps to fail, only to wait in the scheduler’s queue until some of the user’s earlier apps finish.
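The "wait rather than fail" behaviour of a running-app limit can be sketched with a small class. `AppLimiter` and its methods are hypothetical names for illustration. Only a per-user limit is modelled here, whereas the Fair Scheduler can also apply a limit per queue.

```python
from collections import defaultdict, deque


class AppLimiter:
    """Toy per-user cap on concurrently running apps."""

    def __init__(self, max_running_per_user):
        self.max_running = max_running_per_user
        self.running = defaultdict(set)    # user -> ids of running apps
        self.waiting = defaultdict(deque)  # user -> ids waiting, oldest first

    def submit(self, user, app_id):
        """Accept every submission; over-limit apps wait instead of failing."""
        if len(self.running[user]) < self.max_running:
            self.running[user].add(app_id)
            return "RUNNING"
        self.waiting[user].append(app_id)
        return "WAITING"

    def finish(self, user, app_id):
        """Mark an app finished and start the user's oldest waiting app."""
        self.running[user].discard(app_id)
        if self.waiting[user] and len(self.running[user]) < self.max_running:
            promoted = self.waiting[user].popleft()
            self.running[user].add(promoted)
            return promoted
        return None


limiter = AppLimiter(max_running_per_user=2)
print(limiter.submit("alice", "app1"))  # RUNNING
print(limiter.submit("alice", "app2"))  # RUNNING
print(limiter.submit("alice", "app3"))  # WAITING
print(limiter.finish("alice", "app1"))  # app3
```

The third submission is accepted but held back, and it starts as soon as one of the user's earlier apps completes.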
