
[Big Data] MapReduce Counters

1. MapReduce Counters

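1.1 What Is a Counter

Counters are generally used to record a job's execution progress and status information.

Counting by hand would mean merging the partial results of many map and reduce tasks running in parallel, usually in separate JVMs on different nodes. That is tedious to code. MapReduce counters do this aggregation for you, so they are the usual choice.

In practice, counters record how many times a particular step of a task runs, or how much data it processes. These figures serve as a baseline for comparing a job before and after optimization.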
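Here is what "merging by hand" involves. This is a minimal plain-Java sketch, not Hadoop code.

- The class `CounterMergeSketch`, its methods, and the counter keys `IP_TOTAL` and `IP_192` are hypothetical.
- Threads stand in for map tasks. On a real cluster, tasks run in separate JVMs with no shared memory, so each task's local counts must be collected and summed by a coordinator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical simulation of hand-rolled counting: each "task" keeps local
// counts, and the driver must collect and sum them afterwards.
class CounterMergeSketch {

    // One simulated map task: counts all IPs and the IPs starting with "192".
    static Map<String, Long> runTask(List<String> split) {
        Map<String, Long> local = new HashMap<>();
        for (String ip : split) {
            local.merge("IP_TOTAL", 1L, Long::sum);
            if (ip.startsWith("192")) {
                local.merge("IP_192", 1L, Long::sum);
            }
        }
        return local;
    }

    // Sums per-task maps into job-wide totals (the step MapReduce counters do for you).
    static Map<String, Long> merge(List<Map<String, Long>> perTask) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> counts : perTask) {
            counts.forEach((name, value) -> total.merge(name, value, Long::sum));
        }
        return total;
    }

    // Runs one simulated task per split in parallel, then merges the results.
    static Map<String, Long> runJob(List<List<String>> splits) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, splits.size()));
        try {
            List<Future<Map<String, Long>>> futures = new ArrayList<>();
            for (List<String> split : splits) {
                futures.add(pool.submit(() -> runTask(split)));
            }
            List<Map<String, Long>> perTask = new ArrayList<>();
            for (Future<Map<String, Long>> future : futures) {
                perTask.add(future.get());
            }
            return merge(perTask);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("simulated task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("192.168.1.1", "10.0.0.1"),
                List.of("192.168.1.2", "172.16.0.1", "8.8.8.8"));
        System.out.println(runJob(splits)); // {IP_192=2, IP_TOTAL=5}
    }
}
```

With real counters, the framework takes over everything after `runTask`: collecting, summing, and reporting.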

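1.2 Types of Counters

🌟 Built-in counters:
Built-in counters are maintained automatically by the framework and describe the job's various metrics.

🌟 Custom counters:

Users can define their own counters for specific needs. There are two ways to do it, both through context.getCounter: pass an enum constant, or pass a group name and a counter name as strings.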

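2. Built-in MapReduce Counters

The built-in counters are organized into several groups, and each group contains a number of counters.

Group	Class	Description
MapReduce task counters	`mapreduce.TaskCounter`	Per-task details such as records and bytes processed
File system counters	`mapreduce.FileSystemCounter`	Bytes read from and written to each file system
FileInputFormat counters	`mapreduce.lib.input.FileInputFormatCounter`	Bytes read through FileInputFormat
FileOutputFormat counters	`mapreduce.lib.output.FileOutputFormatCounter`	Bytes written through FileOutputFormat
Job counters	`mapreduce.JobCounter`	Job-level statistics such as launched and failed tasks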

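2.1 MapReduce Task Counters

Name	Description
map input records	Number of records read by map tasks
map input bytes	Number of bytes read by map tasks
map skipped records	Number of records skipped by map tasks
map output records	Number of records emitted by map tasks
map output bytes	Number of bytes emitted by map tasks
split raw bytes	Bytes of input-split objects (split metadata, not the data itself) read by map tasks
map output materialized bytes	Bytes of map output actually written to disk (reflects compression, if enabled)
combine input records	Number of records fed into the combiner
combine output records	Number of records emitted by the combiner
reduce input groups	Number of distinct key groups processed by reduce tasks
reduce input records	Number of records read by reduce tasks
reduce output records	Number of records emitted by reduce tasks
reduce skipped groups	Number of key groups skipped by reduce tasks
reduce skipped records	Number of records skipped by reduce tasks
reduce shuffle bytes	Number of map-output bytes copied to reducers during the shuffle
spilled records	Number of records spilled to disk
cpu milliseconds	CPU time spent, in milliseconds
physical memory bytes	Physical memory used, in bytes
virtual memory bytes	Virtual memory used, in bytes
committed heap bytes	Committed JVM heap, in bytes
gc time millis	Garbage-collection time, in milliseconds
shuffled maps	Number of map outputs transferred to reducers during the shuffle
failed shuffle	Number of failed map-output transfers during the shuffle
merged map outputs	Number of map outputs merged on the reduce side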
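These counters become an optimization baseline once they are turned into ratios. The hypothetical helper below derives two of them:

- how much the combiner shrank the map output;
- the average size of a map output record.

The class and method names are made up, and the sample values are invented rather than taken from a real job.

```java
// Hypothetical helper (not a Hadoop API): derives tuning ratios from task counter values.
class TaskCounterMetrics {

    // Fraction of combiner input records eliminated by the combiner (0.0 = no reduction).
    static double combinerReduction(long combineInputRecords, long combineOutputRecords) {
        if (combineInputRecords == 0) {
            return 0.0; // no combiner ran, or it received no input
        }
        return 1.0 - (double) combineOutputRecords / combineInputRecords;
    }

    // Average number of bytes per map output record.
    static double avgMapOutputRecordBytes(long mapOutputBytes, long mapOutputRecords) {
        if (mapOutputRecords == 0) {
            return 0.0;
        }
        return (double) mapOutputBytes / mapOutputRecords;
    }

    public static void main(String[] args) {
        // Invented counter values, for illustration only.
        System.out.println(combinerReduction(1000, 250));         // 0.75
        System.out.println(avgMapOutputRecordBytes(48000, 1000)); // 48.0
    }
}
```

Compare such ratios before and after a change, for example adding a combiner, to turn raw counter values into before/after evidence.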

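2.2 File System Counters

These counters are reported separately for each file system the job uses, such as the local file system (FILE) and HDFS.

Name	Description
bytes read	Number of bytes read from the file system
bytes written	Number of bytes written to the file system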

2.3 FileInputFormat Counters

Name	Description
bytes read	Number of bytes read by map tasks through FileInputFormat

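2.4 FileOutputFormat Counters

Name	Description
bytes written	Number of bytes written through FileOutputFormat (by map tasks in map-only jobs, otherwise by reduce tasks)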

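2.5 Job Counters

Name	Description
total launched maps	Number of map tasks launched, including speculative ones
total launched reduces	Number of reduce tasks launched, including speculative ones
total launched ubertasks	Number of uber tasks launched
num uber submaps	Number of map tasks run inside uber tasks
num uber subreduces	Number of reduce tasks run inside uber tasks
num failed maps	Number of failed map tasks
num failed reduces	Number of failed reduce tasks
num failed ubertasks	Number of failed uber tasks
data local maps	Number of map tasks that ran on the same node as their input data
rack local maps	Number of map tasks that ran on the same rack as their input data, but not on the same node
other local maps	Number of map tasks that ran on a different rack from their input data
slots millis maps	Total running time of map tasks, in milliseconds
slots millis reduces	Total running time of reduce tasks, in milliseconds
fallow slots millis maps	Total time, in milliseconds, spent waiting after reserving slots for map tasks
fallow slots millis reduces	Total time, in milliseconds, spent waiting after reserving slots for reduce tasks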
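The three locality counters are easiest to read as shares. In the sketch below, the class name and sample values are hypothetical. It assumes the denominator is the sum of the three locality counters rather than total launched maps, since the latter also counts speculative attempts.

```java
import java.util.Locale;

// Hypothetical helper: turns the three job-level locality counters into percentages.
class MapLocality {

    // Returns {dataLocal%, rackLocal%, other%} relative to the sum of the three counters.
    static double[] localityPercentages(long dataLocalMaps, long rackLocalMaps, long otherLocalMaps) {
        long total = dataLocalMaps + rackLocalMaps + otherLocalMaps;
        if (total == 0) {
            return new double[] {0.0, 0.0, 0.0};
        }
        return new double[] {
            100.0 * dataLocalMaps / total,
            100.0 * rackLocalMaps / total,
            100.0 * otherLocalMaps / total
        };
    }

    public static void main(String[] args) {
        // Invented counter values: 45 data-local, 4 rack-local, 1 other.
        double[] p = localityPercentages(45, 4, 1);
        System.out.printf(Locale.ROOT, "data-local %.1f%%, rack-local %.1f%%, other %.1f%%%n", p[0], p[1], p[2]);
        // prints: data-local 90.0%, rack-local 8.0%, other 2.0%
    }
}
```

A falling data-local share is a hint that input data and compute capacity are poorly co-located.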

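3. Custom Counters

MapReduce offers two ways to create job-wide (global) counters directly from a MapReduce program. In both cases, the value is accumulated with Counter.increment().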

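3.1 A Peek at the Source

  // Declared in org.apache.hadoop.mapreduce.TaskAttemptContext, so both
  // Mapper.Context and Reducer.Context provide these methods.

  /**
   * Get the counter for the given counterName.
   * @param counterName counter name
   * @return the counter for the given counterName
   */
  public Counter getCounter(Enum<?> counterName);

  /**
   * Get the counter for the given groupName and counterName.
   */
  public Counter getCounter(String groupName, String counterName);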
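Both overloads end up identifying a counter by a group name plus a counter name. The model below is a simplified, assumption-based sketch, not Hadoop's implementation:

- `CounterStoreModel` and `SimpleCounter` are hypothetical.
- Counters are created on first lookup.
- The enum overload is modeled as using the enum's class name as the group and the constant's name as the counter name. The group label Hadoop actually prints in job logs may differ from the raw class name.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of a counter store; NOT Hadoop's implementation.
// Both lookups resolve to a (group, name) pair and create the counter on first use.
class CounterStoreModel {

    // Minimal stand-in for a counter: a long value that can be incremented.
    static final class SimpleCounter {
        private long value;
        void increment(long incr) { value += incr; }
        long getValue() { return value; }
    }

    // group name -> (counter name -> counter)
    private final Map<String, Map<String, SimpleCounter>> groups = new TreeMap<>();

    // String form: the caller supplies both names explicitly.
    SimpleCounter getCounter(String groupName, String counterName) {
        return groups.computeIfAbsent(groupName, g -> new TreeMap<>())
                     .computeIfAbsent(counterName, n -> new SimpleCounter());
    }

    // Enum form (assumed mapping): group = enum class name, counter = constant name.
    SimpleCounter getCounter(Enum<?> key) {
        return getCounter(key.getDeclaringClass().getName(), key.name());
    }

    // Prints each group followed by indented name=value lines.
    void print() {
        groups.forEach((group, counters) -> {
            System.out.println(group);
            counters.forEach((name, c) -> System.out.println("\t\t" + name + "=" + c.getValue()));
        });
    }

    enum IpCounterEnum { IP_Quantity_Statistics, IP_Start_With_192 }

    public static void main(String[] args) {
        CounterStoreModel store = new CounterStoreModel();
        for (String ip : new String[] {"192.168.1.1", "10.0.0.1", "192.168.1.2"}) {
            store.getCounter(IpCounterEnum.IP_Quantity_Statistics).increment(1);
            if (ip.startsWith("192")) {
                store.getCounter(IpCounterEnum.IP_Start_With_192).increment(1);
            }
        }
        store.print();
    }
}
```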

3.2 Declaring Counters with an Enum

Passing an enum constant to getCounter gives you a counter.

Goal: count all IPs, and count the IPs that start with 192.

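import org.apache.hadoop.mapreduce.Counter;

// Counter enum, declared once (for example, in the job class)
enum IpCounterEnum { IP_Quantity_Statistics, IP_Start_With_192 }

// Inside map() or reduce(), where key holds an IP address:
Counter counter1 = context.getCounter(IpCounterEnum.IP_Quantity_Statistics);
counter1.increment(1);  // every IP
Counter counter2 = context.getCounter(IpCounterEnum.IP_Start_With_192);
if (key.toString().startsWith("192")) {
    counter2.increment(1);  // only IPs beginning with "192"
}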

📤 Output:

CustomCounter.IpCounterEnum
		IP_Quantity_Statistics=1137
		IP_Start_With_192=32

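3.3 Declaring Counters with Custom Names

Passing a custom group name and counter name to getCounter also gives you a counter.

Goal: count all IPs, and count the IPs that start with 192.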

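// Inside map() or reduce(), where key holds an IP address:
Counter counter1 = context.getCounter("Quantity statistics", "Visits");
counter1.increment(1);
Counter counter2 = context.getCounter("Quantity statistics", "IPs starting with 192");
if (key.toString().startsWith("192")) {
    counter2.increment(1);
}

📤 Output:

Quantity statistics
		IPs starting with 192=32
		Visits=1137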

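4. Final Thoughts

Prefer the enum-based approach for collecting statistics. A misspelled enum constant is a compile-time error, whereas a misspelled string group or counter name silently creates a separate counter. Reserve the string form for cases where counter names are only known at runtime.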
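The following plain-Java sketch shows the typo risk behind that recommendation. It assumes ordinary maps in place of Hadoop counters so the example stays self-contained. The class `CounterTypoDemo` and enum `Traffic` are hypothetical.

Two call sites are meant to update the same counter:
- with string keys, a typo quietly splits the count;
- with an enum key, the same typo would not compile.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: maps stand in for counters that are created on demand.
class CounterTypoDemo {

    enum Traffic { VISITS }

    // String keys: two call sites meant to share one counter, but one name is misspelled.
    static Map<String, Long> countWithStrings(int fromFirstSite, int fromSecondSite) {
        Map<String, Long> counters = new TreeMap<>();
        for (int i = 0; i < fromFirstSite; i++) {
            counters.merge("Visits", 1L, Long::sum);
        }
        for (int i = 0; i < fromSecondSite; i++) {
            counters.merge("Vists", 1L, Long::sum); // typo compiles and silently creates a second counter
        }
        return counters;
    }

    // Enum keys: both call sites must name an existing constant, so a typo
    // such as Traffic.VISTS would be rejected by the compiler.
    static Map<Traffic, Long> countWithEnum(int fromFirstSite, int fromSecondSite) {
        Map<Traffic, Long> counters = new EnumMap<>(Traffic.class);
        for (int i = 0; i < fromFirstSite; i++) {
            counters.merge(Traffic.VISITS, 1L, Long::sum);
        }
        for (int i = 0; i < fromSecondSite; i++) {
            counters.merge(Traffic.VISITS, 1L, Long::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        System.out.println(countWithStrings(2, 1)); // {Visits=2, Vists=1}
        System.out.println(countWithEnum(2, 1));    // {VISITS=3}
    }
}
```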


Added: 2021-11-29 16:23:14 Updated: 2021-11-29 16:24:12

360图书馆购物三丰科技阅读网日历万年历 2025年7日历

-2025/7/3 22:30:36-

图片自动播放器
↓图片自动播放器↓

TxT小说阅读器
↓语音阅读,小说下载,古典文学↓

一键清除垃圾
↓轻轻一点,清除系统垃圾↓

图片批量下载器
↓批量下载图片,美女图库↓

网站联系: qq:121756557 email:121756557@qq.com IT数码