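1. MapReduce
MapReduce is a software architecture proposed by Google for parallel computation over large data sets (larger than 1 TB). The concepts "Map" and "Reduce", and the main ideas behind them, are borrowed from functional programming languages, along with features borrowed from vector programming languages.
In current implementations, the user specifies a Map function that turns a set of key-value pairs into a new set of intermediate key-value pairs, and a concurrent Reduce function that merges all intermediate values sharing the same key.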
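To make the functional-programming origin concrete, here is a minimal single-process sketch in plain Java streams. It is not Hadoop code; the class name FunctionalWordCount and its method are made up for illustration, and whitespace-separated words are an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustration only: word counting written in the functional map/reduce style.
class FunctionalWordCount {

    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                // "Map": every line is turned into a stream of words.
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // "Reduce": words with the same key are grouped and counted.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new,
                        Collectors.summingInt(word -> 1)));
    }
}
```

Calling FunctionalWordCount.count with the lines "a b a" and "b c" gives a=2, b=2, c=1. A real MapReduce job follows the same idea but spreads the map and reduce calls over many machines.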
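2. Setting Up a MapReduce Development Environment
Prerequisites: Java, IntelliJ IDEA, and Maven. The environment can be set up in either of the two ways described below.
Java installation link and steps: https://www.cnblogs.com/de-ming/p/13909440.html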
2.1 Maven Environment
Add the hadoop-client dependency:
https://search.maven.org/artifact/org.apache.hadoop/hadoop-client/3.1.4/jar
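For reference, the coordinates in the link above correspond to a pom.xml entry along these lines. This is a sketch derived only from the URL (group org.apache.hadoop, artifact hadoop-client, version 3.1.4); it belongs inside the dependencies element of the project's pom.xml.

```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.1.4</version>
</dependency>
```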
Then add the source code.
2.2 Importing the Jar Files Manually
Hadoop installation package: https://pan.baidu.com/s/1teHwnBH2Qm6F7iWZ3q-hSQ (extraction code: cgnb)
Create a new Java project and import the Hadoop jars. Then search for JobClient.class and click "Choose Sources" to attach the source code.
That is all; JobClient.java can now be viewed.
3. WordCount Source Code Analysis
3.1 Opening WordCount.java
Open https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-examples/3.1.4 and copy the Maven dependency snippet. Paste it into the project's Maven configuration, then search for WordCount.
3.2 Source Code Analysis
3.2.1 WordCount source: the Map task
[Screenshot: Map task source in WordCount.java]
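The Map task in the WordCount example splits each input line into tokens and emits a (word, 1) pair per token. The following plain-Java sketch mirrors that logic without the Hadoop types (Text, IntWritable, Context); the class name TokenizerMapSketch and the returned list are stand-ins chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Illustration only: the map step of word counting, with a List standing in
// for Hadoop's Context.write(word, one).
class TokenizerMapSketch {

    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // StringTokenizer splits on whitespace by default.
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            // One (word, 1) pair per token; duplicates are not merged here.
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }
}
```

Note that the map step does no counting of its own: "hello world hello" yields three pairs, and the two "hello" pairs are only combined later.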
3.2.2 WordCount source: the Reduce task
[Screenshot: Reduce task source in WordCount.java]
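By the time the Reduce task runs, the framework has already grouped all values that belong to one key. A minimal plain-Java sketch of the summing logic follows; IntSumReduceSketch is an illustrative name and the return value stands in for the context.write call of the real Reducer.

```java
// Illustration only: the reduce step of word counting without Hadoop types.
class IntSumReduceSketch {

    // 'values' holds every count emitted for 'key'. When the same class also
    // runs as a Combiner, the values may be partial sums rather than plain 1s.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }
}
```

Because addition is associative and commutative, summing partial sums gives the same total as summing the individual 1s, which is why the same class can serve as both Combiner and Reducer.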
3.2.3 WordCount source: the main function
The main function sets the required parameters and assembles the MapReduce program.
[Screenshot: main function source in WordCount.java]
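4. MapReduce API Overview
- A MapReduce program generally consists of a Mapper, a Reducer, and a main function.
- The Mapper generally performs the key-value mapping.
- The Reducer generally performs the key-value aggregation.
- The main function assembles the Mapper and Reducer and sets the required configuration.
- More advanced programs also set the input and output file formats and use a Combiner or a Partitioner to optimize the job.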
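As an example of what a Partitioner decides, the sketch below reproduces the formula used by Hadoop's default hash partitioning: the key's hash code, made non-negative, modulo the number of reduce tasks. It is plain Java with String keys for illustration; HashPartitionSketch is a made-up name, and a real job would hash its Writable key type (for example Text), so the actual partition numbers would differ.

```java
// Illustration only: how a hash-based partitioner picks a reducer for a key.
class HashPartitionSketch {

    static int getPartition(String key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so the result
        // of the modulo is always in the range [0, numReduceTasks).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

Every pair with the same key lands in the same partition, which is what guarantees that a single reduce call sees all values for that key.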
4.1 MapReduce program modules: the main function
[Screenshot: main function structure]
4.2 MapReduce program modules: Mapper
- org.apache.hadoop.mapreduce.Mapper
[Screenshot: Mapper API]
4.3 MapReduce program modules: Reducer
- org.apache.hadoop.mapreduce.Reducer
[Screenshot: Reducer API]
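5. MapReduce Examples
5.1 Workflow (Mapper, Reducer, Main, packaging, and running)
- Modify the Mapper, using the WordCount program as a reference.
- Copy the Reducer as-is.
- Copy the main function and adjust it as needed.
- Compile and package.
- Upload the jar.
- Upload the data.
- Run the program.
- View the results.
5.2 Example 1: counting visits by date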
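1. Modify the Mapper, using the WordCount program as a reference. (Create a new Java class, named CountByDate to match the setJarByClass call in step 3, and copy the code from steps 1, 2, and 3 into it. The class needs the same imports as SortByCountFirst in section 5.3.)

    // Emits (date, 1) for every input line; the date is the second '|'-separated field.
    public static class SpiltMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            // A limit of -1 keeps trailing empty fields.
            String[] data = value.toString().split("\\|", -1);
            if (data.length < 2) {
                return; // skip blank or malformed lines that have no date field
            }
            word.set(data[1]);
            context.write(word, one);
        }
    }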
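The field extraction in the Mapper can be tried on its own. The sketch below is plain Java; the class name DateFieldSketch and the sample record "alice@example.com|2021-01-01" are hypothetical, since the exact layout of email_log_with_date.txt is not shown here beyond the date being the second field.

```java
// Illustration only: pulling the second '|'-separated field out of a line.
class DateFieldSketch {

    // Returns the second field, or null when the line has fewer than two fields.
    static String dateOf(String line) {
        // '|' is the regex alternation operator, so it must be escaped.
        // The limit -1 keeps trailing empty strings instead of dropping them.
        String[] data = line.split("\\|", -1);
        return data.length > 1 ? data[1] : null;
    }
}
```

With these assumptions, dateOf("alice@example.com|2021-01-01") returns "2021-01-01", while an empty line returns null instead of causing an ArrayIndexOutOfBoundsException.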
2. Copy the Reducer as-is.

    // Sums all counts received for one key and emits (key, total).
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
3. Copy the main function and adjust it as needed.

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strips generic options such as -D key=value and returns the rest.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: CountByDate <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "count by date");
        job.setJarByClass(CountByDate.class);
        job.setMapperClass(SpiltMapper.class);
        // The reducer only sums, so it can also run as the combiner.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Every argument except the last is an input path; the last is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
4. Compile and package (build the jar).
A build error may occur at this step. [Screenshots: the build error and its fix]
Once the build succeeds, packaging is complete.
5/6. Upload the jar and the data.
Data file email_log_with_date.txt: https://pan.baidu.com/s/1HfwHCfmvVdQpuL-MPtpAng (extraction code: cgnb)
Upload the data file to HDFS (make sure HDFS is running).
Once the upload succeeds, it can be checked in the browser at master:50070. (That is the NameNode web UI port used in this setup; on Hadoop 3.x the default port is 9870.)
7. Run the program (make sure YARN is running).
After the job has been submitted, check its status at master:8088.
8. View the results at master:50070.
5.3 Example 2: sorting users by visit count
Mapper, Reducer, and Main programs
SortByCountFirst (first job: counts visits per user)
    package demo;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    import java.io.IOException;

    public class SortByCountFirst {

        // Emits (user, 1) for every input line; the user is the first '|'-separated field.
        public static class SpiltMapper
                extends Mapper<Object, Text, Text, IntWritable> {

            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(Object key, Text value, Context context
            ) throws IOException, InterruptedException {
                String[] data = value.toString().split("\\|", -1);
                word.set(data[0]);
                context.write(word, one);
            }
        }

        // Sums all counts received for one user and emits (user, total).
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {

            private IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values,
                               Context context
            ) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length < 2) {
                System.err.println("Usage: demo.SortByCountFirst <in> [<in>...] <out>");
                System.exit(2);
            }
            Job job = Job.getInstance(conf, "sort by count first ");
            job.setJarByClass(SortByCountFirst.class);
            job.setMapperClass(SpiltMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Every argument except the last is an input path; the last is the output path.
            for (int i = 0; i < otherArgs.length - 1; ++i) {
                FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
            }
            FileOutputFormat.setOutputPath(job,
                    new Path(otherArgs[otherArgs.length - 1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
SortByCountSecond (second job: sorts the users by the counts produced by the first job)

    package demo;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    import java.io.IOException;

    public class SortByCountSecond {

        // Reads "user<TAB>count" lines written by the first job and emits (count, user),
        // so that the framework sorts the records by count during the shuffle.
        public static class SpiltMapper
                extends Mapper<Object, Text, IntWritable, Text> {

            private IntWritable count = new IntWritable(1);
            private Text word = new Text();

            public void map(Object key, Text value, Context context
            ) throws IOException, InterruptedException {
                String[] data = value.toString().split("\t", -1);
                word.set(data[0]);
                count.set(Integer.parseInt(data[1]));
                context.write(count, word);
            }
        }

        // Swaps each pair back to (user, count); keys arrive in ascending count order.
        public static class ReverseReducer
                extends Reducer<IntWritable, Text, Text, IntWritable> {

            public void reduce(IntWritable key, Iterable<Text> values,
                               Context context
            ) throws IOException, InterruptedException {
                for (Text val : values) {
                    context.write(val, key);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length < 2) {
                System.err.println("Usage: demo.SortByCountSecond <in> [<in>...] <out>");
                System.exit(2);
            }
            Job job = Job.getInstance(conf, "sort by count second");
            job.setJarByClass(SortByCountSecond.class);
            job.setMapperClass(SpiltMapper.class);
            job.setReducerClass(ReverseReducer.class);
            // The map output types differ from the final output types, so both are set.
            job.setMapOutputKeyClass(IntWritable.class);
            job.setMapOutputValueClass(Text.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            for (int i = 0; i < otherArgs.length - 1; ++i) {
                FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
            }
            FileOutputFormat.setOutputPath(job,
                    new Path(otherArgs[otherArgs.length - 1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
Then package the jar, upload it, and run the two jobs in order. The first job reads the raw log; the second job reads only the output directory of the first:
yarn jar sortbycount.jar demo.SortByCountFirst -Dmapreduce.job.queuename=prod email_log_with_date.txt sortbycountfirst_output00
yarn jar sortbycount.jar demo.SortByCountSecond -Dmapreduce.job.queuename=prod sortbycountfirst_output00 sortbycountsecond_output00
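What the second job does to the first job's output can be sketched in plain Java. Here a TreeMap stands in for the key sorting that the framework performs between map and reduce; the class name SortByCountSketch and the sample users are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: the swap, sort, swap-back idea behind SortByCountSecond.
class SortByCountSketch {

    // Each input line has the form "user<TAB>count".
    static List<String> sortByCount(List<String> firstJobOutput) {
        // Map step: swap to (count, user). The TreeMap keeps keys in ascending order.
        Map<Integer, List<String>> byCount = new TreeMap<>();
        for (String line : firstJobOutput) {
            String[] data = line.split("\t", -1);
            byCount.computeIfAbsent(Integer.parseInt(data[1]), k -> new ArrayList<>())
                    .add(data[0]);
        }
        // Reduce step: swap back to (user, count), now ordered by count.
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : byCount.entrySet()) {
            for (String user : e.getValue()) {
                out.add(user + "\t" + e.getKey());
            }
        }
        return out;
    }
}
```

The result is in ascending order of count. In a real job this ordering holds within each reducer's output file, so a single global order assumes the job runs with one reducer.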