Setting Up a Hadoop 3 Environment
Setting Up a Hadoop 3 Environment (Local Mode)
Lab Contents
- Install and configure JDK 8 on Linux
- Install and configure Hadoop 3
- Test the Hadoop 3 installation
Lab Environment
Operating system: Ubuntu 16.04
Software: JDK 1.8, Hadoop 3.3.0
Data path: /data/dataset
Tar package path: /data/software
Tar extraction path: /data/bigdata
Files created for the lab: /data/resource
Lab Principles
Hadoop has three run modes: local (standalone) mode, pseudo-distributed mode, and fully distributed (cluster) mode.
In Hadoop 3's local mode, all Hadoop components run inside a single JVM, and no separate Hadoop daemons need to be started. This mode is well suited to the development stage.
Lab Steps
1. Install JDK 8
Hadoop 3 requires JDK 8, so the JDK 8 environment must be installed first.
tar -zxvf jdk-8u73-linux-x64.tar.gz -C /data/bigdata/
vim /etc/profile
export JAVA_HOME=/data/bigdata/jdk1.8.0_73
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
java -version
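If the variables were exported correctly, java -version should report version 1.8. A quick sanity check (assuming the paths above; these helper commands are not part of the original steps):
echo $JAVA_HOME    # should print /data/bigdata/jdk1.8.0_73
which java         # should resolve under $JAVA_HOME/bin, unless another java precedes it on PATH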
2. Install Hadoop 3
tar -zxvf hadoop-3.3.0.tar.gz -C /data/bigdata/
vim /etc/profile
export HADOOP_HOME=/data/bigdata/hadoop-3.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
source /etc/profile
hadoop version
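Optionally, two quick checks of standalone mode (a hedged sketch, not required by the lab): hadoop checknative reports whether the native libraries referenced by HADOOP_OPTS were loaded, and because no HDFS daemons are running, filesystem commands operate on the local filesystem.
hadoop checknative    # reports which native libraries could be loaded
hadoop fs -ls /data   # in local mode this lists the local filesystem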
3. Run a MapReduce Program on Hadoop 3
Change into the directory that holds the example jar (the prompt at the end of the output below shows this location) and run the pi estimator:
cd /data/bigdata/hadoop-3.3.0/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-3.3.0.jar pi 10 20
Number of Maps = 10
Samples per Map = 20
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
2022-03-29 15:58:10,062 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2022-03-29 15:58:10,130 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2022-03-29 15:58:10,130 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2022-03-29 15:58:10,344 INFO input.FileInputFormat: Total input files to process : 10
2022-03-29 15:58:10,354 INFO mapreduce.JobSubmitter: number of splits:10
2022-03-29 15:58:10,470 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1599205954_0001
2022-03-29 15:58:10,471 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-03-29 15:58:10,580 INFO mapreduce.Job: The url to track the job: http:
2022-03-29 15:58:10,581 INFO mapreduce.Job: Running job: job_local1599205954_0001
2022-03-29 15:58:10,584 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2022-03-29 15:58:10,599 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2022-03-29 15:58:10,599 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2022-03-29 15:58:10,600 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2022-03-29 15:58:10,675 INFO mapred.LocalJobRunner: Waiting for map tasks
2022-03-29 15:58:10,676 INFO mapred.LocalJobRunner: Starting task: attempt_local1599205954_0001_m_000000_0
2022-03-29 15:58:10,711 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2022-03-29 15:58:10,711 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2022-03-29 15:58:10,731 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2022-03-29 15:58:10,736 INFO mapred.MapTask: Processing split: file:/data/bigdata/hadoop-3.3.0/share/hadoop/mapreduce/QuasiMonteCarlo_1648540689571_913688259/in/part0:0+118
2022-03-29 15:58:10,794 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2022-03-29 15:58:10,794 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2022-03-29 15:58:10,795 INFO mapred.MapTask: soft limit at 83886080
2022-03-29 15:58:10,795 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2022-03-29 15:58:10,795 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2022-03-29 15:58:10,799 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2022-03-29 15:58:10,815 INFO mapred.LocalJobRunner:
2022-03-29 15:58:10,815 INFO mapred.MapTask: Starting flush of map output
2022-03-29 15:58:10,815 INFO mapred.MapTask: Spilling map output
2022-03-29 15:58:10,815 INFO mapred.MapTask: bufstart = 0; bufend = 18; bufvoid = 104857600
2022-03-29 15:58:10,815 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214392(104857568); length = 5/6553600
2022-03-29 15:58:10,820 INFO mapred.MapTask: Finished spill 0
2022-03-29 15:58:10,833 INFO mapred.Task: Task:attempt_local1599205954_0001_m_000000_0 is done. And is in the process of committing
2022-03-29 15:58:10,834 INFO mapred.LocalJobRunner: map
2022-03-29 15:58:10,835 INFO mapred.Task: Task 'attempt_local1599205954_0001_m_000000_0' done.
2022-03-29 15:58:10,842 INFO mapred.Task: Final Counters for attempt_local1599205954_0001_m_000000_0: Counters: 17
File System Counters
FILE: Number of bytes read=283282
FILE: Number of bytes written=895462
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=2
Map output bytes=18
Map output materialized bytes=28
Input split bytes=168
Combine input records=0
Spilled Records=2
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=2058354688
File Input Format Counters
Bytes Read=130
2022-03-29 15:58:10,843 INFO mapred.LocalJobRunner: Finishing task: attempt_local1599205954_0001_m_000000_0
2022-03-29 15:58:10,843 INFO mapred.LocalJobRunner: Starting task: attempt_local1599205954_0001_m_000001_0
2022-03-29 15:58:10,844 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2022-03-29 15:58:10,844 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2022-03-29 15:58:10,845 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2022-03-29 15:58:10,847 INFO mapred.MapTask: Processing split: file:/data/bigdata/hadoop-3.3.0/share/hadoop/mapreduce/QuasiMonteCarlo_1648540689571_913688259/in/part9:0+118
2022-03-29 15:58:10,887 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2022-03-29 15:58:10,887 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2022-03-29 15:58:10,887 INFO mapred.MapTask: soft limit at 83886080
2022-03-29 15:58:10,887 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2022-03-29 15:58:10,887 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2022-03-29 15:58:10,888 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2022-03-29 15:58:10,890 INFO mapred.LocalJobRunner:
2022-03-29 15:58:10,890 INFO mapred.MapTask: Starting flush of map output
2022-03-29 15:58:10,890 INFO mapred.MapTask: Spilling map output
2022-03-29 15:58:10,890 INFO mapred.MapTask: bufstart = 0; bufend = 18; bufvoid = 104857600
2022-03-29 15:58:10,890 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214392(104857568); length = 5/6553600
2022-03-29 15:58:10,892 INFO mapred.MapTask: Finished spill 0
2022-03-29 15:58:10,893 INFO mapred.Task: Task:attempt_local1599205954_0001_m_000001_0 is done. And is in the process of committing
2022-03-29 15:58:10,894 INFO mapred.LocalJobRunner: map
2022-03-29 15:58:10,894 INFO mapred.Task: Task 'attempt_local1599205954_0001_m_000001_0' done.
2022-03-29 15:58:10,895 INFO mapred.Task: Final Counters for attempt_local1599205954_0001_m_000001_0:
......
Counters: 24
File System Counters
FILE: Number of bytes read=296069
FILE: Number of bytes written=896467
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=280
Reduce input records=20
Reduce output records=0
Spilled Records=20
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=0
Total committed heap usage (bytes)=2595749888
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=109
2022-03-29 15:58:11,315 INFO mapred.LocalJobRunner: Finishing task: attempt_local1599205954_0001_r_000000_0
2022-03-29 15:58:11,315 INFO mapred.LocalJobRunner: reduce task executor complete.
2022-03-29 15:58:11,586 INFO mapreduce.Job: Job job_local1599205954_0001 running in uber mode : false
2022-03-29 15:58:11,588 INFO mapreduce.Job: map 100% reduce 100%
2022-03-29 15:58:11,590 INFO mapreduce.Job: Job job_local1599205954_0001 completed successfully
2022-03-29 15:58:11,610 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=3197910
FILE: Number of bytes written=9853787
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=280
Input split bytes=1680
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=280
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=42
Total committed heap usage (bytes)=25328877568
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1300
File Output Format Counters
Bytes Written=109
Job Finished in 1.637 seconds
Estimated value of Pi is 3.12000000000000000000
root@instance-00001816:/data/bigdata/hadoop-3.3.0/share/hadoop/mapreduce#
Setting Up a Hadoop 3 Environment (Pseudo-Distributed Mode)
Objective: master the setup, file configuration, and testing of a Hadoop 3 pseudo-distributed environment.
Lab Contents
- Install and configure JDK 8 on Linux
- Install and configure Hadoop 3 (pseudo-distributed mode)
- Test the Hadoop 3 installation
Lab Environment
Operating system: Ubuntu 16.04
Software: JDK 1.8, Hadoop 3.3.0
Data path: /data/dataset
Tar package path: /data/software
Tar extraction path: /data/bigdata
Software installation path: /opt
Files created for the lab: /data/resource
Lab Principles
Hadoop has three run modes: local (standalone) mode, pseudo-distributed mode, and fully distributed (cluster) mode.
Pseudo-distributed mode runs all of the Hadoop daemons on the local host, simulating a small-scale cluster. It involves not only the same steps as local mode but also editing the Hadoop configuration files.
Lab Steps
1. Install JDK 8
tar -zxvf jdk-8u73-linux-x64.tar.gz -C /data/bigdata/
vim /etc/profile
export JAVA_HOME=/data/bigdata/jdk1.8.0_73
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
java -version
2. Configure Passwordless SSH Login
Hadoop controls its daemons over SSH during startup, so without extra configuration you are prompted for a password for each node every time you start the cluster.
To avoid entering a password each time, set up passwordless (key-based) login; when ssh-keygen prompts for a file location and passphrase, press Enter to accept the defaults.
rm -rf ~/.ssh
ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
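To verify that key-based login works, ssh to localhost; it should log in without asking for a password (you may need to confirm the host key fingerprint the first time):
ssh localhost
exit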
3. Install Hadoop
tar -zxvf hadoop-3.3.0.tar.gz -C /data/bigdata/
vim /etc/profile
export HADOOP_HOME=/data/bigdata/hadoop-3.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
source /etc/profile
hadoop version
4. Configure Hadoop in Pseudo-Distributed Mode
Five files must be configured (plus, optionally, the workers file); all of them are located in the etc/hadoop/ subdirectory of the Hadoop installation directory.
cd /data/bigdata/hadoop-3.3.0/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/data/bigdata/jdk1.8.0_73
vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/bigdata/hadoop-3.3.0/tmp</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.groups</name>
        <value>*</value>
    </property>
</configuration>
vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.datanode.max.transfer.threads</name>
        <value>4096</value>
    </property>
</configuration>
vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
vim yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
    </property>
</configuration>
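Before starting the daemons, it is worth confirming that Hadoop picks up these settings; hdfs getconf echoes individual configuration keys back (a quick sanity check, not part of the original steps):
hdfs getconf -confKey fs.defaultFS       # expect hdfs://localhost:9000/
hdfs getconf -confKey dfs.replication    # expect 1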
5. Format the File System
Format HDFS (this needs to be done only once). In a terminal window, run the following commands:
cd /data/bigdata/hadoop-3.3.0/bin/
./hdfs namenode -format
If for some reason you need to reconfigure the cluster from scratch, delete Hadoop's tmp directory before re-formatting HDFS. This is the directory specified by hadoop.tmp.dir in core-site.xml above; it contains the name and data subdirectories, which must be removed before re-formatting.
Formatting the NameNode essentially creates a fresh namespace on it. During formatting, the configuration files are loaded and checked for correctness.
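If you do need to start over, a minimal sketch of the reset procedure (assuming hadoop.tmp.dir is /data/bigdata/hadoop-3.3.0/tmp as configured above; this permanently deletes all HDFS data):
stop-dfs.sh                              # stop any running HDFS daemons first (sbin is on PATH)
rm -rf /data/bigdata/hadoop-3.3.0/tmp    # remove the old name and data directories
hdfs namenode -format                    # re-create the namespace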
6. Run a MapReduce Program on the Hadoop 3 Cluster
cd /data/bigdata/hadoop-3.3.0/sbin
vi start-dfs.sh
vi stop-dfs.sh
Add the following variables near the top of both start-dfs.sh and stop-dfs.sh (Hadoop 3 refuses to start HDFS as root unless these are defined):
HDFS_DATANODE_USER=root
HDFS_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
./start-dfs.sh
jps
jps should now list the NameNode, DataNode, and SecondaryNameNode processes.
cd /data/bigdata/hadoop-3.3.0/sbin
vi start-yarn.sh
vi stop-yarn.sh
Add the following variables near the top of both start-yarn.sh and stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
./start-yarn.sh
jps
jps should now additionally list the ResourceManager and NodeManager processes.
cd ../bin/
./mapred --daemon start historyserver
./yarn --daemon start timelineserver
cd /data/bigdata/hadoop-3.3.0/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-3.3.0.jar pi 1 2
Open a browser and enter http://localhost:8088 in the address bar.
Track job progress at http://localhost:8088/cluster
In the web UI, click the History link in the Tracking UI column to see details of the job run.
Note: if the URL shows the author's machine name rather than localhost, replace it with your own machine's hostname.
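While a job is running, the YARN command line offers an equivalent view to the web UI (a quick check; output varies with cluster state):
yarn application -list    # shows application IDs, states, and tracking URLs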
To shut the cluster down, stop the services in the reverse order in which they were started:
cd /data/bigdata/hadoop-3.3.0/sbin
./stop-yarn.sh
./stop-dfs.sh
cd ../bin
./mapred --daemon stop historyserver
./yarn --daemon stop timelineserver
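Finally, jps can confirm that everything has shut down (assuming no other Java services run on this machine):
jps    # only the Jps process itself should remain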