
Hadoop 3 Environment Setup


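Hadoop 3 Environment Setup (Local Mode)

Lab Contents

  1. Install and configure JDK 8 on Linux
  2. Install and configure Hadoop 3
  3. Test the Hadoop 3 installation

Lab Environment

Operating system: Ubuntu 16.04

Software: JDK 1.8, Hadoop 3.3.0

Dataset path: /data/dataset

Tar package path: /data/software

Tar extraction path: /data/bigdata

Path for files created during the lab: /data/resource

Lab Principles

Hadoop has three run modes: local (standalone) mode, pseudo-distributed mode, and cluster (fully distributed) mode.

In Hadoop 3 local mode, all Hadoop components run inside a single JVM, and no Hadoop daemon has to be started separately. This mode is well suited to the development stage.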
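
In configuration terms, local mode is what Hadoop uses when nothing has been changed: fs.defaultFS defaults to file:/// and mapreduce.framework.name defaults to local. The following is a minimal sketch of a check built on those two defaults. The function name is_local_mode is hypothetical, and the values are passed in by hand rather than read from a live installation.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the two settings describe local mode.
# $1 = value of fs.defaultFS, $2 = value of mapreduce.framework.name
is_local_mode() {
    case "$1" in
        file:*) ;;            # local file system, no HDFS
        *) return 1 ;;
    esac
    [ "$2" = "local" ]        # jobs run in-process via LocalJobRunner
}

# Hadoop's shipped defaults describe local mode
if is_local_mode "file:///" "local"; then
    echo "local mode"
fi

# Values used for a pseudo-distributed setup (HDFS plus YARN)
if ! is_local_mode "hdfs://localhost:9000/" "yarn"; then
    echo "not local mode"
fi
```

At run time, the same mode shows up in the job log as a job ID that starts with job_local and as log lines from mapred.LocalJobRunner.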

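Lab Steps

1. Install JDK 8

Hadoop 3 requires JDK 8 to run, so the JDK 8 environment must be installed first.

# 1. Extract the JDK package (the archive is in the tar package path)
cd /data/software
tar -zxvf jdk-8u73-linux-x64.tar.gz  -C /data/bigdata/
# 2. Configure environment variables: append the two export lines below to /etc/profile, then reload it
vim /etc/profile
export JAVA_HOME=/data/bigdata/jdk1.8.0_73
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
# 3. Test the JDK installation
java -version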
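
The version test can also be scripted, which helps when several machines have to be checked. The sketch below assumes the usual `java -version` output format, where the first line carries the version in double quotes. The helper name java_major is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the major version for a Java version string.
# Handles both the legacy "1.8.0_73" scheme and the newer "11.0.2" scheme.
java_major() {
    case "$1" in
        1.*) v=${1#1.}; echo "${v%%.*}" ;;
        *)   echo "${1%%.*}" ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # `java -version` prints to stderr, e.g.: java version "1.8.0_73"
    ver=$(java -version 2>&1 | awk -F'"' '/version "/ && !seen { print $2; seen = 1 }')
    if [ "$(java_major "$ver")" = "8" ]; then
        echo "JDK 8 detected ($ver)"
    else
        echo "found Java $ver, expected 1.8.x" >&2
    fi
else
    echo "java is not on PATH; check JAVA_HOME and /etc/profile" >&2
fi
```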
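2. Install Hadoop 3
# 1. Extract the Hadoop 3 package (the archive is in the tar package path)
cd /data/software
tar -zxvf hadoop-3.3.0.tar.gz -C /data/bigdata/
# 2. Configure environment variables: append the three export lines below to /etc/profile, then reload it
vim /etc/profile
export HADOOP_HOME=/data/bigdata/hadoop-3.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
source /etc/profile
# 3. Test the Hadoop 3 installation
hadoop version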
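
If `hadoop version` reports "command not found", the usual cause is that the bin and sbin directories did not make it onto PATH. The sketch below checks for that. The helper name on_path is hypothetical, and the fallback directory is the one used in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current PATH).
on_path() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Installation directory from this article; adjust if Hadoop lives elsewhere.
HADOOP_HOME=${HADOOP_HOME:-/data/bigdata/hadoop-3.3.0}
for dir in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
    if on_path "$dir"; then
        echo "ok: $dir is on PATH"
    else
        echo "missing from PATH: $dir" >&2
    fi
done
```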
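3. Run a MapReduce program on Hadoop 3
# Change to the directory that holds the examples jar, then run the pi example with 10 maps and 20 samples per map
cd /data/bigdata/hadoop-3.3.0/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-3.3.0.jar pi 10 20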
Number of Maps  = 10
Samples per Map = 20
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
2022-03-29 15:58:10,062 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2022-03-29 15:58:10,130 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2022-03-29 15:58:10,130 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2022-03-29 15:58:10,344 INFO input.FileInputFormat: Total input files to process : 10
2022-03-29 15:58:10,354 INFO mapreduce.JobSubmitter: number of splits:10
2022-03-29 15:58:10,470 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1599205954_0001
2022-03-29 15:58:10,471 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-03-29 15:58:10,580 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2022-03-29 15:58:10,581 INFO mapreduce.Job: Running job: job_local1599205954_0001
2022-03-29 15:58:10,584 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2022-03-29 15:58:10,599 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2022-03-29 15:58:10,599 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2022-03-29 15:58:10,600 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2022-03-29 15:58:10,675 INFO mapred.LocalJobRunner: Waiting for map tasks
2022-03-29 15:58:10,676 INFO mapred.LocalJobRunner: Starting task: attempt_local1599205954_0001_m_000000_0
2022-03-29 15:58:10,711 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2022-03-29 15:58:10,711 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2022-03-29 15:58:10,731 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2022-03-29 15:58:10,736 INFO mapred.MapTask: Processing split: file:/data/bigdata/hadoop-3.3.0/share/hadoop/mapreduce/QuasiMonteCarlo_1648540689571_913688259/in/part0:0+118
2022-03-29 15:58:10,794 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2022-03-29 15:58:10,794 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2022-03-29 15:58:10,795 INFO mapred.MapTask: soft limit at 83886080
2022-03-29 15:58:10,795 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2022-03-29 15:58:10,795 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2022-03-29 15:58:10,799 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2022-03-29 15:58:10,815 INFO mapred.LocalJobRunner: 
2022-03-29 15:58:10,815 INFO mapred.MapTask: Starting flush of map output
2022-03-29 15:58:10,815 INFO mapred.MapTask: Spilling map output
2022-03-29 15:58:10,815 INFO mapred.MapTask: bufstart = 0; bufend = 18; bufvoid = 104857600
2022-03-29 15:58:10,815 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214392(104857568); length = 5/6553600
2022-03-29 15:58:10,820 INFO mapred.MapTask: Finished spill 0
2022-03-29 15:58:10,833 INFO mapred.Task: Task:attempt_local1599205954_0001_m_000000_0 is done. And is in the process of committing
2022-03-29 15:58:10,834 INFO mapred.LocalJobRunner: map
2022-03-29 15:58:10,835 INFO mapred.Task: Task 'attempt_local1599205954_0001_m_000000_0' done.
2022-03-29 15:58:10,842 INFO mapred.Task: Final Counters for attempt_local1599205954_0001_m_000000_0: Counters: 17
        File System Counters
                FILE: Number of bytes read=283282
                FILE: Number of bytes written=895462
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=1
                Map output records=2
                Map output bytes=18
                Map output materialized bytes=28
                Input split bytes=168
                Combine input records=0
                Spilled Records=2
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=0
                Total committed heap usage (bytes)=2058354688
        File Input Format Counters 
                Bytes Read=130
2022-03-29 15:58:10,843 INFO mapred.LocalJobRunner: Finishing task: attempt_local1599205954_0001_m_000000_0
2022-03-29 15:58:10,843 INFO mapred.LocalJobRunner: Starting task: attempt_local1599205954_0001_m_000001_0
2022-03-29 15:58:10,844 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2022-03-29 15:58:10,844 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2022-03-29 15:58:10,845 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2022-03-29 15:58:10,847 INFO mapred.MapTask: Processing split: file:/data/bigdata/hadoop-3.3.0/share/hadoop/mapreduce/QuasiMonteCarlo_1648540689571_913688259/in/part9:0+118
2022-03-29 15:58:10,887 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2022-03-29 15:58:10,887 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2022-03-29 15:58:10,887 INFO mapred.MapTask: soft limit at 83886080
2022-03-29 15:58:10,887 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2022-03-29 15:58:10,887 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2022-03-29 15:58:10,888 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2022-03-29 15:58:10,890 INFO mapred.LocalJobRunner: 
2022-03-29 15:58:10,890 INFO mapred.MapTask: Starting flush of map output
2022-03-29 15:58:10,890 INFO mapred.MapTask: Spilling map output
2022-03-29 15:58:10,890 INFO mapred.MapTask: bufstart = 0; bufend = 18; bufvoid = 104857600
2022-03-29 15:58:10,890 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214392(104857568); length = 5/6553600
2022-03-29 15:58:10,892 INFO mapred.MapTask: Finished spill 0
2022-03-29 15:58:10,893 INFO mapred.Task: Task:attempt_local1599205954_0001_m_000001_0 is done. And is in the process of committing
2022-03-29 15:58:10,894 INFO mapred.LocalJobRunner: map
2022-03-29 15:58:10,894 INFO mapred.Task: Task 'attempt_local1599205954_0001_m_000001_0' done.
2022-03-29 15:58:10,895 INFO mapred.Task: Final Counters for attempt_local1599205954_0001_m_000001_0:
......
Counters: 24
        File System Counters
                FILE: Number of bytes read=296069
                FILE: Number of bytes written=896467
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=280
                Reduce input records=20
                Reduce output records=0
                Spilled Records=20
                Shuffled Maps =10
                Failed Shuffles=0
                Merged Map outputs=10
                GC time elapsed (ms)=0
                Total committed heap usage (bytes)=2595749888
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Output Format Counters 
                Bytes Written=109
2022-03-29 15:58:11,315 INFO mapred.LocalJobRunner: Finishing task: attempt_local1599205954_0001_r_000000_0
2022-03-29 15:58:11,315 INFO mapred.LocalJobRunner: reduce task executor complete.
2022-03-29 15:58:11,586 INFO mapreduce.Job: Job job_local1599205954_0001 running in uber mode : false
2022-03-29 15:58:11,588 INFO mapreduce.Job:  map 100% reduce 100%
2022-03-29 15:58:11,590 INFO mapreduce.Job: Job job_local1599205954_0001 completed successfully
2022-03-29 15:58:11,610 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=3197910
                FILE: Number of bytes written=9853787
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=10
                Map output records=20
                Map output bytes=180
                Map output materialized bytes=280
                Input split bytes=1680
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=280
                Reduce input records=20
                Reduce output records=0
                Spilled Records=40
                Shuffled Maps =10
                Failed Shuffles=0
                Merged Map outputs=10
                GC time elapsed (ms)=42
                Total committed heap usage (bytes)=25328877568
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=1300
        File Output Format Counters 
                Bytes Written=109
Job Finished in 1.637 seconds
Estimated value of Pi is 3.12000000000000000000
root@instance-00001816:/data/bigdata/hadoop-3.3.0/share/hadoop/mapreduce# 
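
The pi example estimates π by sampling points in a square and counting how many fall inside the inscribed circle, so the estimate is 4 × inside / total. The run above used 10 maps × 20 samples = 200 points. The count of 156 inside points used below is inferred from the printed estimate; it does not appear in the log. A small sketch of the arithmetic, with pi_estimate as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: estimate pi from a count of points inside the circle.
# $1 = points inside, $2 = number of maps, $3 = samples per map
pi_estimate() {
    awk -v inside="$1" -v maps="$2" -v samples="$3" \
        'BEGIN { printf "%.2f\n", 4 * inside / (maps * samples) }'
}

pi_estimate 156 10 20   # prints 3.12
```

With only 200 samples the estimate is coarse. Larger arguments to the pi example should give a closer value at the cost of a longer run.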

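Hadoop 3 Environment Setup (Pseudo-Distributed Mode)

Learn how to set up a Hadoop 3 pseudo-distributed environment, configure its files, and test it.

Lab Contents

  1. Install and configure JDK 8 on Linux
  2. Install and configure Hadoop 3 (pseudo-distributed mode)
  3. Test the Hadoop 3 installation

Lab Environment

Operating system: Ubuntu 16.04

Software: JDK 1.8, Hadoop 3.3

Dataset path: /data/dataset

Tar package path: /data/software

Tar extraction path: /data/bigdata

Software installation path: /opt

Path for files created during the lab: /data/resource

Lab Principles

Hadoop has three run modes: local (standalone) mode, pseudo-distributed mode, and cluster (fully distributed) mode.

Pseudo-distributed mode runs the Hadoop daemons on the local host to simulate a small cluster. In addition to the steps used for local mode, the Hadoop configuration files must be edited.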

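Lab Steps

1. Install JDK 8
# 1. Extract the JDK package (the archive is in the tar package path)
cd /data/software
tar -zxvf jdk-8u73-linux-x64.tar.gz  -C /data/bigdata/
# 2. Configure environment variables: append the two export lines below to /etc/profile, then reload it
vim /etc/profile
export JAVA_HOME=/data/bigdata/jdk1.8.0_73
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
# 3. Test the JDK installation
java -version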
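2. Configure passwordless SSH login

Hadoop's start scripts launch the daemons over SSH, so without extra configuration a password has to be entered each time a node is started.

To avoid typing the password every time, set up passwordless (key-based) login.

# 1. Generate a key pair
rm -rf ~/.ssh # only if SSH was configured earlier and should be reset; this deletes all existing keys
ssh-keygen # creates two files under ~/.ssh/: id_rsa (private key) and id_rsa.pub (public key)
# 2. Append the public key to the authorized keys file so that later logins need no password
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys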
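
Before starting Hadoop it can be worth confirming that the public key really landed in authorized_keys. The sketch below does only that file-level check and does not open an SSH connection. The helper name key_is_authorized is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the key in public-key file $1
# appears as a whole line in authorized_keys file $2.
key_is_authorized() {
    [ -f "$1" ] && [ -f "$2" ] && grep -qxFf "$1" "$2"
}

if key_is_authorized "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"; then
    echo "public key is authorized"
else
    echo "public key not found in authorized_keys" >&2
fi
```

If `ssh localhost` still asks for a password after this check passes, file permissions are a common cause: with its default StrictModes setting, sshd ignores keys when ~/.ssh or authorized_keys is writable by other users, so `chmod 700 ~/.ssh` and `chmod 600 ~/.ssh/authorized_keys` are worth trying.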
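3. Install Hadoop
# 1. Extract the Hadoop 3 package (the archive is in the tar package path)
cd /data/software
tar -zxvf hadoop-3.3.0.tar.gz -C /data/bigdata/
# 2. Configure environment variables: append the three export lines below to /etc/profile, then reload it
vim /etc/profile
export HADOOP_HOME=/data/bigdata/hadoop-3.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
source /etc/profile
# 3. Test the Hadoop 3 installation
hadoop version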
4. Configure Hadoop for pseudo-distributed mode

Five files need to be configured (the workers file is optional). All of them are in the etc/hadoop/ subdirectory of the Hadoop installation directory.

cd /data/bigdata/hadoop-3.3.0/etc/hadoop
# 1. Configure hadoop-env.sh
vim hadoop-env.sh # find the JAVA_HOME setting and change its value
export JAVA_HOME=/data/bigdata/jdk1.8.0_73

# 2. Configure core-site.xml
vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/bigdata/hadoop-3.3.0/tmp</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.groups</name>
        <value>*</value>
    </property>
</configuration>

# 3. Configure hdfs-site.xml
vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.datanode.max.transfer.threads</name>
        <value>4096</value>
    </property>
</configuration>

# 4. Configure mapred-site.xml
vim mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

# 5. Configure yarn-site.xml
vim yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
    </property>
</configuration>
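
A typo in any of these files tends to surface only when a daemon fails to start, so reading a few values back can save time. The sketch below pulls one property out of a *-site.xml file with awk. The helper name get_property is hypothetical, and it is not a real XML parser: it assumes each name and each value fits on a single line, as in the files above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of property $2 from site file $1.
# Assumes <name>...</name> and <value>...</value> each sit on one line.
get_property() {
    awk -v want="$2" '
        /<name>/ {
            name = $0
            sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            value = $0
            sub(/.*<value>/, "", value); sub(/<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}

# Configuration directory from this article; adjust if Hadoop lives elsewhere.
conf=${HADOOP_CONF_DIR:-/data/bigdata/hadoop-3.3.0/etc/hadoop}
if [ -f "$conf/core-site.xml" ]; then
    echo "fs.defaultFS = $(get_property "$conf/core-site.xml" fs.defaultFS)"
fi
```

On a working installation, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually resolved, which is the more reliable check once the commands are on PATH.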
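5. Format the file system

Format HDFS (this only needs to be done once) by running the following commands in a terminal:

cd /data/bigdata/hadoop-3.3.0/bin/
./hdfs namenode -format

If the cluster has to be reconfigured from scratch for some reason, delete the tmp directory under the Hadoop directory before reformatting HDFS. This directory is the one set by hadoop.tmp.dir in core-site.xml. By default the NameNode and DataNode keep their data in its dfs/name and dfs/data subdirectories, which must be removed before reformatting.

Formatting the NameNode creates a new, empty namespace on it. During formatting the configuration files are loaded, which also checks that the configuration is valid.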
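
Because formatting should happen only once, a guard that detects an existing format can prevent accidental reformatting. The sketch below relies on the NameNode writing a current/VERSION file into its storage directory when it is formatted, and it assumes dfs.namenode.name.dir is left at its default under hadoop.tmp.dir. The helper name is_formatted is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a NameNode has already been formatted
# under the hadoop.tmp.dir given as $1 (default dfs/name layout assumed).
is_formatted() {
    [ -f "$1/dfs/name/current/VERSION" ]
}

HADOOP_TMP=/data/bigdata/hadoop-3.3.0/tmp   # hadoop.tmp.dir from core-site.xml
if is_formatted "$HADOOP_TMP"; then
    echo "NameNode already formatted; remove $HADOOP_TMP before reformatting"
else
    echo "no NameNode metadata found; run: hdfs namenode -format"
fi
```

Reformatting without clearing the old data directory typically leaves the DataNode with a cluster ID that no longer matches the NameNode, and the DataNode then refuses to start.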

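6. Run a MapReduce program on the Hadoop 3 cluster
# 1. Edit the HDFS start and stop scripts
cd /data/bigdata/hadoop-3.3.0/sbin
vi start-dfs.sh
vi stop-dfs.sh
# Add the following lines to each of the two files
HDFS_DATANODE_USER=root
HDFS_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
# 2. Start the HDFS cluster first
./start-dfs.sh
jps # use jps to list the services running on this node
# 3. After a successful start, the NameNode and DataNode information and the HDFS file system can be viewed in the web UI. NameNode web interface: http://localhost:9870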
# 4. Edit the YARN start and stop scripts
cd /data/bigdata/hadoop-3.3.0/sbin
vi start-yarn.sh
vi stop-yarn.sh
# Add the following lines to each of the two files:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
# 5. Start YARN:
./start-yarn.sh
jps
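# 6. Start the history server and the timeline server
cd ../bin/
./mapred --daemon start historyserver
./yarn --daemon start timelineserver
# 7. Run the pi program: change to the directory that contains the examples jar, then run the MapReduce program
cd /data/bigdata/hadoop-3.3.0/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-3.3.0.jar pi 1 2 # the computed value of pi appears in the output
# 8. View the job in the web UI
Open a browser and enter this address: http://localhost:8088
Check job progress: http://localhost:8088/cluster
In the web UI, click the History link in the "Tracking UI" column to see the job's run information.
If a URL contains cda, that is the author's machine name; replace it with your own machine name.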
# 9. Shut down the cluster
cd /data/bigdata/hadoop-3.3.0/sbin
./stop-yarn.sh
./stop-dfs.sh
cd ../bin
./mapred --daemon stop historyserver
./yarn --daemon stop timelineserver
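
The jps checks in step 6 can be automated. After start-dfs.sh and start-yarn.sh have both run, jps is expected to list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager. The sketch below reads jps output and prints whichever of those five is absent. The helper name missing_daemons is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin ("<pid> <name>" per line)
# and print each expected daemon that does not appear in it.
missing_daemons() {
    out=$(cat)
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Exact match on the name column, so NameNode does not match SecondaryNameNode
        printf '%s\n' "$out" |
            awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$d"
    done
}

if command -v jps >/dev/null 2>&1; then
    missing=$(jps | missing_daemons)
    if [ -z "$missing" ]; then
        echo "all five daemons are running"
    else
        echo "not running:" $missing >&2
    fi
fi
```

A daemon reported as missing usually has the reason in its log file under the logs/ directory of the Hadoop installation.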
Added: 2022-03-30 18:32:03  Updated: 2022-03-30 18:33:14
 