[Big Data] Hadoop HA Cluster Deployment (Three Nodes as an Example), with ZooKeeper Cluster Management

1. Installing the ZooKeeper Cluster

Prerequisites: JDK installed and configured, passwordless SSH (including to the host itself), and VM tuning done (see http://t.csdn.cn/spiKA)

Extract: [root@master01 download]# tar -zxf apache-zookeeper-3.6.3-bin.tar.gz -C /opt/software/

Rename: [root@master01 software]# mv apache-zookeeper-3.6.3-bin zookeeper-3.6.3

Create a data directory: [root@master01 zookeeper-3.6.3]# mkdir data

Rename zoo_sample.cfg: [root@master01 conf]# mv zoo_sample.cfg zoo.cfg

Edit zoo.cfg:

        Set dataDir to the path of the data directory created above.

        Below clientPort, add:

server.1=master01:2888:3888
server.2=master02:2888:3888
server.3=worker01:2888:3888
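
For reference, the relevant zoo.cfg lines end up looking roughly like this (a sketch; dataDir assumes the layout used above, and the remaining values are the zoo_sample.cfg defaults):

# zoo.cfg (excerpt)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/software/zookeeper-3.6.3/data
clientPort=2181
server.1=master01:2888:3888
server.2=master02:2888:3888
server.3=worker01:2888:3888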

Create a myid file in the data directory created above

[root@master01 data]# vim myid

(the file contains a single number: 1 on master01, 2 on master02, 3 on worker01)
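
Instead of editing the file by hand, each host's myid can also be written in one line (a sketch, run in each host's data directory):

[root@master01 data]# echo 1 > myid
[root@master02 data]# echo 2 > myid
[root@worker01 data]# echo 3 > myid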

Configure and activate the environment variables

[root@master01 data]# vim /etc/profile.d/my.sh
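
The article doesn't show the file body; a minimal sketch of the ZooKeeper section of my.sh, assuming the install path used above:

# zookeeper
export ZOOKEEPER_HOME=/opt/software/zookeeper-3.6.3
export PATH=$PATH:$ZOOKEEPER_HOME/bin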

source /etc/profile

Distribute the zookeeper-3.6.3 directory to master02 and worker01

After distribution, remember to update each host's myid file accordingly (a sketch follows)
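
A hedged sketch of the distribution, assuming passwordless SSH is already set up between the hosts:

[root@master01 software]# scp -r /opt/software/zookeeper-3.6.3 root@master02:/opt/software/
[root@master01 software]# scp -r /opt/software/zookeeper-3.6.3 root@worker01:/opt/software/
# then fix the ids on the receiving hosts:
[root@master02 ~]# echo 2 > /opt/software/zookeeper-3.6.3/data/myid
[root@worker01 ~]# echo 3 > /opt/software/zookeeper-3.6.3/data/myid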

Start the ZooKeeper cluster: zkServer.sh start (run this on all three machines)

Check with jps: if all three machines show a QuorumPeerMain process, the cluster started successfully

The ZooKeeper cluster is up!

2. Setting Up the Hadoop Cluster

2.1. Extract

Under /opt/, create a download directory to hold the tarballs and a software directory for the extracted files
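
For example (a one-line sketch):

[root@master01 ~]# mkdir -p /opt/download /opt/software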

Extract the hadoop-3.1.3.tar.gz archive from /opt/download/ into /opt/software/

tar -zxvf /opt/download/hadoop-3.1.3.tar.gz -C /opt/software

2.2. Configure and Activate Environment Variables

vim /etc/profile.d/my.sh

# hadoop
export HADOOP_HOME=/opt/software/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

Activate: source /etc/profile
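
A quick sanity check after sourcing (not in the original article):

[root@master01 ~]# hadoop version    # should report Hadoop 3.1.3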

2.3. Create the Hadoop Data Directory

[root@master01 ~]# cd /opt/software/hadoop-3.1.3/
[root@master01 hadoop-3.1.3]# mkdir data

2.4. Hadoop Configuration Files

cd /opt/software/hadoop-3.1.3/etc/hadoop

1. hadoop-env.sh (Java dependency)

vim hadoop-env.sh

Find the export JAVA_HOME line, uncomment it, and set it to the locally installed JDK:

export JAVA_HOME=/opt/software/jdk1.8.0_171

2. Create the workers file

vim workers

master01
master02
worker01

(hostname-to-IP mappings are required)

[root@master01 ~]# vim /etc/hosts


192.168.xxx.xxx master01
192.168.xxx.xxx master02
192.168.xxx.xxx worker01

3. Configure core-site.xml

<!-- vim core-site.xml -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
    <description>Logical name; must match dfs.nameservices in hdfs-site.xml</description>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/mycluster</value>
    <description>Local Hadoop temp directory on the namenode</description>
</property>
<property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
</property>
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
<property>
    <name>io.file.buffer.size</name>
    <value>1048576</value>
    <description>Size of the read/write SequenceFiles buffer: 1 MB (1048576 bytes)</description>
</property>
<property>
    <name>ha.zookeeper.quorum</name>
    <value>master01:2181,master02:2181,worker01:2181</value>
</property>
<property>
    <name>hadoop.zk.address</name>
    <value>master01:2181,master02:2181,worker01:2181</value>
</property>
<property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>10000</value>
    <description>Timeout (in ms) for Hadoop's connection to ZooKeeper</description>
</property>

4. Configure hdfs-site.xml

<!-- vim hdfs-site.xml -->
<property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Replication factor for each HDFS block</description>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/software/hadoop-3.1.3/data/dfs/name</value>
    <description>Where the namenode stores the HDFS namespace metadata</description>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/software/hadoop-3.1.3/data/dfs/data</value>
    <description>Physical storage location of data blocks on each datanode</description>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master01:9869</value>
</property>
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
    <description>HDFS nameservice name; must match the value in core-site.xml</description>
</property>
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
    <description>mycluster is the cluster's logical name, mapped to the two namenode logical names</description>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>master01:8020</value>
    <description>RPC address of master01</description>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>master01:9870</value>
    <description>HTTP address of master01</description>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>master02:8020</value>
    <description>RPC address of master02</description>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>master02:9870</value>
    <description>HTTP address of master02</description>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master01:8485;master02:8485;worker01:8485/mycluster</value>
    <description>Shared storage for the NameNode edit log (the JournalNode list)</description>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/tmp/hadoop/journaldata</value>
    <description>Local directory where each JournalNode stores its data</description>
</property>

<!-- Fault tolerance -->
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <description>Enable automatic NameNode failover</description>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <description>Failover proxy provider implementation</description>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
    <description>Fencing method to prevent split-brain</description>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
    <description>The sshfence isolation mechanism requires passwordless SSH login</description>
</property>

<!-- Permissions: disable checks to avoid failures caused by permission problems -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
    <description>Disable permission checking</description>
</property>

<!-- Throttling: leave more memory and bandwidth for jobs -->
<property>
    <name>dfs.image.transfer.bandwidthPerSec</name>
    <value>1048576</value>
</property>
<property>
    <name>dfs.block.scanner.volume.bytes.per.second</name>
    <value>1048576</value>
</property>
<property>
    <name>dfs.datanode.balance.bandwidthPerSec</name>
    <value>20m</value>
</property>

5. Configure mapred-site.xml

<!-- vim mapred-site.xml -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Job execution framework: local, classic, or yarn.</description>
    <final>true</final>
</property>
<property>
    <name>mapreduce.application.classpath</name>
    <value>/opt/software/hadoop-3.1.3/etc/hadoop:/opt/software/hadoop-3.1.3/share/hadoop/common/lib/*:/opt/software/hadoop-3.1.3/share/hadoop/common/*:/opt/software/hadoop-3.1.3/share/hadoop/hdfs:/opt/software/hadoop-3.1.3/share/hadoop/hdfs/lib/*:/opt/software/hadoop-3.1.3/share/hadoop/hdfs/*:/opt/software/hadoop-3.1.3/share/hadoop/mapreduce/lib/*:/opt/software/hadoop-3.1.3/share/hadoop/mapreduce/*:/opt/software/hadoop-3.1.3/share/hadoop/yarn:/opt/software/hadoop-3.1.3/share/hadoop/yarn/lib/*:/opt/software/hadoop-3.1.3/share/hadoop/yarn/*</value>
</property>
<!-- Job history: configuring a single node is enough -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master01:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master01:19888</value>
</property>
<!-- Container memory caps, read and enforced by the NodeManager; a container that exceeds its cap is killed by the NodeManager (Connection reset by peer) -->
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
</property>

Note: the value for mapreduce.application.classpath can be obtained by running hadoop classpath.

6. Configure yarn-site.xml

Note: the Node Manager Configs section below is host-specific; master01, master02, and worker01 each use their own hostname.

Remember to update it after distribution (see the sketch below)!
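
A hedged way to make the per-host edit after distributing the files (a sketch; it rewrites only the three NodeManager addresses, so review the result before starting services):

# on master02, after receiving the configured files:
sed -i 's/master01:8040/master02:8040/; s/master01:8050/master02:8050/; s/master01:8042/master02:8042/' \
    /opt/software/hadoop-3.1.3/etc/hadoop/yarn-site.xml
# on worker01, substitute worker01 instead of master02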

<!-- Fault tolerance -->
<property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>10000</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>

<!-- ResourceManager restart fault tolerance -->
<property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
    <description>Running jobs are not affected while the RM restarts</description>
</property>

<!-- Application state store: ZooKeeper -->
<property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    <description>How application state is persisted; HA supports only ZKRMStateStore</description>
</property>

<!-- YARN cluster configuration -->
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>mycluster</value>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
    <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
    <value>true</value>
</property>

<!-- rm1 configs -->
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master01</value>
</property>
<property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>master01:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>master01:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.https.address.rm1</name>
    <value>master01:8090</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>master01:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>master01:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>master01:8033</value>
</property>


<!-- rm2 configs -->
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>master02</value>
</property>
<property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>master02:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>master02:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.https.address.rm2</name>
    <value>master02:8090</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>master02:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>master02:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>master02:8033</value>
</property>


<!-- Node Manager Configs: must be set on every node -->
<property>
    <description>Address where the localizer IPC is.</description>
    <name>yarn.nodemanager.localizer.address</name>
    <value>master01:8040</value>
</property>
<property>
    <description>The address of the container manager in the NM.</description>
    <name>yarn.nodemanager.address</name>
    <value>master01:8050</value>
</property>
<property>
    <description>NM Webapp address.</description>
    <name>yarn.nodemanager.webapp.address</name>
    <value>master01:8042</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/tmp/hadoop/yarn/local</value>
</property>
<property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/tmp/hadoop/yarn/log</value>
</property>


<!-- Resource tuning -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
</property>

<!-- Log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.application.classpath</name>
    <value>/opt/software/hadoop-3.1.3/etc/hadoop:/opt/software/hadoop-3.1.3/share/hadoop/common/lib/*:/opt/software/hadoop-3.1.3/share/hadoop/common/*:/opt/software/hadoop-3.1.3/share/hadoop/hdfs:/opt/software/hadoop-3.1.3/share/hadoop/hdfs/lib/*:/opt/software/hadoop-3.1.3/share/hadoop/hdfs/*:/opt/software/hadoop-3.1.3/share/hadoop/mapreduce/lib/*:/opt/software/hadoop-3.1.3/share/hadoop/mapreduce/*:/opt/software/hadoop-3.1.3/share/hadoop/yarn:/opt/software/hadoop-3.1.3/share/hadoop/yarn/lib/*:/opt/software/hadoop-3.1.3/share/hadoop/yarn/*</value>
</property>

7. Distribute

Distribute the configured hadoop-3.1.3 directory to the other two machines, and remember to adjust yarn-site.xml on each (a sketch follows)
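
A hedged sketch of the distribution, assuming passwordless SSH:

[root@master01 software]# scp -r /opt/software/hadoop-3.1.3 root@master02:/opt/software/
[root@master01 software]# scp -r /opt/software/hadoop-3.1.3 root@worker01:/opt/software/
[root@master01 software]# scp /etc/profile.d/my.sh root@master02:/etc/profile.d/
[root@master01 software]# scp /etc/profile.d/my.sh root@worker01:/etc/profile.d/
# then run source /etc/profile and fix yarn-site.xml on each receiving host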

2.5. Initialize and Start the Cluster

1. Start the ZooKeeper cluster (Hadoop HA is coordinated through ZooKeeper)

zkServer.sh start    # run on all three hosts

2. Check the ZooKeeper status: expect 1 leader + 2 followers

zkServer.sh status

3. Start the JournalNode cluster (all three hosts need to run it)
	hdfs --daemon start journalnode    # run on all three

jps on each of the three machines should now show a JournalNode process

4. Format ZKFC (master01 only)
	hdfs zkfc -formatZK

A line containing "Successfully" in the output means it worked!

5. Format the primary NameNode (run on master01 only)
	hdfs namenode -format

At this point you could start the cluster with start-all.sh and check the services with jps, but master02 has not yet synced metadata from master01 and therefore has no namenode service; the next step syncs and starts the namenode on master02.

6. Bootstrap and start the standby NameNode (on master02; first startup only, run once)
	hdfs namenode -bootstrapStandby
	hdfs --daemon start namenode

7. Start the cluster
	start-all.sh    # i.e. start-dfs.sh plus start-yarn.sh

8. Check the services with jps
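
Roughly what jps should show with this layout (a sketch inferred from the configs above, not from the original article):

master01: NameNode, DataNode, JournalNode, DFSZKFailoverController, ResourceManager, NodeManager, QuorumPeerMain
master02: NameNode, DataNode, JournalNode, DFSZKFailoverController, ResourceManager, NodeManager, QuorumPeerMain
worker01: DataNode, JournalNode, NodeManager, QuorumPeerMain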

9. Check via the web UI

The NameNode web UI listens on port 9870.

Enter 192.168.xxx.xxx:9870 in a browser to open it.

To use a hostname instead of the IP, add the mapping to your local machine's hosts file.

Append the hostname-IP mappings at the end of the file (administrator privileges are required to edit it), as shown below.
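
For example, on a Windows workstation the file is C:\Windows\System32\drivers\etc\hosts (on Linux/macOS it is /etc/hosts); the entries mirror the cluster's own /etc/hosts:

192.168.xxx.xxx master01
192.168.xxx.xxx master02
192.168.xxx.xxx worker01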


10. Verify high availability
	Stop the namenode on the primary node (master01) and check whether master02's namenode becomes active (standby -> active):
	hdfs --daemon stop namenode

	Restart master01's namenode; its state should now be standby:
	hdfs --daemon start namenode
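
The state can also be queried directly with the standard HA admin command (nn1/nn2 are the namenode IDs defined in hdfs-site.xml):

[root@master01 ~]# hdfs haadmin -getServiceState nn1
[root@master01 ~]# hdfs haadmin -getServiceState nn2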

End
