Table of Contents
Preparation
Create the hadoop user and group, and grant passwordless sudo
Create the hadoop user
Add the hadoop user to the hadoop group
Grant the hadoop user root privileges so it can use sudo
Configure passwordless SSH login for the hadoop user
Directory layout
Download and extract
Configure environment variables
Configure Hadoop
Configure the Hadoop runtime environment
Configure core-site.xml
Configure hdfs-site.xml
Configure mapred-site.xml
Configure yarn-site.xml
Configure workers
Configure the fair scheduler
Copy the configured Hadoop package to the other four servers
Startup
Start the ZooKeeper cluster
Start the JournalNodes
Format HDFS (first startup only)
Format ZKFC (first startup only)
Start the Hadoop cluster
Verification
Web access
HA verification
Notes
Preparation
- Configure hostnames
- Configure the IP-to-hostname mappings (a sample /etc/hosts is sketched below)
- Configure passwordless SSH login
- Configure the firewall
- Install the JDK
- Install zookeeper-3.5.9
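A minimal /etc/hosts sketch for the five-node layout used in this post. The addresses for master, master2, and slave1 appear in the logs further down; the addresses for slave2 and slave3 are assumptions for illustration only.
192.168.21.131 master
192.168.21.135 master2
192.168.21.132 slave1
192.168.21.133 slave2   # assumed
192.168.21.134 slave3   # assumed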
Create the hadoop user and group, and grant passwordless sudo
Create the hadoop user
As root, create the user; the adduser command is recommended:
adduser hadoop
passwd hadoop
Enter the password (hadoop), press Enter through the remaining prompts, and finally type y to confirm.
Add the hadoop user to the hadoop group
Creating the hadoop user also created a hadoop group; next, add the hadoop user to that group:
usermod -a -G hadoop hadoop
Grant the hadoop user root privileges so it can use sudo
chmod u+w /etc/sudoers # make the sudoers file writable
vim /etc/sudoers
Below the line root ALL=(ALL) ALL, add:
hadoop ALL=(root) NOPASSWD:ALL
chmod u-w /etc/sudoers
Grant the hadoop user access to the /opt directory:
sudo chmod 777 -R /opt
Create the hadoop user on all the other nodes in the same way (see the sketch below).
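If root can already SSH to the other nodes, the per-node steps can be scripted. A rough sketch only: passwd --stdin is CentOS-specific, and the sudoers edit is still done by hand on each host.
for h in master2 slave1 slave2 slave3; do
  ssh root@$h 'adduser hadoop; echo hadoop | passwd --stdin hadoop; usermod -a -G hadoop hadoop; chmod 777 -R /opt'
done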
Configure passwordless SSH login for the hadoop user
Switch to the hadoop user:
su hadoop
Generate a public/private key pair:
[hadoop@master ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:VoONq5DHFgXgTWnSacdFs0LOUCjFZOa7QMsbO8x018I hadoop@master
The key's randomart image is:
+---[RSA 2048]----+
| .*OB+o+ |
| .o*O*o+ o |
| o=+.* = |
| o + + = . |
| O * E . |
| + X = . |
| * o |
| . |
| |
+----[SHA256]-----+
[hadoop@master ~]$ cd .ssh/
[hadoop@master .ssh]$ ll
total 8
-rw-------. 1 hadoop hadoop 1679 Aug 30 15:02 id_rsa
-rw-r--r--. 1 hadoop hadoop 395 Aug 30 15:02 id_rsa.pub
[hadoop@master .ssh]$ cat id_rsa.pub >> authorized_keys
[hadoop@master .ssh]$ ll
total 12
-rw-rw-r--. 1 hadoop hadoop 395 Aug 30 15:03 authorized_keys
-rw-------. 1 hadoop hadoop 1679 Aug 30 15:02 id_rsa
-rw-r--r--. 1 hadoop hadoop 395 Aug 30 15:02 id_rsa.pub
[hadoop@master .ssh]$ chmod 600 authorized_keys
Distribute the public key to master2, slave1, slave2, and slave3:
[hadoop@master ~]$ ssh-copy-id -i slave1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'slave1 (192.168.21.132)' can't be established.
ECDSA key fingerprint is SHA256:OaHEb/egB7jEM+K2ADJ6oDV+bWGbARUnwj9Pmey5+Tw.
ECDSA key fingerprint is MD5:bb:4d:8e:49:44:34:5a:5b:85:44:aa:7b:25:a8:d6:9e.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave1's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'slave1'"
and check to make sure that only the key(s) you wanted were added.
Repeat the steps above on master2, slave1, slave2, and slave3, so that every host's public key ends up distributed to the other four servers (a loop version is sketched below).
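Done by hand, that is four ssh-copy-id invocations on each of the five hosts; a minimal loop to run on each host instead (each remote password is prompted for once):
for h in master master2 slave1 slave2 slave3; do
  [ "$h" = "$(hostname)" ] || ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$h
done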
Directory layout
# Hadoop
opt
|__hadoop-3.2.0
|__hdfs
| |__name
| |__data
|__tmp
|__pids
|__logs
|__journal
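The tarball does not create these directories; one way to create them all after extraction (run on every node; relies on bash brace expansion):
mkdir -p /opt/hadoop-3.2.0/{hdfs/{name,data},tmp,pids,logs,journal}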
Download and extract
Upload the downloaded tarball to the /opt/install directory, then extract it and move it into place:
tar -zxvf hadoop-3.2.0.tar.gz
mv hadoop-3.2.0 ../
Configure environment variables
vim ~/.bashrc
# hadoop
export HADOOP_HOME=/opt/hadoop-3.2.0
export PATH=$HADOOP_HOME/bin:$PATH
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
After saving and exiting, apply the changes:
source ~/.bashrc
Distribute the environment variables:
scp .bashrc hadoop@slave1:~/
Then, on master2, slave1, slave2, and slave3, apply the changes:
source ~/.bashrc
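The copy to the remaining nodes can be looped from master; each node still needs a source ~/.bashrc (or a fresh login) afterwards:
for h in master2 slave1 slave2 slave3; do
  scp ~/.bashrc hadoop@$h:~/
done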
Configure Hadoop
Configure the Hadoop runtime environment
Set the JDK path for Hadoop and define which user runs each daemon by adding the following to hadoop-env.sh:
# java
export JAVA_HOME=/usr/java/jdk1.8.0_281-amd64
# hdfs
export HDFS_NAMENODE_USER=hadoop
export HDFS_DATANODE_USER=hadoop
export HDFS_JOURNALNODE_USER=hadoop
export HDFS_ZKFC_USER=hadoop
# yarn
export YARN_RESOURCEMANAGER_USER=hadoop
export YARN_NODEMANAGER_USER=hadoop
# hadoop
export HADOOP_PID_DIR=/opt/hadoop-3.2.0/pids
export HADOOP_LOG_DIR=/opt/hadoop-3.2.0/logs
Configure core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-3.2.0/tmp</value>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>hadoop</value>
<description>The user name to filter as, on static web filters while rendering content. </description>
</property>
<property>
<name>hadoop.zk.address</name>
<value>master:2181,slave1:2181,slave2:2181</value>
<description>Host:Port of the ZooKeeper server to be used.</description>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>master:2181,slave1:2181,slave2:2181</value>
<description>A list of ZooKeeper server addresses, separated by commas, that are to be used by the ZKFailoverController in automatic failover.</description>
</property>
</configuration>
Configure hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop-3.2.0/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop-3.2.0/hdfs/data</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>ns</value>
<description>Comma-separated list of nameservices</description>
</property>
<property>
<name>dfs.ha.namenodes.ns</name>
<value>nn1,nn2</value>
<description>a comma-separated list of namenodes for nameservice ns</description>
</property>
<property>
<name>dfs.namenode.rpc-address.ns.nn1</name>
<value>master:8020</value>
<description>The RPC address for namenode nn1</description>
</property>
<property>
<name>dfs.namenode.http-address.ns.nn1</name>
<value>master:50070</value>
<description>The address and the base port where the dfs namenode nn1 web ui will listen on</description>
</property>
<property>
<name>dfs.namenode.rpc-address.ns.nn2</name>
<value>master2:8020</value>
<description>The RPC address for namenode nn2</description>
</property>
<property>
<name>dfs.namenode.http-address.ns.nn2</name>
<value>master2:50070</value>
<description>The address and the base port where the dfs namenode nn2 web ui will listen on</description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://master:8485;slave1:8485;slave2:8485/ns</value>
<description>A directory on shared storage between the multiple namenodes in an HA cluster. This directory will be written by the active and read by the standby in order to keep the namespaces synchronized</description>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/hadoop-3.2.0/journal</value>
<description>The directory where the journal edit files are stored</description>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
<description>Whether automatic failover is enabled</description>
</property>
<property>
<name>dfs.client.failover.proxy.provider.ns</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
<description>The prefix (plus a required nameservice ID) for the class name of the configured Failover proxy provider for the host</description>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
<description>A list of scripts or Java classes which will be used to fence the Active NameNode during a failover.</description>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<property>
<!-- Maximum number of threads the datanode may use for data transfer; the default of 4096 is too low and leads to "xcievers exceeded" errors under load -->
<name>dfs.datanode.max.transfer.threads</name>
<value>409600</value>
</property>
</configuration>
Configure mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master2:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master2:19888</value>
</property>
</configuration>
Configure yarn-site.xml
<configuration>
<!-- Configuring the External Shuffle Service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
<description>Enable RM high-availability</description>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>rmcluster</value>
<description>Name of the cluster</description>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
<description>The list of RM nodes in the cluster when HA is enabled</description>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>master</value>
<description>The hostname of the rm1</description>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>master2</value>
<description>The hostname of the rm2</description>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
<description>Enable RM to recover state after starting</description>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
<description>The class to use as the persistent store</description>
</property>
<!-- YARN-Fair Scheduler. Start -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/opt/hadoop-3.2.0/etc/hadoop/fair-scheduler.xml</value>
</property>
<property>
<name>yarn.scheduler.fair.preemption</name>
<value>true</value>
</property>
<property>
<name>yarn.scheduler.fair.user-as-default-queue</name>
<value>false</value>
<description>default is True</description>
</property>
<property>
<name>yarn.scheduler.fair.allow-undeclared-pools</name>
<value>false</value>
<description>default is True</description>
</property>
<!-- YARN-Fair Scheduler. End -->
<!-- YARN nodemanager resource config. Start -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
<description>Physical memory, in MB, to be made available to running containers</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>8</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>3</value>
</property>
<!-- YARN nodemanager resource config. End -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://master2:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/opt/hadoop-3.2.0/etc/hadoop:/opt/hadoop-3.2.0/share/hadoop/common/lib/*:/opt/hadoop-3.2.0/share/hadoop/common/*:/opt/hadoop-3.2.0/share/hadoop/hdfs:/opt/hadoop-3.2.0/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.0/share/hadoop/hdfs/*:/opt/hadoop-3.2.0/share/hadoop/mapreduce/*:/opt/hadoop-3.2.0/share/hadoop/yarn:/opt/hadoop-3.2.0/share/hadoop/yarn/lib/*:/opt/hadoop-3.2.0/share/hadoop/yarn/*</value>
</property>
<!-- Addresses clients use to submit applications to the ResourceManagers -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>master2:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>master2:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>master2:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>master2:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>master2:8033</value>
</property>
</configuration>
Configure workers
vim workers
slave1
slave2
slave3
Configure the fair scheduler
Create a fair-scheduler.xml file in the /opt/hadoop-3.2.0/etc/hadoop directory with the following content:
<?xml version="1.0"?>
<allocations>
<defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
<queue name="prod">
<weight>40</weight>
</queue>
<queue name="dev">
<weight>60</weight>
</queue>
<queuePlacementPolicy>
<rule name="specified" create="false" />
<rule name="primaryGroup" create="false" />
<rule name="default" queue="dev" />
</queuePlacementPolicy>
</allocations>
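With user-as-default-queue and undeclared pools disabled above, a job lands in the dev queue unless it names a queue explicitly. Once the cluster is up, the prod queue can be exercised with the examples jar that ships with Hadoop:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi -Dmapreduce.job.queuename=prod 4 100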
Copy the configured Hadoop package to the other four servers
scp -r /opt/hadoop-3.2.0/ hadoop@master2:/opt/
scp -r /opt/hadoop-3.2.0/ hadoop@slave1:/opt/
scp -r /opt/hadoop-3.2.0/ hadoop@slave2:/opt/
scp -r /opt/hadoop-3.2.0/ hadoop@slave3:/opt/
Startup
Start the ZooKeeper cluster
Start ZooKeeper on master, slave1, and slave2:
/opt/zookeeper-3.5.9/bin/zkServer.sh start
Check the status; there should be one leader and two followers:
/opt/zookeeper-3.5.9/bin/zkServer.sh status
Start the JournalNodes
Start a JournalNode on each of master, slave1, and slave2. Note: the JournalNodes form the quorum journal (qjournal) that stores the shared edit log through which the two NameNodes stay synchronized; here they simply run on the same three nodes as the ZooKeeper ensemble.
hdfs --daemon start journalnode
Run jps to verify that a JournalNode process now appears on master, slave1, and slave2 (a quick remote check is sketched below).
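A quick check from master over the passwordless SSH configured earlier (assumes jps is on the PATH of a non-interactive shell):
for h in master slave1 slave2; do echo "== $h"; ssh $h jps | grep JournalNode; done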
Format HDFS (first startup only)
On master, run:
[hadoop@master hadoop]$ hdfs namenode -format
2021-08-30 19:15:06,954 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/192.168.21.131
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.2.0
STARTUP_MSG: classpath = ... [omitted for brevity]
STARTUP_MSG: build = https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on 2019-01-08T06:08Z
STARTUP_MSG: java = 1.8.0_281
************************************************************/
2021-08-30 19:15:06,962 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2021-08-30 19:15:07,018 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-f9d1ef0f-fad2-4666-9c58-b5b33857ac27
2021-08-30 19:15:07,396 INFO namenode.FSEditLog: Edit logging is async:true
2021-08-30 19:15:07,405 INFO namenode.FSNamesystem: KeyProvider: null
2021-08-30 19:15:07,406 INFO namenode.FSNamesystem: fsLock is fair: true
2021-08-30 19:15:07,406 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
2021-08-30 19:15:07,409 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
2021-08-30 19:15:07,412 INFO namenode.FSNamesystem: supergroup = supergroup
2021-08-30 19:15:07,412 INFO namenode.FSNamesystem: isPermissionEnabled = true
2021-08-30 19:15:07,412 INFO namenode.FSNamesystem: Determined nameservice ID: ns
2021-08-30 19:15:07,412 INFO namenode.FSNamesystem: HA Enabled: true
2021-08-30 19:15:07,445 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2021-08-30 19:15:07,452 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
2021-08-30 19:15:07,452 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2021-08-30 19:15:07,455 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2021-08-30 19:15:07,455 INFO blockmanagement.BlockManager: The block deletion will start around 2021 Aug 30 19:15:07
2021-08-30 19:15:07,456 INFO util.GSet: Computing capacity for map BlocksMap
2021-08-30 19:15:07,456 INFO util.GSet: VM type = 64-bit
2021-08-30 19:15:07,457 INFO util.GSet: 2.0% max memory 839.5 MB = 16.8 MB
2021-08-30 19:15:07,457 INFO util.GSet: capacity = 2^21 = 2097152 entries
2021-08-30 19:15:07,462 INFO blockmanagement.BlockManager: Storage policy satisfier is disabled
2021-08-30 19:15:07,462 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false
2021-08-30 19:15:07,465 INFO Configuration.deprecation: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
2021-08-30 19:15:07,465 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2021-08-30 19:15:07,466 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
2021-08-30 19:15:07,466 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
2021-08-30 19:15:07,466 INFO blockmanagement.BlockManager: defaultReplication = 3
2021-08-30 19:15:07,466 INFO blockmanagement.BlockManager: maxReplication = 512
2021-08-30 19:15:07,466 INFO blockmanagement.BlockManager: minReplication = 1
2021-08-30 19:15:07,466 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
2021-08-30 19:15:07,466 INFO blockmanagement.BlockManager: redundancyRecheckInterval = 3000ms
2021-08-30 19:15:07,466 INFO blockmanagement.BlockManager: encryptDataTransfer = false
2021-08-30 19:15:07,466 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
2021-08-30 19:15:07,481 INFO namenode.FSDirectory: GLOBAL serial map: bits=29 maxEntries=536870911
2021-08-30 19:15:07,481 INFO namenode.FSDirectory: USER serial map: bits=24 maxEntries=16777215
2021-08-30 19:15:07,481 INFO namenode.FSDirectory: GROUP serial map: bits=24 maxEntries=16777215
2021-08-30 19:15:07,481 INFO namenode.FSDirectory: XATTR serial map: bits=24 maxEntries=16777215
2021-08-30 19:15:07,486 INFO util.GSet: Computing capacity for map INodeMap
2021-08-30 19:15:07,486 INFO util.GSet: VM type = 64-bit
2021-08-30 19:15:07,486 INFO util.GSet: 1.0% max memory 839.5 MB = 8.4 MB
2021-08-30 19:15:07,486 INFO util.GSet: capacity = 2^20 = 1048576 entries
2021-08-30 19:15:07,487 INFO namenode.FSDirectory: ACLs enabled? false
2021-08-30 19:15:07,487 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
2021-08-30 19:15:07,487 INFO namenode.FSDirectory: XAttrs enabled? true
2021-08-30 19:15:07,487 INFO namenode.NameNode: Caching file names occurring more than 10 times
2021-08-30 19:15:07,490 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
2021-08-30 19:15:07,491 INFO snapshot.SnapshotManager: SkipList is disabled
2021-08-30 19:15:07,494 INFO util.GSet: Computing capacity for map cachedBlocks
2021-08-30 19:15:07,494 INFO util.GSet: VM type = 64-bit
2021-08-30 19:15:07,494 INFO util.GSet: 0.25% max memory 839.5 MB = 2.1 MB
2021-08-30 19:15:07,494 INFO util.GSet: capacity = 2^18 = 262144 entries
2021-08-30 19:15:07,498 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2021-08-30 19:15:07,498 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2021-08-30 19:15:07,498 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2021-08-30 19:15:07,500 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2021-08-30 19:15:07,500 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2021-08-30 19:15:07,501 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2021-08-30 19:15:07,501 INFO util.GSet: VM type = 64-bit
2021-08-30 19:15:07,501 INFO util.GSet: 0.029999999329447746% max memory 839.5 MB = 257.9 KB
2021-08-30 19:15:07,501 INFO util.GSet: capacity = 2^15 = 32768 entries
2021-08-30 19:15:08,056 INFO namenode.FSImage: Allocated new BlockPoolId: BP-819612954-192.168.21.131-1630322108055
2021-08-30 19:15:08,064 INFO common.Storage: Storage directory /opt/hadoop-3.2.0/hdfs/name has been successfully formatted.
2021-08-30 19:15:08,141 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/hadoop-3.2.0/hdfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2021-08-30 19:15:08,236 INFO namenode.FSImageFormatProtobuf: Image file /opt/hadoop-3.2.0/hdfs/name/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
2021-08-30 19:15:08,240 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2021-08-30 19:15:08,254 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.21.131
************************************************************/
Start the NameNode on master:
hdfs --daemon start namenode
On master2, sync the NameNode metadata from master:
[hadoop@master2 root]$ hdfs namenode -bootstrapStandby
2021-08-30 19:15:37,598 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master2/192.168.21.135
STARTUP_MSG: args = [-bootstrapStandby]
STARTUP_MSG: version = 3.2.0
STARTUP_MSG: classpath = ... [omitted for brevity]
STARTUP_MSG: build = https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf; compiled by 'sunilg' on 2019-01-08T06:08Z
STARTUP_MSG: java = 1.8.0_281
************************************************************/
2021-08-30 19:15:37,605 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2021-08-30 19:15:37,656 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
2021-08-30 19:15:37,738 INFO ha.BootstrapStandby: Found nn: nn1, ipc: master/192.168.21.131:8020
=====================================================
About to bootstrap Standby ID nn2 from:
Nameservice ID: ns
Other Namenode ID: nn1
Other NN's HTTP address: http://master:50070
Other NN's IPC address: master/192.168.21.131:8020
Namespace ID: 1565158353
Block pool ID: BP-819612954-192.168.21.131-1630322108055
Cluster ID: CID-f9d1ef0f-fad2-4666-9c58-b5b33857ac27
Layout version: -65
isUpgradeFinalized: true
=====================================================
2021-08-30 19:15:38,239 INFO common.Storage: Storage directory /opt/hadoop-3.2.0/hdfs/name has been successfully formatted.
2021-08-30 19:15:38,265 INFO namenode.FSEditLog: Edit logging is async:true
2021-08-30 19:15:38,320 INFO namenode.TransferFsImage: Opening connection to http://master:50070/imagetransfer?getimage=1&txid=0&storageInfo=-65:1565158353:1630322108055:CID-f9d1ef0f-fad2-4666-9c58-b5b33857ac27&bootstrapstandby=true
2021-08-30 19:15:38,413 INFO common.Util: Combined time for file download and fsync to all disks took 0.00s. The file download took 0.00s at 0.00 KB/s. Synchronous (fsync) write to disk of /opt/hadoop-3.2.0/hdfs/name/current/fsimage.ckpt_0000000000000000000 took 0.00s.
2021-08-30 19:15:38,413 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 401 bytes.
2021-08-30 19:15:38,434 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master2/192.168.21.135
************************************************************/
Stop the NameNode on master again (start-dfs.sh will start everything later):
hdfs --daemon stop namenode
Format ZKFC (first startup only)
Run this on master only. Note: the ZKFC (ZKFailoverController) is the process that manages active/standby switching between the two NameNodes, and it does rely on ZooKeeper. When the active NameNode becomes unhealthy, its ZKFC publishes that state to ZooKeeper; the ZKFC on the standby sees it, logs in to the active host over SSH and fences the old active (kill -9 on its process ID), then promotes its own NameNode to active. The fencing step prevents split-brain when the old active is merely hung rather than dead; if SSH fencing fails, a custom .sh script can be configured to forcibly kill the old active instead.
(Not to be confused with HDFS federation: federation means running several independent nameservices side by side. A nameservice such as ns here, with active NameNode nn1 and standby nn2, is an HA pair; federation is when there is more than one such nameservice.)
hdfs zkfc -formatZK
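To confirm the znode was created, ZooKeeper can be queried directly (/hadoop-ha is the ZKFC's default parent znode); the output should list ns:
/opt/zookeeper-3.5.9/bin/zkCli.sh -server master:2181 ls /hadoop-ha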
Start the Hadoop cluster
Start HDFS
First stop the JournalNodes on master, slave1, and slave2 (start-dfs.sh starts them itself):
hdfs --daemon stop journalnode
Then run the following on master:
[hadoop@master sbin]$ /opt/hadoop-3.2.0/sbin/start-dfs.sh
Starting namenodes on [master master2]
Starting datanodes
Starting journal nodes [slave2 slave1 master]
Starting ZK Failover Controllers on NN hosts [master master2]
A DFSZKFailoverController is now running on both master and master2.
Start YARN
On master, run:
[hadoop@master sbin]$ /opt/hadoop-3.2.0/sbin/start-yarn.sh
Starting resourcemanagers on [ master master2]
Starting nodemanagers
Start the MapReduce history server
On master2, run:
mapred --daemon start historyserver
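At this point, jps on each node should show roughly the following layout (a sketch derived from the configuration above):
master:  NameNode, DFSZKFailoverController, ResourceManager, JournalNode, QuorumPeerMain
master2: NameNode, DFSZKFailoverController, ResourceManager, JobHistoryServer
slave1:  DataNode, NodeManager, JournalNode, QuorumPeerMain
slave2:  DataNode, NodeManager, JournalNode, QuorumPeerMain
slave3:  DataNode, NodeManager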
Verification
Web access
HDFS: http://master:50070 and http://master2:50070. One NameNode is active and the other standby.
YARN: http://master:8088 and http://master2:8088. Browsing the standby redirects to the active ResourceManager's page.
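The active/standby roles can also be checked from the command line with the stock admin tools:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2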
HA verification
NameNode HA: visit http://master:50070 and http://master2:50070; one is active, the other standby. To verify failover, kill -9 the NameNode process on master; the NameNode on master2 should then take over as active.
YARN HA: kill -9 the ResourceManager process on master; http://master2:8088 is still reachable. Restart the ResourceManager on master (yarn --daemon start resourcemanager); requests to http://master:8088 are then automatically redirected to http://master2:8088.
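A scripted version of the NameNode drill, run on master; a minimal sketch that reuses the haadmin checks above:
kill -9 $(jps | grep -w NameNode | awk '{print $1}')   # simulate a crash of the active NameNode
hdfs haadmin -getServiceState nn2                      # expect: active
hdfs --daemon start namenode                           # bring the NameNode on master back
hdfs haadmin -getServiceState nn1                      # expect: standby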
Notes
With HA enabled, do not start a SecondaryNameNode; the standby NameNode already performs checkpointing, and starting a SecondaryNameNode will cause errors.
Reference
Based on the original article by CSDN blogger 逝水-无痕, "Fully distributed Hadoop 3.x HA installation and deployment on CentOS 7". Original link: https://blog.csdn.net/wangkai_123456/article/details/90599771