Deploying the Components of an Apache Big Data Warehouse
Chapter 1: Environment Preparation
1. Server planning
Prepare three servers for the cluster. CentOS 7 or later is recommended, with 2 CPU cores and 8 GB of RAM each.
172.19.195.228 hadoop101
172.19.195.229 hadoop102
172.19.195.230 hadoop103
[root@hadoop101 ~]
CentOS Linux release 7.5.1804 (Core)
[root@hadoop101 ~]
hadoop101
2. Download the installation packages
Installation packages for the warehouse components: https://pan.baidu.com/s/1Wjx6TNkedMTmmnuWREW-OQ (extraction code: bpk0)
All of the components have been uploaded to this network drive; you can also collect them yourself from each project's official download page.
3. Configure /etc/hosts on the servers
Hostname resolution in /etc/hosts on all three machines:
[root@hadoop101 ~]
127.0.0.1 localhost localhost
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.19.195.228 hadoop101
172.19.195.229 hadoop102
172.19.195.230 hadoop103
[root@hadoop102 ~]
127.0.0.1 localhost localhost
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.19.195.228 hadoop101
172.19.195.229 hadoop102
172.19.195.230 hadoop103
[root@hadoop103 ~]
127.0.0.1 localhost localhost
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.19.195.228 hadoop101
172.19.195.229 hadoop102
172.19.195.230 hadoop103
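The editing commands themselves are not shown above; one way to apply the same mapping everywhere, run from hadoop101 and entering each node's password when prompted, is (a sketch):
# append the cluster entries, then push the file to the other nodes
cat >> /etc/hosts <<EOF
172.19.195.228 hadoop101
172.19.195.229 hadoop102
172.19.195.230 hadoop103
EOF
scp /etc/hosts hadoop102:/etc/hosts
scp /etc/hosts hadoop103:/etc/hosts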
4. Configure passwordless SSH between the servers
[root@hadoop101 ~]
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:pgAtkJ9Tmf8sqBYOkK2gr/d7woIPXDguOiHRxRHDVH4 root@hadoop101
The key's randomart image is:
+---[RSA 2048]----+
|.. +=*. |
|.. .Bo |
| =o+... E |
|= Bo .. |
|+= o.. oS |
|O + ...oo |
|+O + .. |
|=.B o . |
|o*.oo+ |
+----[SHA256]-----+
# Distribute the public key
[root@hadoop101 ~]# ssh-copy-id hadoop101
Are you sure you want to continue connecting (yes/no)? yes
root@hadoop101's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop101'"
and check to make sure that only the key(s) you wanted were added.
[root@hadoop101 ~]
Are you sure you want to continue connecting (yes/no)? yes
root@hadoop102's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop102'"
and check to make sure that only the key(s) you wanted were added.
[root@hadoop101 ~]# ssh-copy-id hadoop103
Are you sure you want to continue connecting (yes/no)? yes
root@hadoop103's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop103'"
and check to make sure that only the key(s) you wanted were added.
[root@hadoop102 ~]
[root@hadoop102 ~]
[root@hadoop102 ~]
[root@hadoop102 ~]
[root@hadoop103 ~]
[root@hadoop103 ~]
[root@hadoop103 ~]
[root@hadoop103 ~]
[root@hadoop101 ~]
hadoop101
hadoop102
hadoop103
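Only part of the command history survives above; the typical flow, repeated on hadoop102 and hadoop103 exactly as on hadoop101, looks like this (a sketch):
ssh-keygen -t rsa          # press Enter at every prompt
ssh-copy-id hadoop101
ssh-copy-id hadoop102
ssh-copy-id hadoop103
# quick check from hadoop101 that no password is requested
for host in hadoop101 hadoop102 hadoop103; do ssh $host hostname; done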
5. Create the package and application directories
[root@hadoop101 ~]
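The directories used throughout this guide are /opt/software for installation packages and /opt/module for installed applications; creating them is presumably a single mkdir (a sketch):
mkdir -p /opt/software /opt/module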
6. Upload the installation packages to the server
[root@hadoop101 ~]
[root@hadoop101 software]
total 1489048
-rw-r--r-- 1 root root 278813748 Aug 26 15:13 apache-hive-3.1.2-bin.tar.gz
-rw-r--r-- 1 root root 9136463 Aug 26 15:13 apache-maven-3.6.1-bin.tar.gz
-rw-r--r-- 1 root root 9311744 Aug 26 15:13 apache-zookeeper-3.5.7-bin.tar.gz
-rw-r--r-- 1 root root 338075860 Aug 26 15:14 hadoop-3.1.3.tar.gz
-rw-r--r-- 1 root root 314030393 Aug 26 15:14 hue.tar
-rw-r--r-- 1 root root 194990602 Aug 26 15:15 jdk-8u211-linux-x64.tar.gz
-rw-r--r-- 1 root root 70057083 Aug 26 15:14 kafka_2.11-2.4.0.tgz
-rw-r--r-- 1 root root 77807942 Aug 26 15:15 mysql-libs.zip
-rw-r--r-- 1 root root 232530699 Aug 26 15:15 spark-2.4.5-bin-hadoop2.7.tgz
7. Disable the firewall on every server
[root@hadoop101 software]
[root@hadoop101 software]
[root@hadoop102 ~]
[root@hadoop102 ~]
[root@hadoop103 ~]
[root@hadoop103 ~]
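On CentOS 7 the firewall is stopped and disabled through systemd; run the following on every node (a sketch):
systemctl stop firewalld
systemctl disable firewalld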
8. Configure hosts on the local Windows machine (optional)
[Note] If you skip this, you must use the servers' IP addresses in the browser whenever a web UI URL is accessed.
C:\Windows\System32\drivers\etc\hosts
# hadoop cluster
139.224.229.107 hadoop101
139.224.66.13 hadoop102
139.224.228.144 hadoop103
9. Install Java (JDK 1.8)
[root@hadoop101 software]
[root@hadoop101 software]
jdk1.8.0_211
[root@hadoop101 software]
export JAVA_HOME=/opt/module/jdk1.8.0_211
export PATH=$PATH:$JAVA_HOME/bin
[root@hadoop101 software]
[root@hadoop101 software]
[root@hadoop101 software]
[root@hadoop101 software]
[root@hadoop101 software]
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
[root@hadoop102 ~]
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
[root@hadoop103 ~]
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
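Only the profile lines and the version output are preserved above; a hedged reconstruction of step 9 on hadoop101 (then repeated on the other nodes) is:
tar -zxvf /opt/software/jdk-8u211-linux-x64.tar.gz -C /opt/module/
vim /etc/profile              # add the two export lines shown above
source /etc/profile
java -version
# copy the JDK to the other nodes (they also need the profile entries)
scp -r /opt/module/jdk1.8.0_211 hadoop102:/opt/module/
scp -r /opt/module/jdk1.8.0_211 hadoop103:/opt/module/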
Chapter 2: ZooKeeper Installation and Deployment
For a complete and detailed introduction to ZooKeeper, see the separate article covering ZooKeeper concepts, deployment, and internals.
1. Extract the ZooKeeper package
[root@hadoop101 software]
2. Create the zkData directory
[root@hadoop101 software]
3. Set this node's myid
[root@hadoop101 software]
4. Edit the zoo.cfg configuration file
[root@hadoop101 software]
[root@hadoop101 conf]
total 12
-rw-r--r-- 1 502 games 535 May 4 2018 configuration.xsl
-rw-r--r-- 1 502 games 2712 Feb 7 2020 log4j.properties
-rw-r--r-- 1 502 games 922 Feb 7 2020 zoo_sample.cfg
[root@hadoop101 conf]
[root@hadoop101 conf]
[root@hadoop101 conf]
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/module/apache-zookeeper-3.5.7-bin/zkData
clientPort=2181
server.1=hadoop101:2888:3888
server.2=hadoop102:2888:3888
server.3=hadoop103:2888:3888
[root@hadoop101 conf]
5. Distribute the application directory
[root@hadoop101 module]
[root@hadoop101 module]
6. Change the myid on the other nodes
[root@hadoop102 ~]
[root@hadoop103 ~]
7. Verify the myid on every node
[root@hadoop101 module]
1
2
3
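Steps 1–7 lost most of their command lines; a consolidated sketch, using the paths configured in zoo.cfg, is:
tar -zxvf /opt/software/apache-zookeeper-3.5.7-bin.tar.gz -C /opt/module/
mkdir /opt/module/apache-zookeeper-3.5.7-bin/zkData
echo 1 > /opt/module/apache-zookeeper-3.5.7-bin/zkData/myid
cd /opt/module/apache-zookeeper-3.5.7-bin/conf
cp zoo_sample.cfg zoo.cfg                 # then edit it as shown above
scp -r /opt/module/apache-zookeeper-3.5.7-bin hadoop102:/opt/module/
scp -r /opt/module/apache-zookeeper-3.5.7-bin hadoop103:/opt/module/
ssh hadoop102 "echo 2 > /opt/module/apache-zookeeper-3.5.7-bin/zkData/myid"
ssh hadoop103 "echo 3 > /opt/module/apache-zookeeper-3.5.7-bin/zkData/myid"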
8. Start the service on every cluster node
[root@hadoop101 module]
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@hadoop102 module]
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@hadoop103 ~]
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
9. Verify that the service is running
[root@hadoop101 module]
5856 org.apache.zookeeper.server.quorum.QuorumPeerMain
5747 org.apache.zookeeper.server.quorum.QuorumPeerMain
5754 org.apache.zookeeper.server.quorum.QuorumPeerMain
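The start and verification commands are the standard ones (a sketch; run the start on every node):
/opt/module/apache-zookeeper-3.5.7-bin/bin/zkServer.sh start
jps -l        # QuorumPeerMain should be listed on each node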
Chapter 3: Hadoop Cluster Installation and Deployment
For a complete and detailed introduction to Hadoop, see the separate Hadoop introduction and deployment document.
1. Extract the Hadoop package
[root@hadoop101 software]
2. Configure the Hadoop environment variables
[root@hadoop101 software]
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
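The missing commands for steps 1–2 are presumably the usual extract-and-export pair (a sketch):
tar -zxvf /opt/software/hadoop-3.1.3.tar.gz -C /opt/module/
vim /etc/profile      # add the HADOOP_HOME lines shown above
source /etc/profile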
3. Configuration file locations
[root@hadoop101 software]
[root@hadoop101 hadoop]
total 176
-rw-r--r-- 1 1000 1000 8260 Sep 12 2019 capacity-scheduler.xml
-rw-r--r-- 1 1000 1000 1335 Sep 12 2019 configuration.xsl
-rw-r--r-- 1 1000 1000 1940 Sep 12 2019 container-executor.cfg
-rw-r--r-- 1 1000 1000 1353 Aug 26 16:29 core-site.xml
-rw-r--r-- 1 1000 1000 3999 Sep 12 2019 hadoop-env.cmd
-rw-r--r-- 1 1000 1000 15946 Aug 26 16:42 hadoop-env.sh
-rw-r--r-- 1 1000 1000 3323 Sep 12 2019 hadoop-metrics2.properties
-rw-r--r-- 1 1000 1000 11392 Sep 12 2019 hadoop-policy.xml
-rw-r--r-- 1 1000 1000 3414 Sep 12 2019 hadoop-user-functions.sh.example
-rw-r--r-- 1 1000 1000 2956 Aug 26 16:28 hdfs-site.xml
-rw-r--r-- 1 1000 1000 1484 Sep 12 2019 httpfs-env.sh
-rw-r--r-- 1 1000 1000 1657 Sep 12 2019 httpfs-log4j.properties
-rw-r--r-- 1 1000 1000 21 Sep 12 2019 httpfs-signature.secret
-rw-r--r-- 1 1000 1000 620 Sep 12 2019 httpfs-site.xml
-rw-r--r-- 1 1000 1000 3518 Sep 12 2019 kms-acls.xml
-rw-r--r-- 1 1000 1000 1351 Sep 12 2019 kms-env.sh
-rw-r--r-- 1 1000 1000 1747 Sep 12 2019 kms-log4j.properties
-rw-r--r-- 1 1000 1000 682 Sep 12 2019 kms-site.xml
-rw-r--r-- 1 1000 1000 13326 Sep 12 2019 log4j.properties
-rw-r--r-- 1 1000 1000 951 Sep 12 2019 mapred-env.cmd
-rw-r--r-- 1 1000 1000 1764 Sep 12 2019 mapred-env.sh
-rw-r--r-- 1 1000 1000 4113 Sep 12 2019 mapred-queues.xml.template
-rw-r--r-- 1 1000 1000 758 Sep 12 2019 mapred-site.xml
drwxr-xr-x 2 1000 1000 4096 Sep 12 2019 shellprofile.d
-rw-r--r-- 1 1000 1000 2316 Sep 12 2019 ssl-client.xml.example
-rw-r--r-- 1 1000 1000 2697 Sep 12 2019 ssl-server.xml.example
-rw-r--r-- 1 1000 1000 2642 Sep 12 2019 user_ec_policies.xml.template
-rw-r--r-- 1 1000 1000 30 Aug 26 16:33 workers
-rw-r--r-- 1 1000 1000 2250 Sep 12 2019 yarn-env.cmd
-rw-r--r-- 1 1000 1000 6056 Sep 12 2019 yarn-env.sh
-rw-r--r-- 1 1000 1000 2591 Sep 12 2019 yarnservice-log4j.properties
-rw-r--r-- 1 1000 1000 2029 Aug 26 16:32 yarn-site.xml
4. Edit hdfs-site.xml
[root@hadoop101 hadoop]# vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2,nn3</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>hadoop101:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>hadoop102:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn3</name>
<value>hadoop103:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>hadoop101:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>hadoop102:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn3</name>
<value>hadoop103:9870</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop101:8485;hadoop102:8485;hadoop103:8485/mycluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
5. Edit core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/module/hadoop-3.1.3/JN/data</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.3/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop101:2181,hadoop102:2181,hadoop103:2181</value>
</property>
</configuration>
6. Edit yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop101</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop103</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop101:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop103:8088</value>
</property>
<property>
<name>hadoop.zk.address</name>
<value>hadoop101:2181,hadoop102:2181,hadoop103:2181</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
</configuration>
7. Edit workers to list the cluster nodes
[root@hadoop101 hadoop]
hadoop101
hadoop102
hadoop103
8. Edit hadoop-env.sh to set JAVA_HOME
[root@hadoop101 hadoop]
export JAVA_HOME=/opt/module/jdk1.8.0_211
9. Startup script locations
[root@hadoop101 hadoop]
[root@hadoop101 sbin]
total 112
-rwxr-xr-x 1 1000 1000 2756 Sep 12 2019 distribute-exclude.sh
drwxr-xr-x 4 1000 1000 4096 Sep 12 2019 FederationStateStore
-rwxr-xr-x 1 1000 1000 1983 Sep 12 2019 hadoop-daemon.sh
-rwxr-xr-x 1 1000 1000 2522 Sep 12 2019 hadoop-daemons.sh
-rwxr-xr-x 1 1000 1000 1542 Sep 12 2019 httpfs.sh
-rwxr-xr-x 1 1000 1000 1500 Sep 12 2019 kms.sh
-rwxr-xr-x 1 1000 1000 1841 Sep 12 2019 mr-jobhistory-daemon.sh
-rwxr-xr-x 1 1000 1000 2086 Sep 12 2019 refresh-namenodes.sh
-rwxr-xr-x 1 1000 1000 1779 Sep 12 2019 start-all.cmd
-rwxr-xr-x 1 1000 1000 2221 Sep 12 2019 start-all.sh
-rwxr-xr-x 1 1000 1000 1880 Sep 12 2019 start-balancer.sh
-rwxr-xr-x 1 1000 1000 1401 Sep 12 2019 start-dfs.cmd
-rwxr-xr-x 1 1000 1000 5325 Aug 26 16:37 start-dfs.sh
-rwxr-xr-x 1 1000 1000 1793 Sep 12 2019 start-secure-dns.sh
-rwxr-xr-x 1 1000 1000 1571 Sep 12 2019 start-yarn.cmd
-rwxr-xr-x 1 1000 1000 3427 Aug 26 16:39 start-yarn.sh
-rwxr-xr-x 1 1000 1000 1770 Sep 12 2019 stop-all.cmd
-rwxr-xr-x 1 1000 1000 2166 Sep 12 2019 stop-all.sh
-rwxr-xr-x 1 1000 1000 1783 Sep 12 2019 stop-balancer.sh
-rwxr-xr-x 1 1000 1000 1455 Sep 12 2019 stop-dfs.cmd
-rwxr-xr-x 1 1000 1000 4053 Aug 26 16:38 stop-dfs.sh
-rwxr-xr-x 1 1000 1000 1756 Sep 12 2019 stop-secure-dns.sh
-rwxr-xr-x 1 1000 1000 1642 Sep 12 2019 stop-yarn.cmd
-rwxr-xr-x 1 1000 1000 3168 Aug 26 16:40 stop-yarn.sh
-rwxr-xr-x 1 1000 1000 1982 Sep 12 2019 workers.sh
-rwxr-xr-x 1 1000 1000 1814 Sep 12 2019 yarn-daemon.sh
-rwxr-xr-x 1 1000 1000 2328 Sep 12 2019 yarn-daemons.sh
10. Edit start-dfs.sh to add the user variables
[root@hadoop101 sbin]
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
11. Edit stop-dfs.sh to add the user variables
[root@hadoop101 sbin]
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
12. Edit start-yarn.sh to add the user variables
[root@hadoop101 sbin]
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
13. Edit stop-yarn.sh to add the user variables
[root@hadoop101 sbin]
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
14. Distribute the installation directory
[root@hadoop101 module]
[root@hadoop101 module]
15. Distribute the /etc/profile environment file
[root@hadoop101 module]
[root@hadoop101 module]
[root@hadoop101 ~]
[root@hadoop102 ~]
[root@hadoop103 ~]
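Steps 14–15 push the Hadoop directory and /etc/profile to the other nodes; a sketch of the commands involved:
scp -r /opt/module/hadoop-3.1.3 hadoop102:/opt/module/
scp -r /opt/module/hadoop-3.1.3 hadoop103:/opt/module/
scp /etc/profile hadoop102:/etc/profile
scp /etc/profile hadoop103:/etc/profile
# then reload the profile on every node
source /etc/profile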
16. Start the JournalNode service on every node
[root@hadoop101 module]
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
[root@hadoop102 module]
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
[root@hadoop103 ~]
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
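The deprecation warnings show the old daemon script was used; either form starts the JournalNode (run on every node):
hadoop-daemon.sh start journalnode     # the deprecated form captured above
hdfs --daemon start journalnode        # the replacement suggested by the warning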
17. Format the NameNode on nn1 (hadoop101)
[root@hadoop101 module]# hdfs namenode -format
2021-08-26 16:53:52,236 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop101/172.19.195.228
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.1.3
...
...
2021-08-26 16:53:54,483 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
2021-08-26 16:53:54,484 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop101/172.19.195.228
************************************************************/
18. Start the NameNode on nn1, then synchronize nn2 and nn3
[root@hadoop101 module]
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
[root@hadoop101 module]
5856 QuorumPeerMain
9681 NameNode
8379 JournalNode
9790 Jps
[root@hadoop102 module]
[root@hadoop103 module]
[root@hadoop102 module]
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
[root@hadoop103 ~]
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
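The commands behind step 18 are not visible above; the usual HA sequence, consistent with the deprecation warnings, is (a sketch):
# on hadoop101 (nn1)
hadoop-daemon.sh start namenode
# on hadoop102 and hadoop103: copy nn1's metadata, then start the local namenode
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode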
19. Stop all HDFS services
[root@hadoop101 module]
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Stopping namenodes on [hadoop101 hadoop102 hadoop103]
Last login: Thu Aug 26 15:30:42 CST 2021 from 172.19.195.228 on pts/1
Stopping datanodes
Last login: Thu Aug 26 17:19:24 CST 2021 on pts/0
Stopping journal nodes [hadoop101 hadoop102 hadoop103]
Last login: Thu Aug 26 17:19:25 CST 2021 on pts/0
Stopping ZK Failover Controllers on NN hosts [hadoop101 hadoop102 hadoop103]
Last login: Thu Aug 26 17:19:27 CST 2021 on pts/0
Stopping nodemanagers
Last login: Thu Aug 26 17:19:30 CST 2021 on pts/0
Stopping resourcemanagers on [ hadoop101 hadoop103]
Last login: Thu Aug 26 17:19:30 CST 2021 on pts/0
[root@hadoop101 module]
20. Initialize the HA state in ZooKeeper
[root@hadoop101 module]
2021-08-26 17:20:32,622 INFO tools.DFSZKFailoverController: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DFSZKFailoverController
STARTUP_MSG: host = hadoop101/172.19.195.228
STARTUP_MSG: args = [-formatZK]
STARTUP_MSG: version = 3.1.3
...
...
2021-08-26 17:20:33,263 INFO zookeeper.ClientCnxn: EventThread shut down for session: 0x100004943610000
2021-08-26 17:20:33,265 INFO tools.DFSZKFailoverController: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DFSZKFailoverController at hadoop101/172.19.195.228
************************************************************/
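The STARTUP_MSG above (args = [-formatZK]) corresponds to the standard ZKFC format command:
hdfs zkfc -formatZK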
21. Start all HDFS services
[root@hadoop101 module]
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [hadoop101 hadoop102 hadoop103]
Last login: Thu Aug 26 17:19:34 CST 2021 on pts/0
Starting datanodes
Last login: Thu Aug 26 17:21:17 CST 2021 on pts/0
Starting journal nodes [hadoop101 hadoop102 hadoop103]
Last login: Thu Aug 26 17:21:20 CST 2021 on pts/0
Starting ZK Failover Controllers on NN hosts [hadoop101 hadoop102 hadoop103]
Last login: Thu Aug 26 17:21:25 CST 2021 on pts/0
Starting resourcemanagers on [ hadoop101 hadoop103]
Last login: Thu Aug 26 17:21:29 CST 2021 on pts/0
Starting nodemanagers
Last login: Thu Aug 26 17:21:36 CST 2021 on pts/0
22. Verify via the web UIs
HDFS administration pages:
http://hadoop101:9870
http://hadoop102:9870
http://hadoop103:9870
YARN application page: port 8088 on the active ResourceManager (hadoop101 or hadoop103)
Chapter 4: MySQL Installation and Deployment
[Note] MySQL is shared middleware; deploying it on a single server is sufficient.
1. Remove mysql-libs
The library packages involved here typically include mariadb-libs.
[root@hadoop101 hadoop]
2. Install the common MySQL dependencies
[root@hadoop101 hadoop]
3. Download and install the matching dependency versions from the official site
[root@hadoop101 hadoop]
[root@hadoop101 software]
[root@hadoop101 software]
[root@hadoop101 software]
[root@hadoop101 software]
4. Extract the MySQL packages
[root@hadoop101 software]
[root@hadoop101 software]
Archive: mysql-libs.zip
creating: mysql-libs/
inflating: mysql-libs/MySQL-client-5.6.24-1.el6.x86_64.rpm
inflating: mysql-libs/mysql-connector-java-5.1.27.tar.gz
inflating: mysql-libs/MySQL-server-5.6.24-1.el6.x86_64.rpm
[root@hadoop101 software]
[root@hadoop101 software]
[root@hadoop101 mysql-libs]
MySQL-client-5.6.24-1.el6.x86_64.rpm mysql-connector-java-5.1.27.tar.gz MySQL-server-5.6.24-1.el6.x86_64.rpm
5. Install the MySQL server
[root@hadoop101 mysql-libs]
[root@hadoop101 mysql-libs]
[root@hadoop101 mysql-libs]
[root@hadoop101 mysql-libs]
Starting MySQL. SUCCESS!
[root@hadoop101 mysql-libs]
SUCCESS! MySQL running (7950)
[root@hadoop101 mysql-libs]
mysqld Ver 5.6.24 for Linux on x86_64 (MySQL Community Server (GPL))
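The rpm and service commands were not preserved; for the MySQL 5.6 community RPMs used here, a typical sequence is (a sketch; the generated root password is written to /root/.mysql_secret):
rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
cat /root/.mysql_secret      # note the random initial root password
service mysql start
service mysql status
mysqld --version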
6. Install the MySQL client
[root@hadoop101 mysql-libs]
[root@hadoop101 mysql-libs]
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.24
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> SET PASSWORD=PASSWORD('123456');
Query OK, 0 rows affected (0.00 sec)
mysql> exit;
Bye
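The client install and first login follow the same pattern (a sketch; log in with the password from /root/.mysql_secret, then change it with SET PASSWORD as shown above):
rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm
mysql -uroot -p'<initial password>'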
7. Adjust the MySQL user privileges and connection policy
[root@hadoop101 mysql-libs]
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.6.24 MySQL Community Server (GPL)
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
4 rows in set (0.00 sec)
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select User, Host, Password from user;
+------+-----------+-------------------------------------------+
| User | Host | Password |
+------+-----------+-------------------------------------------+
| root | localhost | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| root | hadoop101 | *E288B1DC67ADA34893E6D82AFAAFA408E6BB29D4 |
| root | 127.0.0.1 | *E288B1DC67ADA34893E6D82AFAAFA408E6BB29D4 |
| root | ::1 | *E288B1DC67ADA34893E6D82AFAAFA408E6BB29D4 |
+------+-----------+-------------------------------------------+
4 rows in set (0.00 sec)
mysql> update user set host='%' where host='localhost';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> delete from user where host!='%';
Query OK, 3 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> select User, Host, Password from user;
+------+------+-------------------------------------------+
| User | Host | Password |
+------+------+-------------------------------------------+
| root | % | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+------+------+-------------------------------------------+
1 row in set (0.00 sec)
mysql> exit;
Bye
Chapter 5: Hive Installation and Deployment
For a complete and detailed introduction to Hive, see the separate article covering Hive concepts, deployment, internals, and usage.
1. Install the Hive package
[root@hadoop101 mysql-libs]
[root@hadoop101 software]
2. Copy the mysql-connector driver into the Hive lib directory
[root@hadoop101 software]
[root@hadoop101 mysql-libs]
MySQL-client-5.6.24-1.el6.x86_64.rpm mysql-connector-java-5.1.27.tar.gz MySQL-server-5.6.24-1.el6.x86_64.rpm
[root@hadoop101 mysql-libs]
[root@hadoop101 mysql-libs]
[root@hadoop101 mysql-connector-java-5.1.27]
build.xml CHANGES COPYING docs mysql-connector-java-5.1.27-bin.jar README README.txt src
[root@hadoop101 mysql-connector-java-5.1.27]
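A hedged reconstruction of steps 1–2 (extract Hive, then copy the JDBC driver into its lib directory):
tar -zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/module/
cd /opt/software/mysql-libs
tar -zxvf mysql-connector-java-5.1.27.tar.gz
cp mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /opt/module/apache-hive-3.1.2-bin/lib/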
3. Define the Hive configuration file
# The main goal is to store the Hive metastore in MySQL
[root@hadoop101 mysql-connector-java-5.1.27]# cd /opt/module/apache-hive-3.1.2-bin/conf/
[root@hadoop101 conf]# vim hive-site.xml
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop101:3306/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop101:9083</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>hadoop101</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.server2.active.passive.ha.enable</name>
<value>true</value>
</property>
</configuration>
4. Configure the Hive environment variables
[root@hadoop101 conf]
export HIVE_HOME=/opt/module/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
[root@hadoop101 conf]
5. Replace the guava jar shipped with Hive
[root@hadoop101 conf]
[root@hadoop101 lib]
guava-19.0.jar jersey-guava-2.25.1.jar
[root@hadoop101 lib]
[root@hadoop101 lib]
guava-27.0-jre.jar listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
[root@hadoop101 lib]
[root@hadoop101 lib]
[root@hadoop101 lib]
-rw-r--r-- 1 root root 2308517 Sep 27 2018 guava-19.0.jar
-rw-r--r-- 1 root root 2747878 Aug 27 10:21 guava-27.0-jre.jar
-rw-r--r-- 1 root root 971309 May 21 2019 jersey-guava-2.25.1.jar
[root@hadoop101 lib]
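Step 5 replaces Hive's bundled guava-19.0.jar with the newer guava-27.0-jre.jar that ships with Hadoop; a sketch of the missing commands:
cd /opt/module/apache-hive-3.1.2-bin/lib
ls | grep guava
ls /opt/module/hadoop-3.1.3/share/hadoop/common/lib/ | grep guava
cp /opt/module/hadoop-3.1.3/share/hadoop/common/lib/guava-27.0-jre.jar .
rm -f guava-19.0.jar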
6. Start the metastore service and run it in the background
[root@hadoop101 apache-hive-3.1.2-bin]
[1] 10656
[root@hadoop101 apache-hive-3.1.2-bin]
[2] 10830
[Note] Hive 2.x and later require two services to be started: metastore and hiveserver2.
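The background job numbers above suggest nohup was used; the two services are typically launched like this (a sketch; the log redirection is an assumption):
nohup hive --service metastore > /tmp/metastore.log 2>&1 &
nohup hive --service hiveserver2 > /tmp/hiveserver2.log 2>&1 &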
7. Start Hive and verify the Hive CLI
[root@hadoop101 apache-hive-3.1.2-bin]
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/opt/module/jdk1.8.0_211/bin:/opt/module/jdk1.8.0_211/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/jdk1.8.0_211/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = a1dcf9b4-2d47-4a68-9505-b53267e45438
Logging initialized using configuration in jar:file:/opt/module/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = 5a498398-2233-402e-9720-cd066aa68add
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive (default)> show tables;
OK
tab_name
Time taken: 0.725 seconds
hive (default)> create table blizzard(id int,game string);
OK
Time taken: 2.648 seconds
hive (default)> insert into blizzard values(1,'wow');
hive (default)> insert into blizzard values(2,'ow');
hive (default)> select * from blizzard;
OK
blizzard.id blizzard.game
1 wow
2 ow
Time taken: 1.209 seconds, Fetched: 2 row(s)
hive (default)>
Chapter 6: Hue Visualization Tool Installation and Deployment
1. Prepare the installation package
[root@hadoop101 apache-hive-3.1.2-bin]
[root@hadoop101 software]
hue.tar
[Note]
- The hue.tar package here was produced by tarring up a Hue installation that had already been configured against a working environment, so it works out of the box if you follow this cluster setup exactly; otherwise you will need to build and package Hue yourself and adjust many configuration items.
- Hue is installed on the node named hadoop102, because the configuration file was written against hadoop102.
2. Extract and install the Hue package
[root@hadoop102 ~]
[root@hadoop102 software]
total 306676
-rw-r--r-- 1 root root 314030393 Aug 27 10:58 hue.tar
[root@hadoop102 software]
3. Install the dependencies and related components
[root@hadoop102 software]
4. Modify the Hadoop cluster configuration
[root@hadoop102 software]# cd /opt/module/hadoop-3.1.3/etc/hadoop/
[Note] All of the added properties go inside the <configuration></configuration> tags.
# New properties
[root@hadoop102 hadoop]# vim hdfs-site.xml
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
# New properties
[root@hadoop102 hadoop]# vim core-site.xml
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
# New properties
[root@hadoop102 hadoop]# vim httpfs-site.xml
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
5. Distribute the Hadoop configuration to hadoop101 and hadoop103
[root@hadoop102 hadoop]# scp hdfs-site.xml core-site.xml httpfs-site.xml hadoop101:/opt/module/hadoop-3.1.3/etc/hadoop/
[root@hadoop102 hadoop]# scp hdfs-site.xml core-site.xml httpfs-site.xml hadoop103:/opt/module/hadoop-3.1.3/etc/hadoop/
6. Hue configuration file
[root@hadoop102 hadoop]
[root@hadoop102 conf]
total 164
-rw-r--r-- 1 root root 2155 Feb 21 2020 log.conf
-rw-r--r-- 1 root root 77979 Feb 26 2020 pseudo-distributed.ini
-rw-r--r-- 1 root root 78005 Feb 21 2020 pseudo-distributed.ini.tmpl
If you built and packaged Hue from source yourself, you need to compare and adjust pseudo-distributed.ini item by item; there are many settings to change, for example:
Hue service address and port
# Webserver listens on this address and port
http_host=hadoop102
http_port=8000
HDFS cluster information
fs_defaultfs=hdfs://mycluster:8020
# NameNode logical name.
logical_name=mycluster
Database information
# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
port=3306
# Username to authenticate with when connecting to the database.
user=root
...
...
...
7. Change the ownership of the Hue application to the hue user
[root@hadoop102 conf]
[root@hadoop102 conf]
Changing password for user hue.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
[root@hadoop102 conf]
[root@hadoop102 conf]
[root@hadoop102 module]
total 16
drwxr-xr-x 8 root root 4096 Aug 26 16:12 apache-zookeeper-3.5.7-bin
drwxr-xr-x 11 root root 4096 Aug 26 17:15 hadoop-3.1.3
drwxr-xr-x 14 hue hue 4096 Feb 24 2020 hue-master
drwxr-xr-x 7 root root 4096 Aug 26 15:49 jdk1.8.0_211
[root@hadoop102 module]
8. Restart the HDFS cluster
[root@hadoop101 software]
[root@hadoop101 software]
9. Create the hue database in MySQL
[root@hadoop101 software]
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 66
Server version: 5.6.24 MySQL Community Server (GPL)
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create database hue;
Query OK, 1 row affected (0.00 sec)
10. Initialize the Hue database
[root@hadoop102 module]
[root@hadoop102 hue-master]
[root@hadoop102 hue-master]
11. Start the Hue service
[root@hadoop102 hue-master]
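Steps 10–11 initialize Hue's tables in the hue database and then start the web service; for a source-built Hue the usual commands, run from the hue-master directory, are (a sketch; the log redirection is an assumption):
cd /opt/module/hue-master
build/env/bin/hue syncdb
build/env/bin/hue migrate
nohup build/env/bin/supervisor > /tmp/hue.log 2>&1 &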
12. Verify via the web page
hadoop102, port 8000
The default username and password are both admin / admin.
Hive operations can be performed conveniently from the Hue UI.
Chapter 7: Kafka Installation and Deployment
For a complete and detailed introduction to Kafka, see https://blog.csdn.net/wt334502157/article/details/116518259
1. Install the Kafka package
[root@hadoop101 ~]
[root@hadoop101 software]
2. Create the Kafka logs directory
[root@hadoop101 software]
[root@hadoop101 kafka_2.11-2.4.0]
3. Edit the Kafka configuration file
[root@hadoop101 kafka_2.11-2.4.0]
[root@hadoop101 config]
broker.id=0
delete.topic.enable=true
log.dirs=/opt/module/kafka_2.11-2.4.0/logs
zookeeper.connect=hadoop101:2181,hadoop102:2181,hadoop103:2181/kafka_2.4
[root@hadoop101 config]
broker.id=0
delete.topic.enable=true
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/opt/module/kafka_2.11-2.4.0/logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=hadoop101:2181,hadoop102:2181,hadoop103:2181/kafka_2.4
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
4. Distribute the Kafka application directory
[root@hadoop101 config]
[root@hadoop101 module]
[root@hadoop101 module]
5. Change broker.id on the other nodes
[root@hadoop102 ~]
[root@hadoop102 config]
broker.id=1
[root@hadoop103 ~]
[root@hadoop103 config]
broker.id=2
6. Check the ZooKeeper status
[root@hadoop101 module]
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
[root@hadoop102 config]
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
[root@hadoop103 config]
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
7. Start the Kafka service on each node in turn
[root@hadoop101 module]
[root@hadoop102 config]
[root@hadoop103 config]
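The per-node start command is the standard daemon form (a sketch):
cd /opt/module/kafka_2.11-2.4.0
bin/kafka-server-start.sh -daemon config/server.properties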
8. Check Kafka's registration info in ZooKeeper
[root@hadoop101 module]
Connecting to localhost:2181
Welcome to ZooKeeper!
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, kafka_2.4, rmstore, yarn-leader-election, zookeeper]
[zk: localhost:2181(CONNECTED) 1] ls /kafka_2.4
[admin, brokers, cluster, config, consumers, controller, controller_epoch, isr_change_notification, latest_producer_id_block, log_dir_event_notification]
[zk: localhost:2181(CONNECTED) 2] quit
[root@hadoop101 module]
9. Basic usage verification
[root@hadoop101 module]
Created topic test0827.
[root@hadoop101 module]
Created topic wangt.
[root@hadoop101 module]
Created topic wow.
[root@hadoop101 module]
test0827
wangt
wow
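The topic commands themselves were lost; with the /kafka_2.4 chroot configured in server.properties, creating and listing topics looks roughly like this (the replication factor and partition count are assumptions):
cd /opt/module/kafka_2.11-2.4.0
bin/kafka-topics.sh --create --zookeeper hadoop101:2181,hadoop102:2181,hadoop103:2181/kafka_2.4 --replication-factor 1 --partitions 1 --topic test0827
bin/kafka-topics.sh --list --zookeeper hadoop101:2181,hadoop102:2181,hadoop103:2181/kafka_2.4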
Chapter 8: Spark Installation and Deployment
For a complete and detailed introduction to Spark, see https://blog.csdn.net/wt334502157/article/details/119205087
1. Install the Spark package
[root@hadoop101 module]
[root@hadoop101 software]
2. Check the compression codecs supported by Hadoop
[root@hadoop101 conf]
2021-08-27 15:09:02,783 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
2021-08-27 15:09:02,787 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2021-08-27 15:09:02,792 WARN zstd.ZStandardCompressor: Error loading zstandard native libraries: java.lang.InternalError: Cannot load libzstd.so.1 (libzstd.so.1: cannot open shared object file: No such file or directory)!
2021-08-27 15:09:02,802 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
Native library checking:
hadoop: true /opt/module/hadoop-3.1.3/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
zstd : false
snappy: true /lib64/libsnappy.so.1
lz4: true revision:10301
bzip2: true /lib64/libbz2.so.1
openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!
ISA-L: false libhadoop was built without ISA-L support
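The listing above is produced by Hadoop's native-library check:
hadoop checknative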
3. Edit the spark-env settings
[root@hadoop101 software]
[root@hadoop101 conf]
[root@hadoop101 conf]
YARN_CONF_DIR=/opt/module/hadoop-3.1.3/etc/hadoop
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
4. Copy the Hive configuration file into Spark
[root@hadoop101 conf]
[root@hadoop101 conf]
docker.properties.template fairscheduler.xml.template hive-site.xml log4j.properties.template metrics.properties.template slaves.template spark-defaults.conf.template spark-env.sh
5. Configure the environment variables
[root@hadoop101 conf]
export SPARK_HOME=/opt/module/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[root@hadoop101 conf]
[root@hadoop101 conf]
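A hedged reconstruction of the Spark setup commands behind steps 1 and 3–5:
tar -zxvf /opt/software/spark-2.4.5-bin-hadoop2.7.tgz -C /opt/module/
cd /opt/module/spark-2.4.5-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh          # then add the two lines shown above
cp /opt/module/apache-hive-3.1.2-bin/conf/hive-site.xml .
vim /etc/profile                               # add the SPARK_HOME lines shown above
source /etc/profile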
6. Modify the Hadoop configuration for Spark
[root@hadoop101 conf]# cd /opt/module/hadoop-3.1.3/etc/hadoop
# Add the following to yarn-site.xml
[root@hadoop101 hadoop]# vim yarn-site.xml
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/opt/module/hadoop-3.1.3/yarn-logs</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop102:19888/jobhistory/logs</value>
</property>
# Add the following to mapred-site.xml
[root@hadoop101 hadoop]# vim mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Run the MR framework on YARN</description>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop102:10020</value>
<description>Job history server address and port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop102:19888</value>
<description>Job history server web UI address and port</description>
</property>
[Note] The history-server settings all point to hadoop102, so starting and stopping the history services must be done on hadoop102.
7. Distribute the configuration files
[root@hadoop101 hadoop]
[root@hadoop101 hadoop]
8. Restart the Hadoop cluster from hadoop101
[root@hadoop101 hadoop]
[root@hadoop101 hadoop]
[root@hadoop101 hadoop]
9. Start the Hadoop job history service on hadoop102
[root@hadoop102 config]
WARNING: Use of this script to start the MR JobHistory daemon is deprecated.
WARNING: Attempting to execute replacement "mapred --daemon start" instead.
[root@hadoop102 config]
[root@hadoop102 config]
29311 JobHistoryServer
[root@hadoop102 config]
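The warning shows the deprecated history-server script was used; either form works on hadoop102:
mr-jobhistory-daemon.sh start historyserver    # the deprecated form captured above
mapred --daemon start historyserver            # the replacement form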
10. Configure the Spark history service
[root@hadoop101 hadoop]
[root@hadoop101 conf]
docker.properties.template fairscheduler.xml.template hive-site.xml log4j.properties.template metrics.properties.template slaves.template spark-defaults.conf.template spark-env.sh
[root@hadoop101 conf]
[root@hadoop101 conf]
spark.yarn.historyServer.address=hadoop102:18080
spark.yarn.historyServer.allowTracking=true
spark.eventLog.dir=hdfs://mycluster/spark_historylog
spark.eventLog.enabled=true
spark.history.fs.logDirectory=hdfs://mycluster/spark_historylog
spark.executor.extraLibraryPath=/opt/module/hadoop-3.1.3/lib/native/
The directory referenced by spark.history.fs.logDirectory=hdfs://mycluster/spark_historylog must be created on HDFS.
[root@hadoop101 conf]
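Renaming the defaults template and creating the HDFS log directory referenced above (a sketch):
cd /opt/module/spark-2.4.5-bin-hadoop2.7/conf
cp spark-defaults.conf.template spark-defaults.conf    # then add the properties shown above
hdfs dfs -mkdir /spark_historylog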
11. Distribute the Spark installation directory
[root@hadoop101 module]
[root@hadoop101 module]
12. Configure the environment variables on the remaining nodes
[root@hadoop102 config]
export SPARK_HOME=/opt/module/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[root@hadoop102 config]
[root@hadoop103 config]
export SPARK_HOME=/opt/module/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[root@hadoop103 config]
13. Start the Spark history service on hadoop102
[root@hadoop102 config]
starting org.apache.spark.deploy.history.HistoryServer, logging to /opt/module/spark-2.4.5-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-hadoop102.out
[root@hadoop102 config]
30192 HistoryServer
29311 JobHistoryServer
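The log line above comes from Spark's bundled history-server script:
/opt/module/spark-2.4.5-bin-hadoop2.7/sbin/start-history-server.sh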
14. Verify the Spark service
[root@hadoop102 config]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoop102:4040
Spark context available as 'sc' (master = local[*], app id = local-1630050046140).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.5
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_211)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("SELECT * FROM blizzard").show
2021-08-27 15:41:40,070 WARN conf.HiveConf: HiveConf of name hive.metastore.event.db.notification.api.auth does not exist
2021-08-27 15:41:40,071 WARN conf.HiveConf: HiveConf of name hive.server2.active.passive.ha.enable does not exist
+---+----+
| id|game|
+---+----+
| 1| wow|
| 2| ow|
+---+----+
Chapter 9: Configuration and Tuning
Set the ratio of physical cores to virtual cores: the current VMs have 2 CPU cores each, so expose them as 4 vcores for a 1:2 ratio.
yarn-site.xml
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>
Limit the maximum CPU request per container: when submitting a job (e.g. with spark-submit), the executor-cores setting must not exceed 4.
yarn-site.xml
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>4</value>
</property>
Set the per-container memory limit and the per-node memory: this caps the memory a single container may request and the total memory YARN may use on each node.
yarn-site.xml
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>7168</value>
</property>
Configure Capacity Scheduler queues: by default there is only the root.default queue; here two sub-queues, spark and hive, are created under it, with spark given 80% of the resources and hive 20%.
yarn-site.xml
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.queues</name>
<value>spark,hive</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.spark.capacity</name>
<value>80</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.hive.capacity</name>
<value>20</value>
</property>
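Once the queues are active, jobs choose a queue at submission time; a hedged example (the class and jar names are placeholders):
spark-submit --master yarn --queue spark --class com.example.MyApp /tmp/myapp.jar
hive --hiveconf mapreduce.job.queuename=hive -e "select count(*) from blizzard;"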
Set the queue for Hive
hive-site.xml
<property>
<name>mapred.job.queue.name</name>
<value>hive</value>
</property>
<property>
<name>mapreduce.job.queuename</name>
<value>hive</value>
</property>
<property>
<name>mapred.queue.names</name>
<value>hive</value>
</property>
Enable the HDFS trash (recycle bin)
core-site.xml
<property>
<name>fs.trash.interval</name>
<value>30</value>
</property>
Have HDFS clients access DataNodes by hostname
hdfs-site.xml
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
Configure HADOOP_MAPRED_HOME
mapred-site.xml
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
</property>