
[Big Data] Setup Log, Part 1

Preface

  1. Competition platform software and requirements [Phase 1]

(screenshots of the platform requirements omitted)

  2. My environment:
  • Cluster OS: CentOS 7, three hosts: Master, Slave1, Slave2
  • Local OS: Windows 11 Home [21H2]
  • Tools: Xshell 7 and VMware 16 (used to create the VMs)
  • Hadoop deployment mode: fully distributed
  3. Software used:
Tianyi Cloud drive download (link in the original post)
Baidu Netdisk download (link in the original post) [password: 6273]

  • apache-flume-1.7.0-bin.tar.gz [Flume 1.7.0]
  • apache-hive-2.3.4-bin.tar.gz [Hive 2.3.4]
  • flink-1.10.2-bin-scala_2.11.tgz [Flink 1.10.2]
  • hadoop-2.7.7.tar.gz [Hadoop 2.7.7]
  • jdk-8u291-linux-x64.tar.gz [JDK 1.8]
  • kafka_2.11-2.0.0.tgz [Kafka 2.0.0]
  • mysql-5.7.34-1.el7.x86_64.rpm-bundle.tar [MySQL 5.7.34 (RPM bundle)]
  • mysql-5.7.34-el7-x86_64.tar.gz [MySQL 5.7.34 (offline tarball), not used in this article]
  • mysql-connector-java-5.1.49.jar [JDBC driver for connecting Hive to MySQL, covered later]
  • redis-4.0.1.tar.gz [Redis 4.0.1]
  • scala-2.11.8.tgz [Scala 2.11.8]
  • spark-2.1.1-bin-hadoop2.7.tgz [Spark 2.1.1]
  4. My working conventions:
  • tar packages live in /usr/tar/ [create it yourself]
  • extracted software lives in /usr/apps/ [create it yourself]
  • this article simply turns the firewall off (see the note after this list)
# run on all three hosts
systemctl stop firewalld
  • the IPs used here are on a private LAN
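  • Note: stopping firewalld only lasts until the next reboot. A minimal sketch, assuming you also want it to stay off across restarts (run on all three hosts):
# stop the firewall now and keep it from starting at boot
systemctl stop firewalld
systemctl disable firewalld
# confirm it is inactive
systemctl status firewalld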

Environment preparation

  1. Rename the hosts
  • Master node - [any hostname works]
# Master node
hostnamectl set-hostname master
# reload the shell
bash
# result
[root@master ~]# 
  • Slave1 node - [any hostname works]
# Slave1 node
hostnamectl set-hostname slave1
# reload the shell
bash
# result
[root@slave1 ~]# 
  • Slave2 node - [any hostname works]
# Slave2 node
hostnamectl set-hostname slave2
# reload the shell
bash
# result
[root@slave2 ~]# 
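  • A quick check, assuming you want to confirm the new name took effect on each host:
# print the static hostname that was just set
hostnamectl status
# or simply
hostname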
  2. Install the required packages

Install on all three hosts

# vim: a friendlier editor with syntax highlighting
yum install -y vim
# ntp: time synchronization service
yum install -y ntp
# net-tools: network utilities needed by MySQL
yum install -y net-tools
# lrzsz: rz/sz commands for uploading/downloading files
yum install -y lrzsz

Install on the Master host only

# gcc: C compiler, needed later to build Redis from source
yum install -y gcc
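  • The ntp package is only installed above; a small sketch, assuming you also want the service running and enabled on all three hosts:
# start the NTP daemon and enable it at boot
systemctl start ntpd
systemctl enable ntpd
# check that it is synchronizing with its time sources
ntpq -p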
  3. Edit the IP mapping file and distribute it to the other two hosts
  • Edit
vim /etc/hosts
  • Example (replace the IPs with your own)
[root@master ~]# vim /etc/hosts
# append the following:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.38.144 master
192.168.38.145 slave1
192.168.38.146 slave2
  • Distribute
scp /etc/hosts slave1:/etc/
  • Note
My systems only use the root user. If you created other users, send the file to the account you actually work with (usually root) and prefix the target with the username, e.g. root@slave1:/etc/
  • Example
# distribute to Slave1
[root@master ~]# scp /etc/hosts slave1:/etc/

# ---------- separator ----------

# distribute to Slave2
[root@master ~]# scp /etc/hosts slave2:/etc/
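  • A quick sanity check, assuming the mappings should now resolve from every host:
# each name should answer from all three machines
ping -c 1 master
ping -c 1 slave1
ping -c 1 slave2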
  4. Configure passwordless SSH between the three hosts
  • Generate a key pair on each host [press Enter at every prompt]
# generate a key on the Master host
[root@master ~]# ssh-keygen -t rsa

# ---------- separator ----------

# generate a key on the Slave1 host
[root@slave1 ~]# ssh-keygen -t rsa

# ---------- separator ----------

# generate a key on the Slave2 host
[root@slave2 ~]# ssh-keygen -t rsa
  • Example [Master only shown; do this on all three]
[root@master ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
bf:cb:43:5b:6e:7f:17:66:d2:c0:b4:71:f7:0d:1a:22 root@master
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|        E . .o...|
|         . .oo+.+|
|            .+  o|
|        S     o  |
|         .. .. = |
|         ..+  + .|
|         .o.o   o|
|          ++ ....|
+-----------------+
  • Send each host's public key to the master host
  • Note
Whether you work on the Linux console or through Xshell, the password prompt does not echo, so you will not see what you type
# Master host
ssh-copy-id master

# Slave1 host
ssh-copy-id master

# Slave2 host
ssh-copy-id master
  • Example [Master only shown; do this on all three]
[root@master ~]# ssh-copy-id master
The authenticity of host 'master (192.168.38.141)' can't be established.
ECDSA key fingerprint is 37:7c:ab:d9:86:14:b2:fe:9c:17:3d:5d:3a:ff:ce:c1.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@master's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'master'"
and check to make sure that only the key(s) you wanted were added.
  • Send the merged keys from the Master host to the other two hosts
  • Note
Keys are stored under /root/.ssh/ by default; the exact paths are also shown in the ssh-keygen output above. The merged public keys end up in authorized_keys under /root/.ssh/
# send the public keys collected on Master to Slave1
[root@master ~]# scp /root/.ssh/authorized_keys slave1:/root/.ssh/

# ---------- separator ----------

# send the public keys collected on Master to Slave2
[root@master ~]# scp /root/.ssh/authorized_keys slave2:/root/.ssh/
  • Example
[root@master ~]# scp /root/.ssh/authorized_keys slave1:/root/.ssh/
root@slave1's password: 
authorized_keys                                                                                                                                            100% 1179     1.2KB/s   00:00    

[root@master ~]# scp /root/.ssh/authorized_keys slave2:/root/.ssh/
root@slave2's password: 
authorized_keys
  • Test passwordless login [no password prompt means success]

Type exit to log out of each test session

ssh master
ssh slave1
ssh slave2
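  • An alternative check, assuming the setup is correct, is to run a command remotely without opening an interactive shell:
# each of these should print the remote hostname without asking for a password
ssh master hostname
ssh slave1 hostname
ssh slave2 hostname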
  5. Upload the required software to the Master host
  • Explanation
Upload the packages listed above to /usr/tar/ on the Master host
Xshell users can cd into /usr/tar, select all the packages locally and drag them into the Xshell window
Because the Master host later distributes everything to the two slaves, uploading to Master alone is enough, which saves bandwidth and transfer time
  • After uploading
[root@master tar]# pwd
/usr/tar
[root@master tar]# ll
总用量 1732092
-rw-r--r--. 1 root root  55711670 10月 19 21:42 apache-flume-1.7.0-bin.tar.gz
-rw-r--r--. 1 root root 232234292 10月 19 21:42 apache-hive-2.3.4-bin.tar.gz
-rw-r--r--. 1 root root 289890742 11月 21 18:10 flink-1.10.2-bin-scala_2.11.tgz
-rw-r--r--. 1 root root 218720521 10月 19 21:42 hadoop-2.7.7.tar.gz
-rw-r--r--. 1 root root 144935989 10月 19 21:42 jdk-8u291-linux-x64.tar.gz
-rw-r--r--. 1 root root  55751827 10月 19 21:42 kafka_2.11-2.0.0.tgz
-rw-r--r--. 1 root root 543856640 10月 19 21:43 mysql-5.7.34-1.el7.x86_64.rpm-bundle.tar
-rw-r--r--. 1 root root   1006904 10月 19 21:44 mysql-connector-java-5.1.49.jar
-rw-r--r--. 1 root root   1711660 10月 19 21:44 redis-4.0.1.tar.gz
-rw-r--r--. 1 root root  28678231 11月  9 18:55 scala-2.11.8.tgz
-rw-r--r--. 1 root root 201142612 10月 19 21:41 spark-2.1.1-bin-hadoop2.7.tgz
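  • Optional: a quick integrity check before extracting, assuming you want to catch any upload corruption early:
# list each gzip archive without extracting; an error here means the upload is corrupt
cd /usr/tar
for f in *.tar.gz *.tgz; do tar -tzf "$f" > /dev/null && echo "$f OK"; done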

If you are using VMware, this is a good time to take a snapshot; you can then restore it at any point and practice the setup again

Start the setup

Create the extraction directory

  • Location for the extracted software
mkdir -p /usr/apps/
  • Explanation
Hive, MySQL and Flume are only needed on the Master host, so they are not extracted yet; this keeps the later distribution smaller and faster. They will be installed separately on the Master host later

Extract the packages into the target directory

[root@master tar]# tar -zxf jdk-8u291-linux-x64.tar.gz -C /usr/apps/
[root@master tar]# tar -zxf hadoop-2.7.7.tar.gz -C /usr/apps/
[root@master tar]# tar -zxf scala-2.11.8.tgz -C /usr/apps/
[root@master tar]# tar -zxf spark-2.1.1-bin-hadoop2.7.tgz -C /usr/apps/
[root@master tar]# tar -zxf flink-1.10.2-bin-scala_2.11.tgz -C /usr/apps/
[root@master tar]# tar -zxf kafka_2.11-2.0.0.tgz -C /usr/apps/

Configure the environment variables

  1. Edit the environment file
vim /etc/profile
  • Example
[root@master apps]# vim /etc/profile
  2. Append the following at the end of the file [in vim, uppercase "G" jumps to the bottom]
  • Example [end of file]
for i in /etc/profile.d/*.sh ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null
        fi
    fi
done

unset i
unset -f pathmunge

# JAVA_HOME
export JAVA_HOME=/usr/apps/jdk1.8.0_291
export PATH=$JAVA_HOME/bin:$PATH

# HADOOP_HOME
export HADOOP_HOME=/usr/apps/hadoop-2.7.7
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# SCALA_HOME
export SCALA_HOME=/usr/apps/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH

# SPARK_HOME
export SPARK_HOME=/usr/apps/spark-2.1.1-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH

# FLINK_HOME
export FLINK_HOME=/usr/apps/flink-1.10.2
export PATH=$FLINK_HOME/bin:$PATH

# KAFKA_HOME
export KAFKA_HOME=/usr/apps/kafka_2.11-2.0.0
export PATH=$KAFKA_HOME/bin:$PATH
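  • The new variables only take effect in new shells; a minimal check, assuming the paths above match your extracted directories:
# load the updated profile into the current shell
source /etc/profile
# each command should report the expected version
java -version
hadoop version
scala -version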

Configure the Hadoop distributed file system

  1. Go to the Hadoop configuration directory
cd /usr/apps/hadoop-2.7.7/etc/hadoop/
  • Example
[root@master apps]# cd /usr/apps/hadoop-2.7.7/etc/hadoop/
[root@master hadoop]# ll
总用量 152
-rw-r--r--. 1 1000 ftp  4436 7月  19 2018 capacity-scheduler.xml
-rw-r--r--. 1 1000 ftp  1335 7月  19 2018 configuration.xsl
-rw-r--r--. 1 1000 ftp   318 7月  19 2018 container-executor.cfg
-rw-r--r--. 1 1000 ftp   774 7月  19 2018 core-site.xml
-rw-r--r--. 1 1000 ftp  3670 7月  19 2018 hadoop-env.cmd
-rw-r--r--. 1 1000 ftp  4224 7月  19 2018 hadoop-env.sh
-rw-r--r--. 1 1000 ftp  2598 7月  19 2018 hadoop-metrics2.properties
-rw-r--r--. 1 1000 ftp  2490 7月  19 2018 hadoop-metrics.properties
-rw-r--r--. 1 1000 ftp  9683 7月  19 2018 hadoop-policy.xml
-rw-r--r--. 1 1000 ftp   775 7月  19 2018 hdfs-site.xml
-rw-r--r--. 1 1000 ftp  1449 7月  19 2018 httpfs-env.sh
-rw-r--r--. 1 1000 ftp  1657 7月  19 2018 httpfs-log4j.properties
-rw-r--r--. 1 1000 ftp    21 7月  19 2018 httpfs-signature.secret
-rw-r--r--. 1 1000 ftp   620 7月  19 2018 httpfs-site.xml
-rw-r--r--. 1 1000 ftp  3518 7月  19 2018 kms-acls.xml
-rw-r--r--. 1 1000 ftp  1527 7月  19 2018 kms-env.sh
-rw-r--r--. 1 1000 ftp  1631 7月  19 2018 kms-log4j.properties
-rw-r--r--. 1 1000 ftp  5540 7月  19 2018 kms-site.xml
-rw-r--r--. 1 1000 ftp 11801 7月  19 2018 log4j.properties
-rw-r--r--. 1 1000 ftp   951 7月  19 2018 mapred-env.cmd
-rw-r--r--. 1 1000 ftp  1383 7月  19 2018 mapred-env.sh
-rw-r--r--. 1 1000 ftp  4113 7月  19 2018 mapred-queues.xml.template
-rw-r--r--. 1 1000 ftp   758 7月  19 2018 mapred-site.xml.template
-rw-r--r--. 1 1000 ftp    10 7月  19 2018 slaves
-rw-r--r--. 1 1000 ftp  2316 7月  19 2018 ssl-client.xml.example
-rw-r--r--. 1 1000 ftp  2697 7月  19 2018 ssl-server.xml.example
-rw-r--r--. 1 1000 ftp  2250 7月  19 2018 yarn-env.cmd
-rw-r--r--. 1 1000 ftp  4567 7月  19 2018 yarn-env.sh
-rw-r--r--. 1 1000 ftp   690 7月  19 2018 yarn-site.xml
  2. Copy the mapred-site.xml.template template to mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
  • Example
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
  3. Edit hadoop-env.sh
vim hadoop-env.sh
  • JAVA_HOME on line 25 must be changed [:set nu shows line numbers in vim; this tip is not repeated below]
  • Example
 19 # The only required environment variable is JAVA_HOME.  All others are
 20 # optional.  When running a distributed configuration it is best to
 21 # set JAVA_HOME in this file, so that it is correctly defined on
 22 # remote nodes.
 23 
 24 # The java implementation to use.
 25 export JAVA_HOME=/usr/apps/jdk1.8.0_291
 26 
 27 # The jsvc implementation to use. Jsvc is required to run secure datanodes
 28 # that bind to privileged ports to provide authentication of data transfer
 29 # protocol.  Jsvc is not required if SASL is configured for authentication of
 30 # data transfer protocol using non-privileged ports.
 31 #export JSVC_HOME=${JSVC_HOME}
  4. Edit core-site.xml
vim core-site.xml
  • A property template you may want to keep handy [ for easy copy-paste ]
  • Almost every Hadoop configuration file uses this block [ it is not provided in the real competition ]
	<property>
		<name></name>
		<value></value>
	</property>
  • Example [ add at the end of the file, mind the tags ]
<configuration>

	<property>
		<!-- NameNode address for HDFS -->
		<name>fs.default.name</name>
		<value>hdfs://master:9000</value>
	</property>

	<property>
		<!-- directory for files Hadoop generates at runtime -->
		<name>hadoop.tmp.dir</name>
		<value>/usr/apps/data/hadoop</value>
	</property>
	
</configuration>
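  • Note: hadoop.tmp.dir points at /usr/apps/data/hadoop; a small sketch, assuming you prefer to create the directory up front rather than letting Hadoop create it on first use:
# create the runtime data directory referenced by hadoop.tmp.dir
mkdir -p /usr/apps/data/hadoop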
  5. Edit hdfs-site.xml
vim hdfs-site.xml
  • Example [ add at the end of the file, mind the tags ]; set the replication factor below to the number of machines you actually use
<configuration>

	<property>
	<!-- number of file replicas -->
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	
	<property>
		<!-- SecondaryNameNode host and port -->
		<!-- the secondary assists the NameNode running on the master node -->
		<name>dfs.namenode.secondary.http-address</name>
		<value>slave1:50090</value>
	</property>
	
</configuration>
  6. Edit mapred-site.xml
vim mapred-site.xml
  • Example [ add at the end of the file, mind the tags ]
<configuration>

	<property>
		<!-- MapReduce runtime framework; run on YARN here, the default is local -->
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	
</configuration>
  7. Edit yarn-site.xml
vim yarn-site.xml
  • Example [ add at the end of the file, mind the tags ]
<configuration>

	<!-- Site specific YARN configuration properties -->
	<property>
		<!-- the YARN ResourceManager runs on the master host -->
		<name>yarn.resourcemanager.hostname</name>
		<value>master</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	
</configuration>
  8. Edit slaves
vim slaves
  • Example [ the file should contain exactly these three hostnames ]
master
slave1
slave2



Configure Spark

  1. Go to the Spark configuration directory
cd /usr/apps/spark-2.1.1-bin-hadoop2.7/conf/
  • Example
[root@master hadoop]# cd /usr/apps/spark-2.1.1-bin-hadoop2.7/conf/
[root@master conf]# ll
总用量 32
-rw-r--r--. 1 500 500  987 4月  26 2017 docker.properties.template
-rw-r--r--. 1 500 500 1105 4月  26 2017 fairscheduler.xml.template
-rw-r--r--. 1 500 500 2025 4月  26 2017 log4j.properties.template
-rw-r--r--. 1 500 500 7313 4月  26 2017 metrics.properties.template
-rw-r--r--. 1 500 500  865 4月  26 2017 slaves.template
-rw-r--r--. 1 500 500 1292 4月  26 2017 spark-defaults.conf.template
-rwxr-xr-x. 1 500 500 3960 4月  26 2017 spark-env.sh.template
  2. Copy the spark-env.sh.template template to spark-env.sh
cp spark-env.sh.template spark-env.sh
  • Example
[root@master conf]# cp spark-env.sh.template spark-env.sh
  3. Edit spark-env.sh
vim spark-env.sh
  • Example [ add at the end of the file ]
# paths to the software installed above
export JAVA_HOME=/usr/apps/jdk1.8.0_291
export HADOOP_HOME=/usr/apps/hadoop-2.7.7
export HADOOP_CONF_DIR=/usr/apps/hadoop-2.7.7/etc/hadoop
export SCALA_HOME=/usr/apps/scala-2.11.8
# Spark master host; an IP address works the same way
export SPARK_MASTER_IP=master
# memory available to each Worker
export SPARK_WORKER_MEMORY=8G
# cores available to each Worker
export SPARK_WORKER_CORES=4
# Worker instances per machine; with 2, every slave runs two Worker processes
export SPARK_WORKER_INSTANCES=1
  4. Edit slaves
  • Attentive readers will notice a slaves.template file; there is no need to copy that template, just create and edit a new slaves file directly
vim slaves
  • Example
slave1
slave2



Configure Flink

  1. Go to the Flink configuration directory
cd /usr/apps/flink-1.10.2/conf
  • Example
[root@master hadoop]# cd /usr/apps/flink-1.10.2/conf
[root@master conf]# ll
总用量 60
-rw-r--r--. 1 root root 10202 8月  15 2020 flink-conf.yaml
-rw-r--r--. 1 root root  2138 8月  15 2020 log4j-cli.properties
-rw-r--r--. 1 root root  1884 8月  15 2020 log4j-console.properties
-rw-r--r--. 1 root root  1939 8月  15 2020 log4j.properties
-rw-r--r--. 1 root root  1709 8月  15 2020 log4j-yarn-session.properties
-rw-r--r--. 1 root root  2294 8月  15 2020 logback-console.xml
-rw-r--r--. 1 root root  2331 8月  15 2020 logback.xml
-rw-r--r--. 1 root root  1550 8月  15 2020 logback-yarn.xml
-rw-r--r--. 1 root root    15 8月  15 2020 masters
-rw-r--r--. 1 root root    10 8月  15 2020 slaves
-rw-r--r--. 1 root root  5424 8月  15 2020 sql-client-defaults.yaml
-rw-r--r--. 1 root root  1434 8月  15 2020 zoo.cfg
  2. Edit flink-conf.yaml
vim flink-conf.yaml
  • Example [ adjust the memory sizes and task slots to your requirements ]
# JobManager runs.
# RPC address
jobmanager.rpc.address: master

# The RPC port where the JobManager is reachable.
# RPC port
jobmanager.rpc.port: 6123


# The heap size for the JobManager JVM
# JobManager (resource scheduling) memory
jobmanager.heap.size: 2048m


# The total process memory size for the TaskManager.
#
# Note this accounts for all memory usage within the TaskManager process, including JVM metaspace and other overhead.
# TaskManager (task execution) memory; make this as large as you can afford
taskmanager.memory.process.size: 4096m

# To exclude JVM metaspace and overhead, please, use total Flink memory size instead of 'taskmanager.memory.process.size'.
# It is not recommended to set both 'taskmanager.memory.process.size' and Flink memory.
#
# taskmanager.memory.flink.size: 1280m

# The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline.
# task slots ("parallel pipelines") per TaskManager
taskmanager.numberOfTaskSlots: 5

# The parallelism used for programs that did not specify and other parallelism.
# default parallelism; precedence: code > Web UI > this default
parallelism.default: 3

# The default file system scheme and authority.
  3. Edit masters
vim masters
  • Example
master:8081
  4. Edit slaves
vim slaves
  • Example
slave1
slave2

Configure Kafka

  1. Go to the Kafka configuration directory
cd /usr/apps/kafka_2.11-2.0.0/config/
  • Example
[root@master config]# pwd
/usr/apps/kafka_2.11-2.0.0/config
[root@master config]# ll
总用量 68
-rw-r--r--. 1 root root  906 7月  24 2018 connect-console-sink.properties
-rw-r--r--. 1 root root  909 7月  24 2018 connect-console-source.properties
-rw-r--r--. 1 root root 5321 7月  24 2018 connect-distributed.properties
-rw-r--r--. 1 root root  883 7月  24 2018 connect-file-sink.properties
-rw-r--r--. 1 root root  881 7月  24 2018 connect-file-source.properties
-rw-r--r--. 1 root root 1111 7月  24 2018 connect-log4j.properties
-rw-r--r--. 1 root root 2262 7月  24 2018 connect-standalone.properties
-rw-r--r--. 1 root root 1221 7月  24 2018 consumer.properties
-rw-r--r--. 1 root root 4727 7月  24 2018 log4j.properties
-rw-r--r--. 1 root root 1919 7月  24 2018 producer.properties
-rw-r--r--. 1 root root 6851 7月  24 2018 server.properties
-rw-r--r--. 1 root root 1032 7月  24 2018 tools-log4j.properties
-rw-r--r--. 1 root root 1169 7月  24 2018 trogdor.conf
-rw-r--r--. 1 root root 1023 7月  24 2018 zookeeper.properties
  2. Edit server.properties
vim server.properties
  • Example [ the first block lists only the changed lines; the full file follows ]
# line 21: unique id of this Kafka broker; must be different on Slave1 and Slave2 (changed later)
broker.id=1
# line 31: uncomment and set to this host ([ Master ]); on Slave1/Slave2 change it to that host's IP or hostname
listeners=PLAINTEXT://master:9092
# add host.name; on the other machines change it to that host's IP or hostname
# host.name must be this host's address (important); without it clients fail with:
# Producer connection to localhost:9092 unsuccessful
# added at line 32
host.name=master
# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
# line 36: uncomment; on the other machines change it to that host's IP or hostname
advertised.listeners=PLAINTEXT://master:9092
# line 60: where Kafka stores its topic data (not its log files)
log.dirs=/usr/apps/data/kafka-logs
# line 65: default number of partitions per topic on this broker; set equal to the number of machines
num.partitions=3
# lines 74-76
# replication factor for the __consumer_offsets topic
offsets.topic.replication.factor=3
# replication factor and minimum ISR for the transaction state log
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
# line 123: ZooKeeper connection string
zookeeper.connect=master:2181,slave1:2181,slave2:2181
  • Example [ the full file ]
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://master:9092
host.name=master
# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
advertised.listeners=PLAINTEXT://master:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/usr/apps/data/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=3

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=master:2181,slave1:2181,slave2:2181

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
  3. Edit zookeeper.properties
vim zookeeper.properties
  • Example
# directory holding ZooKeeper's data, including the "myid" file with the server's unique id
dataDir=/usr/apps/data/zk/zkdata
# directory for ZooKeeper transaction logs
dataLogDir=/usr/apps/data/zk/zklog
# the port at which the clients will connect
# ZooKeeper client port
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
# maximum client connections; left commented out
# maxClientCnxns=0
# heartbeat interval (ms)
tickTime=2000
# initLimit: initial connection time limit
# maximum number of heartbeats (ticks) a follower (F) may take to connect to the leader (L) at startup
initLimit=10
# syncLimit: request/response time limit
# maximum number of heartbeats (ticks) allowed between a follower's request and the leader's response
syncLimit=5
# address and ports of each server in the ensemble
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
  4. Create the zkdata and zklog directories
mkdir -p /usr/apps/data/zk/zkdata
mkdir -p /usr/apps/data/zk/zklog
  • Example
[root@master config]# mkdir -p /usr/apps/data/zk/zkdata
[root@master config]# mkdir -p /usr/apps/data/zk/zklog
  5. Write the server id into the myid file (writing to it creates the file automatically)
echo 1 > /usr/apps/data/zk/zkdata/myid
  • Example
# write "1" into the "myid" file; this value must be unique, so the other two hosts get different values
echo 1 > /usr/apps/data/zk/zkdata/myid

# check the contents of the "myid" file
[root@master config]# cat /usr/apps/data/zk/zkdata/myid
1



Distribution

Distribute the environment file

scp /etc/profile slave1:/etc/
scp /etc/profile slave2:/etc/
  • Example
[root@master ~]# scp /etc/profile slave1:/etc/
profile                                                                                                                                                      100% 2319     2.3KB/s   00:00    
[root@master ~]# scp /etc/profile slave2:/etc/
profile                                                                                                                                                      100% 2319     2.3KB/s   00:00    

Distribute the configured software

  • Distribution can take a while, so be patient. If you are asked for a password, redo the passwordless SSH setup!
scp -r /usr/apps/ slave1:/usr/
scp -r /usr/apps/ slave2:/usr/
  • Example [ the output is very long and is not shown in full ]
[root@master ~]# scp -r /usr/apps/ slave1:/usr/
# files being transferred...
# files being transferred...
# files being transferred...
[root@master ~]# scp -r /usr/apps/ slave2:/usr/
# files being transferred...
# files being transferred...
# files being transferred...
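  • A quick check, assuming the copy finished cleanly, that the slaves now see both the software and the new environment:
# list the distributed directories and check the Java version on each slave
ssh slave1 "ls /usr/apps && source /etc/profile && java -version"
ssh slave2 "ls /usr/apps && source /etc/profile && java -version"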



Adjust the configuration on the slaves

  1. Edit Kafka's server.properties
vim /usr/apps/kafka_2.11-2.0.0/config/server.properties
  • Example [ Slave1 ]
[root@slave1 ~]# vim /usr/apps/kafka_2.11-2.0.0/config/server.properties

# change the following lines
broker.id=2
listeners=PLAINTEXT://slave1:9092
host.name=slave1
advertised.listeners=PLAINTEXT://slave1:9092
  • Example [ Slave2 ]
[root@slave2 ~]# vim /usr/apps/kafka_2.11-2.0.0/config/server.properties

# change the following lines
broker.id=3
listeners=PLAINTEXT://slave2:9092
host.name=slave2
advertised.listeners=PLAINTEXT://slave2:9092
  2. Change the value in ZooKeeper's myid file
vim /usr/apps/data/zk/zkdata/myid
  • Example [ Slave1 ]
[root@slave1 ~]# vim /usr/apps/data/zk/zkdata/myid
# result
[root@slave1 ~]# cat /usr/apps/data/zk/zkdata/myid 
2
  • Example [ Slave2 ]
[root@slave2 ~]# vim /usr/apps/data/zk/zkdata/myid
# result
[root@slave2 ~]# cat /usr/apps/data/zk/zkdata/myid 
3
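  • An equivalent shortcut, assuming you would rather overwrite the file than edit it in vim (run the matching line on each slave):
# on slave1
echo 2 > /usr/apps/data/zk/zkdata/myid
# on slave2
echo 3 > /usr/apps/data/zk/zkdata/myid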



Start the services

  • Before doing anything else, make sure the firewall is stopped on all three hosts!
# check firewall status
systemctl status firewalld
# stop the firewall
systemctl stop firewalld
# start the firewall
systemctl start firewalld

Start the Hadoop cluster

  1. Start the HDFS distributed file system
  • Run on the Master host only
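  • A minimal sketch of the usual first start, assuming the configuration above (the NameNode is formatted only once, on Master):
# one-time formatting of the NameNode (Master only)
hdfs namenode -format
# start HDFS (NameNode, DataNodes, SecondaryNameNode)
start-dfs.sh
# start YARN (ResourceManager, NodeManagers)
start-yarn.sh
# jps should now list the Hadoop daemons on each host
jps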