
[Big Data] Setup Log, Part 1

Preface

  1. Competition platform: software and requirements [Stage 1]

  2. Author's environment:
  • Cluster OS: CentOS 7 [three hosts: Master, Slave1, Slave2]
  • Host OS: Windows 11 Home [21H2]
  • Tools: Xshell 7, VMware 16 [for creating the VMs]
  • Hadoop deployment mode: fully distributed
  3. Software used:
Tianyi Cloud Drive download: click here
Baidu Netdisk download: click here [extraction code 6273]

  • apache-flume-1.7.0-bin.tar.gz [Flume 1.7.0]
  • apache-hive-2.3.4-bin.tar.gz [Hive 2.3.4]
  • flink-1.10.2-bin-scala_2.11.tgz [Flink 1.10.2]
  • hadoop-2.7.7.tar.gz [Hadoop 2.7.7]
  • jdk-8u291-linux-x64.tar.gz [JDK 1.8]
  • kafka_2.11-2.0.0.tgz [Kafka 2.0.0]
  • mysql-5.7.34-1.el7.x86_64.rpm-bundle.tar [MySQL 5.7.34, RPM bundle]
  • mysql-5.7.34-el7-x86_64.tar.gz [MySQL 5.7.34, offline tarball; not used in this article]
  • mysql-connector-java-5.1.49.jar [JAR that lets Hive connect to MySQL; covered later]
  • redis-4.0.1.tar.gz [Redis 4.0.1]
  • scala-2.11.8.tgz [Scala 2.11.8]
  • spark-2.1.1-bin-hadoop2.7.tgz [Spark 2.1.1]
  4. Author's conventions:
  • Tarballs are kept in /usr/tar/ [create it yourself]
  • Extracted packages go to /usr/apps/ [create it yourself]
  • This article simply stops the firewall
# Run on all three hosts
# ('stop' lasts until the next reboot; 'systemctl disable firewalld' keeps it off at boot)
systemctl stop firewalld
  • The author's hosts use private LAN IP addresses

Preparing the environment

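  1. Rename the hosts
  • Master node [names are up to you, but the rest of this article uses these]
# On the Master node
hostnamectl set-hostname master
# Start a new shell so the prompt picks up the new name
bash
# Result
[root@master ~]# 
  • Slave1 node [names are up to you, but the rest of this article uses these]
# On the Slave1 node
hostnamectl set-hostname slave1
# Start a new shell so the prompt picks up the new name
bash
# Result
[root@slave1 ~]# 
  • Slave2 node [names are up to you, but the rest of this article uses these]
# On the Slave2 node
hostnamectl set-hostname slave2
# Start a new shell so the prompt picks up the new name
bash
# Result
[root@slave2 ~]# 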
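  2. Install the required packages

Install on all three hosts:

# vim editor (with syntax highlighting)
yum install -y vim
# NTP for time synchronization
yum install -y ntp
# Network tools (net-tools), needed by MySQL
yum install -y net-tools
# rz/sz commands for uploading and downloading files
yum install -y lrzsz

Install on Master only:

# C compiler, needed to build Redis
yum install -y gcc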
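  3. Edit the hosts file and distribute it to the other two hosts
  • Edit
vim /etc/hosts
  • Example (replace the IPs with your own)
[root@master ~]# vim /etc/hosts
# The file should look like this (the last three lines are the ones to add):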
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.38.144 master
192.168.38.145 slave1
192.168.38.146 slave2
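  • Distribute
scp /etc/hosts slave1:/etc/
  • Note
The author's systems have no users other than root. If you work as another user, put the login name in front of the host, as in user@slave1:/etc/ (writing to /etc/ normally requires root).
  • Example
# Send to Slave1
[root@master ~]# scp /etc/hosts slave1:/etc/

# ---------- divider ----------

# Send to Slave2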
[root@master ~]# scp /etc/hosts slave2:/etc/
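With more slaves, the per-host scp lines above turn into a loop. Below is a minimal sketch; the helper names `run` and `distribute` and the `DRY_RUN` switch are my own hypothetical names, not part of any tool. With `DRY_RUN=1` the commands are only printed, which is handy for checking the host list before copying anything.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: copy one file into the same directory on every given host.
# Usage: distribute FILE DEST_DIR HOST...
distribute() {
  local file="$1" dest="$2" host
  shift 2
  for host in "$@"; do
    run scp "$file" "${host}:${dest}" || return 1   # stop at the first failed copy
  done
}

# Example (real copy): distribute /etc/hosts /etc/ slave1 slave2
```

For example, `DRY_RUN=1 distribute /etc/hosts /etc/ slave1 slave2` prints the two scp commands; drop `DRY_RUN=1` to actually copy. Until passwordless SSH is set up, each scp still asks for that host's root password.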
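  4. Set up passwordless SSH between the three hosts
  • Generate a key pair on each host [press Enter at every prompt]
# Generate the key on Master
[root@master ~]# ssh-keygen -t rsa

# ---------- divider ----------

# Generate the key on Slave1
[root@slave1 ~]# ssh-keygen -t rsa

# ---------- divider ----------

# Generate the key on Slave2
[root@slave2 ~]# ssh-keygen -t rsa
  • Example [Master only; do the same on all three hosts]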
[root@master ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
bf:cb:43:5b:6e:7f:17:66:d2:c0:b4:71:f7:0d:1a:22 root@master
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|        E . .o...|
|         . .oo+.+|
|            .+  o|
|        S     o  |
|         .. .. = |
|         ..+  + .|
|         .o.o   o|
|          ++ ....|
+-----------------+
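  • Each of the three hosts sends its public key to master
  • Note
The password is not echoed, whether you type at the Linux console or in Xshell, so you will not see what you type.
# On Master
ssh-copy-id master

# On Slave1
ssh-copy-id master

# On Slave2
ssh-copy-id master
  • Example [Master only; do the same on all three hosts]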
[root@master ~]# ssh-copy-id master
The authenticity of host 'master (192.168.38.141)' can't be established.
ECDSA key fingerprint is 37:7c:ab:d9:86:14:b2:fe:9c:17:3d:5d:3a:ff:ce:c1.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@master's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'master'"
and check to make sure that only the key(s) you wanted were added.
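  • Master sends the collected public keys to the other two hosts
  • Note
Keys are stored in /root/.ssh/ by default; the ssh-keygen output above shows the exact paths. The collected public keys are in /root/.ssh/authorized_keys.
# Send the public keys collected on Master to Slave1
[root@master ~]# scp /root/.ssh/authorized_keys slave1:/root/.ssh/

# ---------- divider ----------

# Send the public keys collected on Master to Slave2
[root@master ~]# scp /root/.ssh/authorized_keys slave2:/root/.ssh/
  • Example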
[root@master ~]# scp /root/.ssh/authorized_keys slave1:/root/.ssh/
root@slave1's password: 
authorized_keys                               100% 1179     1.2KB/s   00:00

[root@master ~]# scp /root/.ssh/authorized_keys slave2:/root/.ssh/
root@slave2's password: 
authorized_keys
  • Test passwordless login [it works if no password is requested]

Type exit to leave each session.

ssh master
ssh slave1
ssh slave2
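Besides logging in by hand, you can check that master's authorized_keys really contains all three keys before copying it to the slaves. A rough sketch, assuming ssh-keygen's default key comment `user@hostname` (the `root@master` seen in the output above); `check_authorized_keys` is a hypothetical helper name.

```shell
# Check that an authorized_keys file holds a key from every given host.
# Assumes each key line ends with the default comment "user@hostname".
# Usage: check_authorized_keys FILE USER HOST...
check_authorized_keys() {
  local file="$1" user="$2" host missing=0
  shift 2
  for host in "$@"; do
    # The comment is the last whitespace-separated field on the line
    if ! grep -qE "[[:space:]]${user}@${host}\$" "$file"; then
      echo "missing key for ${user}@${host}"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_authorized_keys /root/.ssh/authorized_keys root master slave1 slave2
```

It prints nothing and returns 0 when all keys are present. For a live test that fails instead of prompting for a password, `ssh -o BatchMode=yes slave1 hostname` can be used.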
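  5. Upload the required software to Master
  • Explanation
Upload the packages listed above to /usr/tar/ on Master.
Xshell users can cd into /usr/tar, select all the packages and drag them into the Xshell window.
Master will later distribute everything to the two slaves, so upload the packages to Master only; this avoids unnecessary traffic and transfer time.
  • After uploading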
[root@master tar]# pwd
/usr/tar
[root@master tar]# ll
total 1732092
-rw-r--r--. 1 root root  55711670 Oct 19 21:42 apache-flume-1.7.0-bin.tar.gz
-rw-r--r--. 1 root root 232234292 Oct 19 21:42 apache-hive-2.3.4-bin.tar.gz
-rw-r--r--. 1 root root 289890742 Nov 21 18:10 flink-1.10.2-bin-scala_2.11.tgz
-rw-r--r--. 1 root root 218720521 Oct 19 21:42 hadoop-2.7.7.tar.gz
-rw-r--r--. 1 root root 144935989 Oct 19 21:42 jdk-8u291-linux-x64.tar.gz
-rw-r--r--. 1 root root  55751827 Oct 19 21:42 kafka_2.11-2.0.0.tgz
-rw-r--r--. 1 root root 543856640 Oct 19 21:43 mysql-5.7.34-1.el7.x86_64.rpm-bundle.tar
-rw-r--r--. 1 root root   1006904 Oct 19 21:44 mysql-connector-java-5.1.49.jar
-rw-r--r--. 1 root root   1711660 Oct 19 21:44 redis-4.0.1.tar.gz
-rw-r--r--. 1 root root  28678231 Nov  9 18:55 scala-2.11.8.tgz
-rw-r--r--. 1 root root 201142612 Oct 19 21:41 spark-2.1.1-bin-hadoop2.7.tgz
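Before extracting anything, it is worth confirming that every upload arrived and is non-empty. A small sketch; the file list is copied from the software list above (the unused MySQL offline tarball is left out), and `check_packages` and `REQUIRED_PACKAGES` are hypothetical names.

```shell
# Package names from this article's software list, one per line.
REQUIRED_PACKAGES="
apache-flume-1.7.0-bin.tar.gz
apache-hive-2.3.4-bin.tar.gz
flink-1.10.2-bin-scala_2.11.tgz
hadoop-2.7.7.tar.gz
jdk-8u291-linux-x64.tar.gz
kafka_2.11-2.0.0.tgz
mysql-5.7.34-1.el7.x86_64.rpm-bundle.tar
mysql-connector-java-5.1.49.jar
redis-4.0.1.tar.gz
scala-2.11.8.tgz
spark-2.1.1-bin-hadoop2.7.tgz
"

# Usage: check_packages DIR
# Prints each missing or empty file and returns 1 if there is any.
check_packages() {
  local dir="$1" pkg missing=0
  for pkg in $REQUIRED_PACKAGES; do      # unquoted on purpose: split on whitespace
    if [ ! -s "$dir/$pkg" ]; then        # -s: file exists and is non-empty
      echo "missing or empty: $pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_packages /usr/tar` on Master. It only checks presence and size, not checksums.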

VMware users may want to take a snapshot now; once the setup is finished, you can restore it at any time and practice again.

Building the cluster

Create the extraction directory

  • Target directory for the extracted packages
mkdir -p /usr/apps/
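  • Explanation
Hive, MySQL and Flume are only needed on Master, so they are not extracted yet; this keeps the later transfer to the slaves faster. They will be installed on Master separately later.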

Extract the packages into the target directory

[root@master tar]# tar -zxf jdk-8u291-linux-x64.tar.gz -C /usr/apps/
[root@master tar]# tar -zxf hadoop-2.7.7.tar.gz -C /usr/apps/
[root@master tar]# tar -zxf scala-2.11.8.tgz -C /usr/apps/
[root@master tar]# tar -zxf spark-2.1.1-bin-hadoop2.7.tgz -C /usr/apps/
[root@master tar]# tar -zxf flink-1.10.2-bin-scala_2.11.tgz -C /usr/apps/
[root@master tar]# tar -zxf kafka_2.11-2.0.0.tgz -C /usr/apps/
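The six tar commands above can also be written as a loop. A sketch assuming the /usr/tar and /usr/apps layout used in this article; `extract_all` is a hypothetical helper, and it stops at the first archive that fails to extract.

```shell
# Extract gzip-compressed tarballs from SRC_DIR into DEST_DIR,
# the same as running "tar -zxf ARCHIVE -C DEST_DIR" for each one.
# Usage: extract_all SRC_DIR DEST_DIR ARCHIVE...
extract_all() {
  local src="$1" dest="$2" archive
  shift 2
  mkdir -p "$dest"
  for archive in "$@"; do
    echo "extracting $archive"
    tar -zxf "$src/$archive" -C "$dest" || return 1
  done
}

# Example on Master (Hive, MySQL and Flume are deliberately left out):
# extract_all /usr/tar /usr/apps \
#   jdk-8u291-linux-x64.tar.gz hadoop-2.7.7.tar.gz scala-2.11.8.tgz \
#   spark-2.1.1-bin-hadoop2.7.tgz flink-1.10.2-bin-scala_2.11.tgz kafka_2.11-2.0.0.tgz
```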

Configure the environment variables

  1. Edit the environment file
vim /etc/profile
  • Example
[root@master apps]# vim /etc/profile
  2. Append the following at the end of the file [in vim, uppercase G jumps to the last line]
  • Example [end of the file]
for i in /etc/profile.d/*.sh ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null
        fi
    fi
done

unset i
unset -f pathmunge

# JAVA_HOME
export JAVA_HOME=/usr/apps/jdk1.8.0_291
export PATH=$JAVA_HOME/bin:$PATH

# HADOOP_HOME
export HADOOP_HOME=/usr/apps/hadoop-2.7.7
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# SCALA_HOME
export SCALA_HOME=/usr/apps/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH

# SPARK_HOME
export SPARK_HOME=/usr/apps/spark-2.1.1-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH

# FLINK_HOME
export FLINK_HOME=/usr/apps/flink-1.10.2
export PATH=$FLINK_HOME/bin:$PATH

# KAFKA_HOME
export KAFKA_HOME=/usr/apps/kafka_2.11-2.0.0
export PATH=$KAFKA_HOME/bin:$PATH
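The new variables only take effect in new login shells or after `source /etc/profile`. A sketch for checking the result; `check_homes` is a hypothetical helper, and it assumes each *_HOME directory has a bin subdirectory that should be on PATH, as the exports above arrange.

```shell
# Check that each named *_HOME variable points to an existing directory
# and that its bin subdirectory is on PATH.
# Usage: check_homes VAR_NAME...
check_homes() {
  local name dir bad=0
  for name in "$@"; do
    # Only accept plain variable names, since the name is passed to eval
    case $name in
      ''|*[!A-Za-z0-9_]*) echo "invalid variable name: $name"; bad=1; continue ;;
    esac
    eval "dir=\${$name-}"                 # dir = value of the variable called $name
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
      echo "$name is unset or not a directory"
      bad=1
      continue
    fi
    case ":$PATH:" in
      *":$dir/bin:"*) ;;                  # bin directory is on PATH
      *) echo "$name: $dir/bin is not on PATH"; bad=1 ;;
    esac
  done
  return "$bad"
}
```

After `source /etc/profile`, run `check_homes JAVA_HOME HADOOP_HOME SCALA_HOME SPARK_HOME FLINK_HOME KAFKA_HOME`; no output and exit status 0 mean every variable points to a real directory. `java -version` and `hadoop version` make a quick manual cross-check.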

Configure Hadoop (HDFS and YARN)

  1. Go to the Hadoop configuration directory
cd /usr/apps/hadoop-2.7.7/etc/hadoop/
  • Example
[root@master apps]# cd /usr/apps/hadoop-2.7.7/etc/hadoop/
[root@master hadoop]# ll
total 152
-rw-r--r--. 1 1000 ftp  4436 Jul 19  2018 capacity-scheduler.xml
-rw-r--r--. 1 1000 ftp  1335 Jul 19  2018 configuration.xsl
-rw-r--r--. 1 1000 ftp   318 Jul 19  2018 container-executor.cfg
-rw-r--r--. 1 1000 ftp   774 Jul 19  2018 core-site.xml
-rw-r--r--. 1 1000 ftp  3670 Jul 19  2018 hadoop-env.cmd
-rw-r--r--. 1 1000 ftp  4224 Jul 19  2018 hadoop-env.sh
-rw-r--r--. 1 1000 ftp  2598 Jul 19  2018 hadoop-metrics2.properties
-rw-r--r--. 1 1000 ftp  2490 Jul 19  2018 hadoop-metrics.properties
-rw-r--r--. 1 1000 ftp  9683 Jul 19  2018 hadoop-policy.xml
-rw-r--r--. 1 1000 ftp   775 Jul 19  2018 hdfs-site.xml
-rw-r--r--. 1 1000 ftp  1449 Jul 19  2018 httpfs-env.sh
-rw-r--r--. 1 1000 ftp  1657 Jul 19  2018 httpfs-log4j.properties
-rw-r--r--. 1 1000 ftp    21 Jul 19  2018 httpfs-signature.secret
-rw-r--r--. 1 1000 ftp   620 Jul 19  2018 httpfs-site.xml
-rw-r--r--. 1 1000 ftp  3518 Jul 19  2018 kms-acls.xml
-rw-r--r--. 1 1000 ftp  1527 Jul 19  2018 kms-env.sh
-rw-r--r--. 1 1000 ftp  1631 Jul 19  2018 kms-log4j.properties
-rw-r--r--. 1 1000 ftp  5540 Jul 19  2018 kms-site.xml
-rw-r--r--. 1 1000 ftp 11801 Jul 19  2018 log4j.properties
-rw-r--r--. 1 1000 ftp   951 Jul 19  2018 mapred-env.cmd
-rw-r--r--. 1 1000 ftp  1383 Jul 19  2018 mapred-env.sh
-rw-r--r--. 1 1000 ftp  4113 Jul 19  2018 mapred-queues.xml.template
-rw-r--r--. 1 1000 ftp   758 Jul 19  2018 mapred-site.xml.template
-rw-r--r--. 1 1000 ftp    10 Jul 19  2018 slaves
-rw-r--r--. 1 1000 ftp  2316 Jul 19  2018 ssl-client.xml.example
-rw-r--r--. 1 1000 ftp  2697 Jul 19  2018 ssl-server.xml.example
-rw-r--r--. 1 1000 ftp  2250 Jul 19  2018 yarn-env.cmd
-rw-r--r--. 1 1000 ftp  4567 Jul 19  2018 yarn-env.sh
-rw-r--r--. 1 1000 ftp   690 Jul 19  2018 yarn-site.xml
  2. Copy the mapred-site.xml.template template to mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
  • Example
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
  3. Edit hadoop-env.sh
vim hadoop-env.sh
  • Set JAVA_HOME on line 25 [in vim, :set nu shows line numbers; this tip is not repeated below]
  • Example
 19 # The only required environment variable is JAVA_HOME.  All others are
 20 # optional.  When running a distributed configuration it is best to
 21 # set JAVA_HOME in this file, so that it is correctly defined on
 22 # remote nodes.
 23 
 24 # The java implementation to use.
 25 export JAVA_HOME=/usr/apps/jdk1.8.0_291
 26 
 27 # The jsvc implementation to use. Jsvc is required to run secure datanodes
 28 # that bind to privileged ports to provide authentication of data transfer
 29 # protocol.  Jsvc is not required if SASL is configured for authentication of
 30 # data transfer protocol using non-privileged ports.
 31 #export JSVC_HOME=${JSVC_HOME}
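  4. Edit core-site.xml
vim core-site.xml
  • A property template you may need [for easy copy-paste]
  • Almost every Hadoop configuration file uses this block [it is not provided in the real competition]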
	<property>
		<name></name>
		<value></value>
	</property>
  • Example [add at the end of the file; keep all properties inside the single <configuration> element]
<configuration>

	<property>
		<!-- Address of the HDFS NameNode (fs.default.name is the deprecated name of fs.defaultFS) -->
		<name>fs.default.name</name>
		<value>hdfs://master:9000</value>
	</property>

	<property>
		<!-- Base directory for files Hadoop generates at runtime -->
		<name>hadoop.tmp.dir</name>
		<value>/usr/apps/data/hadoop</value>
	</property>
	
</configuration>
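Since the same <property> block appears in every *-site.xml, generating it can save some typing in practice. A sketch with hypothetical helper names `hadoop_property` and `hadoop_config`; values are inserted as-is without XML escaping, which is fine for plain hostnames and paths like the ones in this article.

```shell
# Print one Hadoop <property> element (no XML escaping is done).
# Usage: hadoop_property NAME VALUE
hadoop_property() {
  printf '\t<property>\n\t\t<name>%s</name>\n\t\t<value>%s</value>\n\t</property>\n' "$1" "$2"
}

# Print a complete <configuration> document from NAME VALUE pairs.
# A trailing unpaired argument is ignored.
# Usage: hadoop_config NAME1 VALUE1 [NAME2 VALUE2 ...]
hadoop_config() {
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<configuration>'
  while [ "$#" -ge 2 ]; do
    hadoop_property "$1" "$2"
    shift 2
  done
  echo '</configuration>'
}

# Example: the core-site.xml settings above
# hadoop_config fs.default.name hdfs://master:9000 \
#               hadoop.tmp.dir /usr/apps/data/hadoop > core-site.xml
```

Redirecting the output over core-site.xml replaces the whole file, including the license comment and stylesheet line that ship with Hadoop. Hadoop does not need those, but keep a copy of the original if you want them.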
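  5. Edit hdfs-site.xml
vim hdfs-site.xml
  • Example [add inside <configuration> at the end of the file]. Set the replication factor to match how many machines you actually use; it should not exceed the number of DataNodes.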
<configuration>

	<property>
		<!-- Number of replicas per block -->
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	
	<property>
		<!-- Host and port of the SecondaryNameNode -->
		<!-- The SecondaryNameNode assists the NameNode by periodically merging its checkpoints -->
		<name>dfs.namenode.secondary.http-address</name>
		<value>slave1:50090</value>
	</property>
	
</configuration>
  6. Edit mapred-site.xml
vim mapred-site.xml
  • Example [add inside <configuration> at the end of the file]
<configuration>

	<property>
		<!-- Run MapReduce on YARN (the default is local) -->
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	
</configuration>
  7. Edit yarn-site.xml
vim yarn-site.xml
  • Example [add inside <configuration> at the end of the file]
<configuration>

	<!-- Site specific YARN configuration properties -->
	<property>
		<!-- The YARN ResourceManager runs on master -->
		<name>yarn.resourcemanager.hostname</name>
		<value>master</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	
</configuration>
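A single unclosed tag in a *-site.xml is enough to stop Hadoop from parsing the file, so a quick check before distributing the configuration is cheap insurance. This sketch only counts opening and closing <property>, <name> and <value> tags; it is a rough heuristic, not a real XML parser, and `check_site_xml` is a hypothetical name.

```shell
# Rough sanity check: opening and closing <property>, <name> and <value>
# tags must occur equally often in each file. NOT a real XML parser.
# Usage: check_site_xml FILE...
check_site_xml() {
  local file tag opens closes bad=0
  for file in "$@"; do
    for tag in property name value; do
      # grep -o prints every match on its own line; wc -l counts them
      opens=$(grep -o "<${tag}>" "$file" | wc -l | tr -d ' ')
      closes=$(grep -o "</${tag}>" "$file" | wc -l | tr -d ' ')
      if [ "$opens" -ne "$closes" ]; then
        echo "$file: $opens <$tag> vs $closes </$tag>"
        bad=1
      fi
    done
  done
  return "$bad"
}
```

Run it as `check_site_xml core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml` in the configuration directory. If libxml2's `xmllint` is installed, `xmllint --noout yarn-site.xml` performs a real well-formedness check.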
  8. Edit slaves
vim slaves
  • Example [replace the default localhost line with these hosts]
master
slave1
slave2



Configure Spark

  1. Go to the Spark configuration directory
cd /usr/apps/spark-2.1.1-bin-hadoop2.7/conf/
  • Example
[root@master hadoop]# cd /usr/apps/spark-2.1.1-bin-hadoop2.7/conf/
[root@master conf]# ll
total 32
-rw-r--r--. 1 500 500  987 Apr 26  2017 docker.properties.template
-rw-r--r--. 1 500 500 1105 Apr 26  2017 fairscheduler.xml.template
-rw-r--r--. 1 500 500 2025 Apr 26  2017 log4j.properties.template
-rw-r--r--. 1 500 500 7313 Apr 26  2017 metrics.properties.template
-rw-r--r--. 1 500 500  865 Apr 26  2017 slaves.template
-rw-r--r--. 1 500 500 1292 Apr 26  2017 spark-defaults.conf.template
-rwxr-xr-x. 1 500 500 3960 Apr 26  2017 spark-env.sh.template
  2. Copy the spark-env.sh.template template to spark-env.sh
cp spark-env.sh.template spark-env.sh
  • Example
[root@master conf]# cp spark-env.sh.template spark-env.sh
  3. Edit spark-env.sh
vim spark-env.sh
  • Example [append at the end of the file]
# Paths of the other software (same as in the environment variables)
export JAVA_HOME=/usr/apps/jdk1.8.0_291
export HADOOP_HOME=/usr/apps/hadoop-2.7.7
export HADOOP_CONF_DIR=/usr/apps/hadoop-2.7.7/etc/hadoop
export SCALA_HOME=/usr/apps/scala-2.11.8
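# Host of the Spark master (an IP address also works)
export SPARK_MASTER_IP=master
# Total memory a Worker may give to applications on its machine
export SPARK_WORKER_MEMORY=8G
# Number of cores a Worker may use
export SPARK_WORKER_CORES=4
# Worker instances per machine; 2 would start two Worker processes on each slave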
export SPARK_WORKER_INSTANCES=1
  4. Edit slaves
  • Observant readers will notice a slaves.template file; there is no need to copy it, just create a new slaves file and edit it
vim slaves
  • Example
slave1
slave2
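The settings in spark-env.sh multiply out across the slaves file: each listed host runs SPARK_WORKER_INSTANCES Workers, and each Worker offers SPARK_WORKER_CORES cores and SPARK_WORKER_MEMORY of memory. A sketch of that arithmetic; `spark_capacity` is a hypothetical helper, and it assumes the three variables are set and the memory is given in gigabytes (e.g. 8G).

```shell
# Estimate what a Spark standalone cluster offers:
#   workers = hosts in slaves x SPARK_WORKER_INSTANCES
#   cores   = workers x SPARK_WORKER_CORES
#   memory  = workers x SPARK_WORKER_MEMORY (assumes a value in GB such as 8G)
# Usage: spark_capacity SPARK_ENV_FILE SLAVES_FILE
# (pass paths containing a slash, e.g. ./spark-env.sh, so "." finds the file)
spark_capacity() {
  local env_file="$1" slaves_file="$2" hosts instances cores mem workers
  # Count hosts: lines that are neither blank nor comments
  hosts=$(grep -cEv '^[[:space:]]*(#|$)' "$slaves_file")
  # Source spark-env.sh in subshells so the caller's environment is not modified
  instances=$(. "$env_file"; echo "${SPARK_WORKER_INSTANCES:-1}")
  cores=$(. "$env_file"; echo "$SPARK_WORKER_CORES")
  mem=$(. "$env_file"; echo "${SPARK_WORKER_MEMORY%[Gg]}")   # strip the G suffix
  workers=$((hosts * instances))
  echo "workers=$workers cores=$((workers * cores)) memory=$((workers * mem))G"
}
```

With the values above (two slaves, one instance, 4 cores, 8G) it prints `workers=2 cores=8 memory=16G`. The per-Worker values should not exceed what each VM actually has.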



Configure Flink

  1. Go to the Flink configuration directory
cd /usr/apps/flink-1.10.2/conf
  • Example
[root@master hadoop]# cd /usr/apps/flink-1.10.2/conf
[root@master conf]# ll
total 60
-rw-r--r--. 1 root root 10202 Aug 15  2020 flink-conf.yaml
-rw-r--r--. 1 root root  2138 Aug 15  2020 log4j-cli.properties
-rw-r--r--. 1 root root  1884 Aug 15  2020 log4j-console.properties
-rw-r--r--. 1 root root  1939 Aug 15  2020 log4j.properties
-rw-r--r--. 1 root root  1709 Aug 15  2020 log4j-yarn-session.properties
-rw-r--r--. 1 root root  2294 Aug 15  2020 logback-console.xml
-rw-r--r--. 1 root root  2331 Aug 15  2020 logback.xml
-rw-r--r--. 1 root root  1550 Aug 15  2020 logback-yarn.xml
-rw-r--r--. 1 root root    15 Aug 15  2020 masters
-rw-r--r--. 1 root root    10 Aug 15  2020 slaves
-rw-r--r--. 1 root root  5424 Aug 15  2020 sql-client-defaults.yaml
-rw-r--r--. 1 root root  1434 Aug 15  2020 zoo.cfg
  2. Edit flink-conf.yaml
vim flink-conf.yaml
  • Example [adjust the memory sizes and task slots to your requirements]
# The host on which the JobManager runs.
# RPC address
jobmanager.rpc.address: master

# The RPC port where the JobManager is reachable.
# RPC port
jobmanager.rpc.port: 6123


# The heap size for the JobManager JVM
# JobManager heap (used for resource scheduling)
jobmanager.heap.size: 2048m


# The total process memory size for the TaskManager.
#
# Note this accounts for all memory usage within the TaskManager process, including JVM metaspace and other overhead.
# Memory for running tasks; give it as much as your machines allow
taskmanager.memory.process.size: 4096m

# To exclude JVM metaspace and overhead, please, use total Flink memory size instead of 'taskmanager.memory.process.size'.
# It is not recommended to set both 'taskmanager.memory.process.size' and Flink memory.
#
# taskmanager.memory.flink.size: 1280m

# The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline.
# Task slots per TaskManager (its parallel capacity)
taskmanager.numberOfTaskSlots: 5

# The parallelism used for programs that did not specify and other parallelism.
# Default parallelism; precedence: set in code > set in the Web UI > this default
parallelism.default: 3

# The default file system scheme and authority.
  3. Edit masters
vim masters
  • Example
master:8081
  4. Edit slaves
vim slaves
  • Example
slave1
slave2
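Flink's capacity follows similar arithmetic: every host in slaves starts one TaskManager, so the cluster offers (number of slaves) x taskmanager.numberOfTaskSlots slots, and with default slot sharing a job needs as many free slots as its highest operator parallelism. A sketch; `flink_slots` is a hypothetical helper, and it assumes simple `key: value` lines in flink-conf.yaml.

```shell
# Report total task slots of a Flink standalone cluster and check that
# parallelism.default fits into them.
# Usage: flink_slots CONF_DIR   (reads CONF_DIR/flink-conf.yaml and CONF_DIR/slaves)
flink_slots() {
  local conf="$1" tms slots par total
  # One TaskManager per non-blank, non-comment line in slaves
  tms=$(grep -cEv '^[[:space:]]*(#|$)' "$conf/slaves")
  # Last uncommented "key: value" line wins
  slots=$(awk -F': *' '$1 == "taskmanager.numberOfTaskSlots" { v = $2 } END { print v }' "$conf/flink-conf.yaml")
  par=$(awk -F': *' '$1 == "parallelism.default" { v = $2 } END { print v }' "$conf/flink-conf.yaml")
  slots=${slots:-1}   # Flink's default: 1 slot per TaskManager
  par=${par:-1}       # Flink's default parallelism: 1
  total=$((tms * slots))
  echo "taskmanagers=$tms total_slots=$total default_parallelism=$par"
  [ "$par" -le "$total" ]   # non-zero status if the default parallelism does not fit
}
```

For the configuration above (two slaves, 5 slots each, parallelism.default 3) it prints `taskmanagers=2 total_slots=10 default_parallelism=3` and returns 0.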

Configure Kafka

  1. Go to the Kafka configuration directory
cd /usr/apps/kafka_2.11-2.0.0/config/
  • Example
[root@master config]# pwd
/usr/apps/kafka_2.11-2.0.0/config
[root@master config]# ll
total 68
-rw-r--r--. 1 root root  906 Jul 24  2018 connect-console-sink.properties
-rw-r--r--. 1 root root  909 Jul 24  2018 connect-console-source.properties
-rw-r--r--. 1 root root 5321 Jul 24  2018 connect-distributed.properties
-rw-r--r--. 1 root root  883 Jul 24  2018 connect-file-sink.properties
-rw-r--r--. 1 root root  881 Jul 24  2018 connect-file-source.properties
-rw-r--r--. 1 root root 1111 Jul 24  2018 connect-log4j.properties
-rw-r--r--. 1 root root 2262 Jul 24  2018 connect-standalone.properties
-rw-r--r--. 1 root root 1221 Jul 24  2018 consumer.properties
-rw-r--r--. 1 root root 4727 Jul 24  2018 log4j.properties
-rw-r--r--. 1 root root 1919 Jul 24  2018 producer.properties
-rw-r--r--. 1 root root 6851 Jul 24  2018 server.properties
-rw-r--r--. 1 root root 1032 Jul 24  2018 tools-log4j.properties
-rw-r--r--. 1 root root 1169 Jul 24  2018 trogdor.conf
-rw-r--r--. 1 root root 1023 Jul 24  2018 zookeeper.properties
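  2. Edit server.properties
vim server.properties
  • Example [the first block lists only the changed settings; the second shows the complete file]
# Line 21: unique ID of this broker; Slave1 and Slave2 must change it (covered later)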
broker.id=1
# Line 31: uncomment it and set this host [Master]; Slave1 and Slave2 must use their own IP or hostname
listeners=PLAINTEXT://master:9092
# Add host.name; on the other machines use their own IP or hostname
# host.name must be this machine's address (important); otherwise clients fail with:
# Producer connection to localhost:9092 unsuccessful
# Added at line 32
host.name=master
# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
# Line 36: uncomment it; on the other machines use their own IP or hostname
advertised.listeners=PLAINTEXT://master:9092
# Line 60: where Kafka stores its data (the message logs), not where its application log files go
log.dirs=/usr/apps/data/kafka-logs
# Line 65: default number of partitions for new topics; here set to the number of machines
num.partitions=3
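# Lines 74-76
# Replication factor of the internal __consumer_offsets topic
offsets.topic.replication.factor=3
# Replication factor and minimum in-sync replicas of the internal __transaction_state topic
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
# Line 123: ZooKeeper connection string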
zookeeper.connect=master:2181,slave1:2181,slave2:2181
  • Example [complete file]
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://master:9092
host.name=master
# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
advertised.listeners=PLAINTEXT://master:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/usr/apps/data/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=3

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=master:2181,slave1:2181,slave2:2181

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
  3. Edit zookeeper.properties
vim zookeeper.properties
  • Example
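# Directory that holds ZooKeeper's unique-ID file myid
dataDir=/usr/apps/data/zk/zkdata
# Directory for ZooKeeper's transaction logs
dataLogDir=/usr/apps/data/zk/zklog
# the port at which the clients will connect
# Port that clients connect to
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
# Per-IP connection limit; commented out here
# maxClientCnxns=0
# Heartbeat interval between client and server, in milliseconds
tickTime=2000
# Leader-follower initial connection limit:
# the maximum number of heartbeats (tickTime units) a follower (F) may take to make its initial connection to the leader (L).
initLimit=10
# Leader-follower sync limit:
# the maximum number of heartbeats (tickTime units) allowed between a request and its response between a follower (F) and the leader (L).
syncLimit=5
# Host, quorum port (2888) and leader-election port (3888) of each server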
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
  4. Create the zkdata and zklog directories
mkdir -p /usr/apps/data/zk/zkdata
mkdir -p /usr/apps/data/zk/zklog
  • Example
[root@master config]# mkdir -p /usr/apps/data/zk/zkdata
[root@master config]# mkdir -p /usr/apps/data/zk/zklog
  5. Write the ID into myid (the redirection creates the file automatically)
echo 1 > /usr/apps/data/zk/zkdata/myid
  • Example
# Write 1 into myid. The value must be unique; change it on the two slaves.
echo 1 > /usr/apps/data/zk/zkdata/myid

# Check the content of myid
[root@master config]# cat /usr/apps/data/zk/zkdata/myid
1
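Typing the myid value by hand on each node is easy to get wrong. A sketch that derives it from the server.N=host:2888:3888 lines instead, so myid always matches zookeeper.properties; `zk_myid` is a hypothetical helper, and it assumes the node's hostname is exactly the name used in those lines.

```shell
# Print the N of the "server.N=HOST:..." line for the given host.
# Returns 1 (and prints nothing) if the host is not listed.
# Usage: zk_myid PROPERTIES_FILE HOSTNAME
zk_myid() {
  local file="$1" host="$2" id
  id=$(sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$file" | head -n 1)
  [ -n "$id" ] || return 1
  echo "$id"
}

# Example on any node; the && prevents an unknown hostname from leaving an empty myid:
# id=$(zk_myid /usr/apps/kafka_2.11-2.0.0/config/zookeeper.properties "$(hostname)") &&
#   echo "$id" > /usr/apps/data/zk/zkdata/myid
```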



Distribution

Distribute the environment file

scp /etc/profile slave1:/etc/
scp /etc/profile slave2:/etc/
  • Example
[root@master ~]# scp /etc/profile slave1:/etc/
profile                                       100% 2319     2.3KB/s   00:00
[root@master ~]# scp /etc/profile slave2:/etc/
profile                                       100% 2319     2.3KB/s   00:00

Distribute the configured software

  • The transfer can take a while, so be patient. If you are asked for a password, set up passwordless SSH again!
scp -r /usr/apps/ slave1:/usr/
scp -r /usr/apps/ slave2:/usr/
  • Example [the output is long and not shown in full]
[root@master ~]# scp -r /usr/apps/ slave1:/usr/
# transfer output...
[root@master ~]# scp -r /usr/apps/ slave2:/usr/
# transfer output...



Modify the configuration on the slaves

  1. Edit Kafka's server.properties
vim /usr/apps/kafka_2.11-2.0.0/config/server.properties
  • Example [Slave1]
[root@slave1 ~]# vim /usr/apps/kafka_2.11-2.0.0/config/server.properties

# Change the following settings
broker.id=2
listeners=PLAINTEXT://slave1:9092
host.name=slave1
advertised.listeners=PLAINTEXT://slave1:9092
  • Example [Slave2]
[root@slave2 ~]# vim /usr/apps/kafka_2.11-2.0.0/config/server.properties

# Change the following settings
broker.id=3
listeners=PLAINTEXT://slave2:9092
host.name=slave2
advertised.listeners=PLAINTEXT://slave2:9092
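The same four edits on each slave can also be made non-interactively with sed. A sketch assuming GNU sed (as on CentOS 7), the port 9092 used throughout, and that the four keys are already present and uncommented, as they are in the file copied from Master; `kafka_set_host` is a hypothetical helper.

```shell
# Rewrite the host-specific settings of server.properties in place.
# Only existing, uncommented lines are changed; nothing is appended.
# Usage: kafka_set_host FILE BROKER_ID HOSTNAME
kafka_set_host() {
  local file="$1" id="$2" host="$3"
  # "#" is used as the s-command delimiter where the value contains "/"
  sed -i \
    -e "s/^broker\.id=.*/broker.id=${id}/" \
    -e "s#^listeners=.*#listeners=PLAINTEXT://${host}:9092#" \
    -e "s/^host\.name=.*/host.name=${host}/" \
    -e "s#^advertised\.listeners=.*#advertised.listeners=PLAINTEXT://${host}:9092#" \
    "$file"
}

# Example on Slave2:
# kafka_set_host /usr/apps/kafka_2.11-2.0.0/config/server.properties 3 slave2
```

On Slave1 the call would be `kafka_set_host /usr/apps/kafka_2.11-2.0.0/config/server.properties 2 slave1`. Because the patterns are anchored at the start of the line, `^listeners=` does not touch `advertised.listeners=` or the commented examples.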
  2. Change the value in ZooKeeper's myid file
vim /usr/apps/data/zk/zkdata/myid
  • Example [Slave1]
[root@slave1 ~]# vim /usr/apps/data/zk/zkdata/myid
# Result
[root@slave1 ~]# cat /usr/apps/data/zk/zkdata/myid 
2
  • Example [Slave2]
[root@slave2 ~]# vim /usr/apps/data/zk/zkdata/myid
# Result
[root@slave2 ~]# cat /usr/apps/data/zk/zkdata/myid 
3



Starting the services

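  • Before starting anything, make sure the firewall is stopped on all three hosts!
# Check the firewall status
systemctl status firewalld
# Stop the firewall
systemctl stop firewalld
# Start the firewall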
systemctl start firewalld

Start the Hadoop cluster

  1. Start HDFS
  • Run on Master only
Added: 2021-12-07 12:05:45  Updated: 2021-12-07 12:09:18