This chapter uses Flume 1.6.0 as an example (Hadoop 2.6.0 is assumed to be installed):
1. Extract apache-flume-1.6.0-bin.tar.gz to /usr/local/src and rename the directory to flume-1.6.0 (or any name you prefer)
tar -zxvf apache-flume-1.6.0-bin.tar.gz -C /usr/local/src
# Enter /usr/local/src and rename the extracted directory (not the tarball)
cd /usr/local/src
mv apache-flume-1.6.0-bin flume-1.6.0
2. Edit flume-env.sh under /usr/local/src/flume-1.6.0/conf
export JAVA_HOME=/usr/local/src/jdk1.8.0_291   # use your own JDK path
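The distribution ships only a template for this file, so create it first if it does not exist (the JDK path is the example path from above; substitute your own):
cd /usr/local/src/flume-1.6.0/conf
cp flume-env.sh.template flume-env.sh
echo 'export JAVA_HOME=/usr/local/src/jdk1.8.0_291' >> flume-env.sh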
3. Configure environment variables: append the following to /root/.bash_profile
export FLUME_HOME=/usr/local/src/flume-1.6.0
export PATH=$FLUME_HOME/bin:$PATH
# Reload the file so the changes take effect
source /root/.bash_profile
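Optionally, a quick check that the variables are active:
echo $FLUME_HOME
which flume-ng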
Run flume-ng version; output similar to the following indicates a successful installation (the version reported should match the release you installed, i.e. 1.6.0):
Flume 1.7.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 511d868555dd4d16e6ce4fedc72c2d1454546707
Compiled by bessbd on Wed Oct 12 20:51:10 CEST 2016
From source with checksum 0d21b3ffdc55a07e1d08875872c00523
4. For Flume to write data to HDFS it must have the relevant Hadoop jars on its classpath. Copy the following jars into flume-1.6.0/lib (a one-shot copy command is sketched after the list):
# Hadoop is installed under /usr/local/src/ here
/usr/local/src/hadoop-2.6.0/share/hadoop/common/lib/commons-configuration-1.6.jar
/usr/local/src/hadoop-2.6.0/share/hadoop/common/lib/hadoop-auth-2.6.0.jar
/usr/local/src/hadoop-2.6.0/share/hadoop/common/lib/commons-io-2.4.jar
/usr/local/src/hadoop-2.6.0/share/hadoop/common/lib/htrace-core-3.0.4.jar
/usr/local/src/hadoop-2.6.0/share/hadoop/common/lib/stax-api-1.0-2.jar
/usr/local/src/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar
/usr/local/src/hadoop-2.6.0/share/hadoop/hdfs/hadoop-hdfs-2.6.0.jar
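For convenience, the copies above can be done in one command (paths assume the layout shown; adjust if your Hadoop lives elsewhere):
cd /usr/local/src/hadoop-2.6.0/share/hadoop
cp common/lib/commons-configuration-1.6.jar common/lib/hadoop-auth-2.6.0.jar \
   common/lib/commons-io-2.4.jar common/lib/htrace-core-3.0.4.jar \
   common/lib/stax-api-1.0-2.jar common/hadoop-common-2.6.0.jar \
   hdfs/hadoop-hdfs-2.6.0.jar /usr/local/src/flume-1.6.0/lib/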
5. Under /usr/local/src/flume-1.6.0, create a directory jobs, and inside it create the file flume-file-hdfs.conf
mkdir jobs
cd jobs/
touch flume-file-hdfs.conf
6. Put the following content in flume-file-hdfs.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/local/src/hadoop-2.6.0/logs/hadoop-root-datanode-master.log
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://master:9000/flume/%Y%m%d/%H
# Prefix for the files written to HDFS
a1.sinks.k1.hdfs.filePrefix = logs-
# Round the event timestamp down, so directories roll over by time
a1.sinks.k1.hdfs.round = true
# Number of time units per directory (one new directory per unit below)
a1.sinks.k1.hdfs.roundValue = 1
# Time unit used for rounding
a1.sinks.k1.hdfs.roundUnit = hour
# Use the local timestamp (the exec source does not add a timestamp header)
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
a1.sinks.k1.hdfs.batchSize = 1000
# File format; DataStream writes plain text without compression
a1.sinks.k1.hdfs.fileType = DataStream
# Roll to a new file every 60 seconds
a1.sinks.k1.hdfs.rollInterval = 60
# Roll once a file reaches this size (~128 MB, just under the default HDFS block size)
a1.sinks.k1.hdfs.rollSize = 134217700
# 0 disables rolling based on event count
a1.sinks.k1.hdfs.rollCount = 0
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
7. Start Flume from /usr/local/src/flume-1.6.0 (the Hadoop cluster must be running first)
bin/flume-ng agent -c conf/ -f jobs/flume-file-hdfs.conf -n a1
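While testing, it can help to log to the console instead; Flume supports this via a log4j override on the same command:
bin/flume-ng agent -c conf/ -f jobs/flume-file-hdfs.conf -n a1 -Dflume.root.logger=INFO,console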
8. Check the uploaded logs on HDFS, for example:
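A sketch of listing and inspecting the current day's bucket (the path matches the hdfs.path and filePrefix configured above; substitute the actual date and hour, and note that a file still being written carries a .tmp suffix):
hdfs dfs -ls /flume/$(date +%Y%m%d)
hdfs dfs -cat /flume/$(date +%Y%m%d)/*/logs-*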
That completes the setup; if there are any mistakes, corrections are welcome.