Project data layering: ODS, DW, DM
ODS
ODS stands for Operational Data Store, also called the data operation layer. It is the layer closest to the data sources: source data is extracted, cleansed, and transferred — the familiar ETL process — and then loaded into this layer. On the whole, data here is still organized the way the source business systems organize it.
Write a script for a full data import (MySQL -> Hive). A full import suits tables whose existing rows may later be updated.
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table user \
--delete-target-dir \
--target-dir /snbap/ods/ods_user \
--fields-terminated-by '\001'
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table cart \
--delete-target-dir \
--target-dir /snbap/ods/ods_cart \
--fields-terminated-by '\001'
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table code_brand \
--delete-target-dir \
--target-dir /snbap/ods/ods_code_brand \
--fields-terminated-by '\001'
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table code_category \
--delete-target-dir \
--target-dir /snbap/ods/ods_code_category \
--fields-terminated-by '\001'
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table code_city \
--delete-target-dir \
--target-dir /snbap/ods/ods_code_city \
--fields-terminated-by '\001'
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table code_education \
--delete-target-dir \
--target-dir /snbap/ods/ods_code_education \
--fields-terminated-by '\001'
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table code_email_suffix \
--delete-target-dir \
--target-dir /snbap/ods/ods_code_email_suffix \
--fields-terminated-by '\001'
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table code_goods \
-m 1 \
--delete-target-dir \
--target-dir /snbap/ods/ods_code_goods \
--fields-terminated-by '\001'
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table code_profession \
--delete-target-dir \
--target-dir /snbap/ods/ods_code_profession \
--fields-terminated-by '\001'
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table user_addr \
--delete-target-dir \
--target-dir /snbap/ods/ods_user_addr \
--fields-terminated-by '\001'
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table user_extend \
--delete-target-dir \
--target-dir /snbap/ods/ods_user_extend \
--fields-terminated-by '\001'
Most of these commands are identical except for the table name, so a loop simplifies the script:
list="cart code_brand code_category code_city code_education code_email_suffix code_goods code_profession user user_addr user_extend"
for tablename in $list;do
echo "Importing table ods_${tablename}"
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table ${tablename} \
--delete-target-dir \
--target-dir /snbap/ods/ods_${tablename} \
--fields-terminated-by '\001'
done
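The loop can be tightened further by factoring the shared connection settings into variables and a helper function. The sketch below only builds and prints each command (a dry run), so it can be reviewed before execution; `build_import_cmd` is a hypothetical helper name, and the connection values match the script above.

```shell
#!/bin/bash
# Shared connection settings, factored out so each table import
# reuses one command template.
conn="jdbc:mysql://192.168.64.188:3306/snbap_ods"

# Build (but do not run) the sqoop import command for one table.
build_import_cmd() {
  local tablename=$1
  echo "sqoop import --connect ${conn} --driver com.mysql.jdbc.Driver" \
       "--username root --password root --table ${tablename}" \
       "--delete-target-dir --target-dir /snbap/ods/ods_${tablename}" \
       "--fields-terminated-by '\001'"
}

# Dry run over a few of the tables; pipe to sh (or call eval) to execute.
for tablename in cart user user_addr; do
  build_import_cmd "${tablename}"
done
```

Printing the command before running it makes the per-table substitution easy to verify; swapping `echo` for `eval` (or piping the output to `sh`) turns the dry run into the real import.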
Write a script for an incremental import (MySQL -> Hive), suitable for append-only tables whose rows are never updated. If the script receives an argument, that argument becomes the table partition; otherwise the partition defaults to the day before the current date.
args=$1
dt=
if [ -z "${args}" ]
then
  dt=$(date -d '1 days ago' +%Y%m%d)
else
  dt=$1
fi
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table user_order \
--target-dir /snbap/ods_tmp/ods_user_order \
--fields-terminated-by '\001' \
--check-column order_id \
--incremental append \
--last-value 0
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table order_delivery \
--target-dir /snbap/ods_tmp/ods_order_delivery \
--fields-terminated-by '\001' \
--check-column order_id \
--incremental append \
--last-value 0
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table order_item \
--target-dir /snbap/ods_tmp/ods_order_item \
--fields-terminated-by '\001' \
--check-column order_id \
--incremental append \
--last-value 0
sqoop import \
--connect jdbc:mysql://192.168.64.188:3306/snbap_ods \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table biz_trade \
--target-dir /snbap/ods_tmp/ods_biz_trade \
--fields-terminated-by '\001' \
--check-column order_id \
--incremental append \
--last-value 0
echo "执行sqoop语句完成"
echo "加载数据到分区表"
hive -e "load data inpath '/snbap/ods_tmp/ods_user_order/*' into table snbap_ods.ods_user_order partition(dt='$dt')"
hive -e "load data inpath '/snbap/ods_tmp/ods_order_delivery/*' into table snbap_ods.ods_order_delivery partition(dt='$dt')"
hive -e "load data inpath '/snbap/ods_tmp/ods_order_item/*' into table snbap_ods.ods_order_item partition(dt='$dt')"
hive -e "load data inpath '/snbap/ods_tmp/ods_biz_trade/*' into table snbap_ods.ods_biz_trade partition(dt='$dt')"
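The partition-date logic at the top of the script is worth isolating into a function so the default is easy to verify in isolation. A minimal sketch (`resolve_dt` is a hypothetical helper name; `date -d` assumes GNU date, as used in the script above):

```shell
#!/bin/bash
# Return the partition date: the first argument if given,
# otherwise yesterday in YYYYMMDD form (GNU date).
resolve_dt() {
  if [ -z "$1" ]; then
    date -d '1 days ago' +%Y%m%d
  else
    echo "$1"
  fi
}

resolve_dt            # yesterday, e.g. 20210804
resolve_dt 20210801   # prints 20210801
```

In the script above this would collapse the `args`/`if` preamble to a single line, `dt=$(resolve_dt "$1")`.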
Write Flume configurations that monitor the target log directories (spooldir -> HDFS): one agent for PC click logs and one for App click logs.
# Agent for PC click logs
a1.sources = s1
a1.channels = c1
a1.sinks = k1
a1.sources.s1.type=spooldir
a1.sources.s1.spoolDir = /root/project/snbap/flume/data/pc
a1.sources.s1.fileHeader =false
a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=10000
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/snbap/ods/user_pc_click_log/dt=%Y%m%d
a1.sinks.k1.hdfs.filePrefix=pc-
a1.sinks.k1.hdfs.rollInterval=30
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100000
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.fileType=DataStream
a1.sources.s1.channels=c1
a1.sinks.k1.channel=c1
# Agent for App click logs (a separate configuration file; the agent name can stay a1)
a1.sources = s1
a1.channels = c1
a1.sinks = k1
a1.sources.s1.type=spooldir
a1.sources.s1.spoolDir = /root/project/snbap/flume/data/app
a1.sources.s1.fileHeader =false
a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=10000
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/snbap/ods/user_app_click_log/dt=%Y%m%d
a1.sinks.k1.hdfs.filePrefix=app-
a1.sinks.k1.hdfs.rollInterval=30
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollCount=100000
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.fileType=DataStream
a1.sources.s1.channels=c1
a1.sinks.k1.channel=c1
flume-ng agent --name a1 -c /opt/software/flume/flume190/conf/ -f /root/project/snbap/flume/conf/log1.conf -Dflume.root.logger=INFO,console
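A spooldir source expects files to be complete and immutable once they appear in the directory (Flume renames them to `*.COMPLETED` after ingestion), so new log files should be staged elsewhere and moved in atomically. A sketch of simulating a log arrival, using temp directories for illustration (substitute the real spoolDir, e.g. /root/project/snbap/flume/data/pc, in practice):

```shell
#!/bin/bash
# Stage the file outside the spool directory, then mv it in atomically
# so Flume never sees a half-written file.
spool_dir=$(mktemp -d)
stage_dir=$(mktemp -d)

printf 'uid=1001\tpage=home\n' > "${stage_dir}/pc_click.log"
mv "${stage_dir}/pc_click.log" "${spool_dir}/pc_click.log"

ls "${spool_dir}"   # pc_click.log
```

Writing directly into the spool directory risks Flume picking up a partially written file and failing the ingest.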
If a log table is partitioned, the partitions need to be added or repaired, e.g.:
alter table ods_user_pc_click_log add partition(dt='20210805');
msck repair table ods_user_app_click_log;
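When several days of partitions are missing, the ALTER statements can be generated in a loop and piped into `hive -e`. A sketch (`gen_add_partitions` is a hypothetical helper; the table name and day count are illustrative, and `date -d` assumes GNU date):

```shell
#!/bin/bash
# Emit "add partition" DDL for the last N days; idempotent thanks to
# "if not exists", so re-running it is safe.
gen_add_partitions() {
  local table=$1 days=$2
  for i in $(seq 1 "${days}"); do
    local dt=$(date -d "${i} days ago" +%Y%m%d)
    echo "alter table ${table} add if not exists partition(dt='${dt}');"
  done
}

# Print the DDL; to apply it: hive -e "$(gen_add_partitions ods_user_pc_click_log 3)"
gen_add_partitions ods_user_pc_click_log 3
```

`msck repair table` remains the simpler choice when the partition directories already exist on HDFS in the standard `dt=YYYYMMDD` layout.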
DW
The Data Warehouse layer (DW) is the main body of the warehouse. Here, data obtained from the ODS layer is organized into data models by subject; this layer is closely tied to dimensional modeling.
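A typical DW build step cleans one ODS table into a subject-oriented detail table at the same dt partition. The sketch below only generates the HiveQL as a reviewable string before handing it to hive; the `snbap_dwd` database, table, and column names are hypothetical, not part of this project's actual schema.

```shell
#!/bin/bash
# Build the DWD insert statement for one partition date.
dwd_sql() {
  local dt=$1
  cat <<EOF
insert overwrite table snbap_dwd.dwd_user_order partition (dt='${dt}')
select order_id, user_id, order_status, payment_amount
from snbap_ods.ods_user_order
where dt='${dt}' and order_id is not null;
EOF
}

# Print the SQL; to apply it: hive -e "$(dwd_sql 20210805)"
dwd_sql 20210805
```

Generating the SQL in a function keeps the partition date parameterized the same way the incremental import script above parameterizes dt.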
DWD
The Data Warehouse Detail layer: cleaned, subject-oriented detail records, typically at the same grain as the source.
DWS
The Data Warehouse Summary layer: light aggregations of DWD data by subject, serving the layers above.
DM
The data reporting layer: it provides the result data consumed by data products.
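A DM-layer statement typically aggregates DW data into a report-ready table. As with the DW sketch, this only generates the HiveQL for review; all `snbap_dm`/`snbap_dwd` names here are hypothetical.

```shell
#!/bin/bash
# Build a daily per-user order report for one partition date.
dm_sql() {
  local dt=$1
  cat <<EOF
insert overwrite table snbap_dm.dm_order_daily partition (dt='${dt}')
select user_id, count(order_id) as order_cnt, sum(payment_amount) as pay_amt
from snbap_dwd.dwd_user_order
where dt='${dt}'
group by user_id;
EOF
}

# Print the SQL; to apply it: hive -e "$(dm_sql 20210805)"
dm_sql 20210805
```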