[大数据] impala/hive脱敏导数方法，建表跟hdfs路径关联

开发: C++知识库 Java知识库 JavaScript Python PHP知识库人工智能区块链大数据移动开发嵌入式开发工具数据结构与算法开发测试游戏开发网络协议系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑笔记本显卡显示器固态硬盘硬盘耳机手机 iphone vivo oppo 小米华为单反装机图拉丁

-> 大数据 -> impala/hive脱敏导数方法，建表跟hdfs路径关联 -> 正文阅读

[大数据]impala/hive脱敏导数方法，建表跟hdfs路径关联

经常使用的脱敏规则
1.md5()函数加密，需要用hive，用hive容易报内存不足，set mapreduce.map.memory.mb=1024的倍数
2.保留第一位，后面用*代替

concat(strleft(coalesce(Cst_no	          ,''),1),regexp_replace(substr(coalesce(Cst_no	          ,''),2),'.','*'))

脱敏导数常用方法
1.导出格式化数据到文本文件，（有表头，字段间分割），可以直接浏览数据
①创建sql脚本配合脱敏规则，拿出需要的字段
②执行导数，调用sql脚本impala-shell -i data02.bigdata.com -d default -l --auth_creds_ok_in_clear -u hive --ldap_password_cmd="echo -n Ab123456" -f test.sql -B --output_delimiter="|" --print_header -o result.txt
注意：只有-B -o时会把表头去掉，这个我在存量客户回访的判断是否交易日执行的脚本中用到了，i=cat clkh.txt，clkh.txt中要么是0，要么是1，用的就是impala导数。
-f执行脚本文件 -q执行"sql脚本"
2.生产拿到脱敏数据，并在测试环境建表查看脱敏后的数据
①在生产环境建一张临时表，表结构和需要脱敏的表结构一样。
②创建sql脚本配合脱敏规则，拿出需要的字段，insert overwrite table 临时表 partition(biz_date)可动态分区，可静态分区
③执行sql脚本,则临时表里就是脱敏后的数据
④从生产环境把hdfs的临时表数据拿下来在测试环境建表查看数据
举例

-------------------------------------------------------------check oldtable create table
[data02.bigdata.com:21000] > show create table test1;
Query: show create table test1
+-----------------------------------------------------------------+
| result                                                          |
+-----------------------------------------------------------------+
| CREATE TABLE ljh.test1 (                                        |
|   id INT,                                                       |
|   name STRING                                                   |
| )                                                               |
| PARTITIONED BY (                                                |
|   biz_date INT                                                  |
| )                                                               |
| STORED AS TEXTFILE                                              |
| LOCATION 'hdfs://nameservice1/user/hive/warehouse/ljh.db/test1' |
| TBLPROPERTIES ('numRows'='-1')                                  |
+-----------------------------------------------------------------+
----------------------------------------------------check oldtable data
[data02.bigdata.com:21000] > select * from test1;
Query: select * from test1
Query submitted at: 2021-07-28 13:41:52 (Coordinator: http://data02.bigdata.com:25000)
Query progress can be monitored at: http://data02.bigdata.com:25000/query_plan?query_id=9740be28f4fee1e6:adf2ad1a00000000
+----+------+----------+
| id | name | biz_date |
+----+------+----------+
| 2  | lili | 20210708 |
| 1  | tom  | 20210707 |
+----+------+----------+
------------------------------------------------get oldtable data to local
hadoop fs -get hdfs://nameservice1/user/hive/warehouse/ljh.db/test1 ./test1
----------------------------------------------------check local result
[test_user@data02 test1]$ ll
total 0
drwxrwxr-x 2 test_user test_user 65 Jul 28 13:43 biz_date=20210707
drwxrwxr-x 2 test_user test_user 66 Jul 28 13:43 biz_date=20210708
drwxrwxr-x 2 test_user test_user  6 Jul 28 13:43 _impala_insert_staging
-----------------------------------------------------create newtable hdfs path
[test_user@data02 test1]$ hadoop fs -mkdir hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728
----------------------------------------------------put local data to newtable hdfs path (table dir/partitions dir)
[test_user@data02 test1]$ hadoop fs -put ./* hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728
---------------------------------------------------check newtable hdfs result
[test_user@data02 test1]$ hadoop fs -ls hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728
Found 3 items
drwxr-xr-x   - test_user test_user          0 2021-07-28 13:54 hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728/_impala_insert_staging
drwxr-xr-x   - test_user test_user          0 2021-07-28 13:54 hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728/biz_date=20210707
drwxr-xr-x   - test_user test_user          0 2021-07-28 13:54 hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728/biz_date=20210708
-----------------------------------------------------create newtable as oldtable(STORED must be same)
CREATE TABLE ljh.test0728(
id INT,
name STRING
)
PARTITIONED BY(
biz_date INT
)
STORED AS TEXTFILE
LOCATION'hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728'
TBLPROPERTIES('numRows'='-1')
-----------------------------------------------------relate to partitions
[data02.bigdata.com:21000] > alter table ljh.test0728 add partition(biz_date=20210707);
Query: alter table ljh.test0728 add partition(biz_date=20210707)
Fetched 0 row(s) in 0.05s
[data02.bigdata.com:21000] > alter table ljh.test0728 add partition(biz_date=20210708);
Query: alter table ljh.test0728 add partition(biz_date=20210708)
Fetched 0 row(s) in 0.03s
-------------------------------------------------------check newtable result
[data02.bigdata.com:21000] > select * from ljh.test0728;
Query: select * from ljh.test0728
Query submitted at: 2021-07-28 13:56:10 (Coordinator: http://data02.bigdata.com:25000)
Query progress can be monitored at: http://data02.bigdata.com:25000/query_plan?query_id=5b4ec77c3b04dad0:eb35d95d00000000
+----+------+----------+
| id | name | biz_date |
+----+------+----------+
| 1  | tom  | 20210707 |
| 2  | lili | 20210708 |
+----+------+----------+
Fetched 2 row(s) in 0.11s

-------------------------------------------------------succeed-----------------------------

大数据最新文章

实现Kafka至少消费一次

亚马逊云科技：还在苦于ETL？Zero ETL的时代

初探MapReduce

【SpringBoot框架篇】32.基于注解+redis实现

Elasticsearch：如何减少 Elasticsearch 集

Go redis操作

Redis面试题

专题五 Redis高并发场景

基于GBase8s和Calcite的多数据源查询

Redis——底层数据结构原理

加:2021-07-29 11:43:03 更:2021-07-29 11:45:15

360图书馆购物三丰科技阅读网日历万年历 2026年5日历

-2026/5/1 1:22:40-

图片自动播放器
↓图片自动播放器↓

TxT小说阅读器
↓语音阅读,小说下载,古典文学↓

一键清除垃圾
↓轻轻一点,清除系统垃圾↓

图片批量下载器
↓批量下载图片,美女图库↓

网站联系: qq:121756557 email:121756557@qq.com IT数码