经常使用的脱敏规则 1.md5()函数加密,需要用hive,用hive容易报内存不足,set mapreduce.map.memory.mb=1024的倍数 2.保留第一位,后面用*代替
concat(strleft(coalesce(Cst_no ,''),1),regexp_replace(substr(coalesce(Cst_no ,''),2),'.','*'))
脱敏导数常用方法 1.导出格式化数据到文本文件,(有表头,字段间分割),可以直接浏览数据 ①创建sql脚本配合脱敏规则,拿出需要的字段 ②执行导数,调用sql脚本impala-shell -i data02.bigdata.com -d default -l --auth_creds_ok_in_clear -u hive --ldap_password_cmd="echo -n Ab123456" -f test.sql -B --output_delimiter="|" --print_header -o result.txt 注意:只有-B -o时会把表头去掉,这个我在存量客户回访的判断是否交易日执行的脚本中用到了,i=cat clkh.txt ,clkh.txt中要么是0,要么是1,用的就是impala导数。 -f执行脚本文件 -q执行"sql脚本" 2.生产拿到脱敏数据,并在测试环境建表查看脱敏后的数据 ①在生产环境建一张临时表,表结构和需要脱敏的表结构一样。 ②创建sql脚本配合脱敏规则,拿出需要的字段,insert overwrite table 临时表 partition(biz_date)可动态分区,可静态分区 ③执行sql脚本,则临时表里就是脱敏后的数据 ④从生产环境把hdfs的临时表数据拿下来在测试环境建表查看数据 举例
-------------------------------------------------------------check oldtable create table
[data02.bigdata.com:21000] > show create table test1;
Query: show create table test1
+-----------------------------------------------------------------+
| result |
+-----------------------------------------------------------------+
| CREATE TABLE ljh.test1 ( |
| id INT, |
| name STRING |
| ) |
| PARTITIONED BY ( |
| biz_date INT |
| ) |
| STORED AS TEXTFILE |
| LOCATION 'hdfs://nameservice1/user/hive/warehouse/ljh.db/test1' |
| TBLPROPERTIES ('numRows'='-1') |
+-----------------------------------------------------------------+
----------------------------------------------------check oldtable data
[data02.bigdata.com:21000] > select * from test1;
Query: select * from test1
Query submitted at: 2021-07-28 13:41:52 (Coordinator: http://data02.bigdata.com:25000)
Query progress can be monitored at: http://data02.bigdata.com:25000/query_plan?query_id=9740be28f4fee1e6:adf2ad1a00000000
+----+------+----------+
| id | name | biz_date |
+----+------+----------+
| 2 | lili | 20210708 |
| 1 | tom | 20210707 |
+----+------+----------+
------------------------------------------------get oldtable data to local
hadoop fs -get hdfs://nameservice1/user/hive/warehouse/ljh.db/test1 ./test1
----------------------------------------------------check local result
[test_user@data02 test1]$ ll
total 0
drwxrwxr-x 2 test_user test_user 65 Jul 28 13:43 biz_date=20210707
drwxrwxr-x 2 test_user test_user 66 Jul 28 13:43 biz_date=20210708
drwxrwxr-x 2 test_user test_user 6 Jul 28 13:43 _impala_insert_staging
-----------------------------------------------------create newtable hdfs path
[test_user@data02 test1]$ hadoop fs -mkdir hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728
----------------------------------------------------put local data to newtable hdfs path (table dir/partitions dir)
[test_user@data02 test1]$ hadoop fs -put ./* hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728
---------------------------------------------------check newtable hdfs result
[test_user@data02 test1]$ hadoop fs -ls hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728
Found 3 items
drwxr-xr-x - test_user test_user 0 2021-07-28 13:54 hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728/_impala_insert_staging
drwxr-xr-x - test_user test_user 0 2021-07-28 13:54 hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728/biz_date=20210707
drwxr-xr-x - test_user test_user 0 2021-07-28 13:54 hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728/biz_date=20210708
-----------------------------------------------------create newtable as oldtable(STORED must be same)
CREATE TABLE ljh.test0728(
id INT,
name STRING
)
PARTITIONED BY(
biz_date INT
)
STORED AS TEXTFILE
LOCATION'hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728'
TBLPROPERTIES('numRows'='-1')
-----------------------------------------------------relate to partitions
[data02.bigdata.com:21000] > alter table ljh.test0728 add partition(biz_date=20210707);
Query: alter table ljh.test0728 add partition(biz_date=20210707)
Fetched 0 row(s) in 0.05s
[data02.bigdata.com:21000] > alter table ljh.test0728 add partition(biz_date=20210708);
Query: alter table ljh.test0728 add partition(biz_date=20210708)
Fetched 0 row(s) in 0.03s
-------------------------------------------------------check newtable result
[data02.bigdata.com:21000] > select * from ljh.test0728;
Query: select * from ljh.test0728
Query submitted at: 2021-07-28 13:56:10 (Coordinator: http://data02.bigdata.com:25000)
Query progress can be monitored at: http://data02.bigdata.com:25000/query_plan?query_id=5b4ec77c3b04dad0:eb35d95d00000000
+----+------+----------+
| id | name | biz_date |
+----+------+----------+
| 1 | tom | 20210707 |
| 2 | lili | 20210708 |
+----+------+----------+
Fetched 2 row(s) in 0.11s
-------------------------------------------------------succeed-----------------------------
|