[Big Data] 2021-08-03


Miscellaneous backup notes
concat:
concat returns NULL as soon as any one of its arguments is NULL.
select concat('a','b',null); -- NULL
select concat('a','b'); -- ab

concat_ws does not return NULL as long as at least one of the strings is not NULL. concat_ws requires a separator as its first argument.
select concat_ws('-','a','b',null); -- a-b
Alibaba Dataphin returns the results below instead, so use concat_ws with caution there:
select concat_ws(';',null,'hello'); -- NULL
select concat_ws(';',null,'hello','world'); -- NULL
select concat_ws(';','hello','world',null); -- NULL
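The three behaviours noted above can be modelled in a few lines of Python. This is only a sketch of the semantics, not how any engine implements them: all three function names are made up for illustration, and `concat_ws_null_propagating` merely mirrors the Dataphin results recorded above.

```python
def concat(*parts):
    """Model of concat: any NULL (None) argument makes the whole result NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def concat_ws(sep, *parts):
    """Model of concat_ws as described above: NULL arguments are skipped."""
    if sep is None:
        return None
    return sep.join(p for p in parts if p is not None)


def concat_ws_null_propagating(sep, *parts):
    """Hypothetical model of the Dataphin results above: one NULL gives NULL."""
    if sep is None or any(p is None for p in parts):
        return None
    return sep.join(parts)


print(concat("a", "b", None))                          # None
print(concat("a", "b"))                                # ab
print(concat_ws("-", "a", "b", None))                  # a-b
print(concat_ws_null_propagating(";", None, "hello"))  # None
```

When the platform propagates NULL, wrapping each argument in coalesce(col, '') before calling concat_ws is one way to get the skipping behaviour back.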

CREATE TABLE test_guoc(id int, alphabet string);
INSERT INTO test_guoc VALUES (1,'a'),(1,'b'),(1,'c'),(2,'D'),(2,'E'),(2,'F');
SELECT id,wm_concat('',alphabet) FROM test_guoc GROUP BY id ORDER BY id LIMIT 100; -- 1 abc, 2 DEF
INSERT INTO test_guoc VALUES (1,null),(2,'');
SELECT id,wm_concat('',alphabet) FROM test_guoc GROUP BY id ORDER BY id LIMIT 100; -- 1 abc, 2 DEF (the NULL and the empty string change nothing)
select * from test_guoc;

select lengthb('测试语句长度'); -- 18 (bytes)
select length('测试语句长度'); -- 6 (characters)
select length('hello'); -- 5
select lengthb('hello'); -- 5

case when:
select
  case job_level
    when '1' then '1111'
    when '2' then '1111'
    when '3' then '1111'
    else 'eee'
  end
from dbo.employee;

update employee
set e_wage =
  case
    when job_level = '1' then e_wage*1.97
    when job_level = '2' then e_wage*1.07
    when job_level = '3' then e_wage*1.06
    else e_wage*1.05
  end;
-- In a select list the case expression can be given a column alias: ... end as custom_column_name

When no branch of a case expression matches and there is no else, the result is NULL:
select case area_number when 130000 then area_number when 130100 then 2222222 end from kylin.KYLIN_AREA_HIERARCHY

In Hive, a "not equal" comparison written with either != or <> also filters out rows where the value is NULL. To keep those rows, use
where (white_level <> '3' or white_level is null)
or where (white_level != '3' or white_level is null).
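The NULL filtering described above is standard SQL three-valued logic, so it can be reproduced outside Hive. The sketch below uses Python's built-in sqlite3 module as a stand-in engine; the table name `t` and its rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, white_level TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "1"), (2, "3"), (3, None)],  # row 3 has a NULL white_level
)

# NULL <> '3' evaluates to NULL, which WHERE treats as "not true".
plain = [r[0] for r in conn.execute(
    "SELECT id FROM t WHERE white_level <> '3' ORDER BY id")]

# Adding an explicit IS NULL branch keeps the NULL row.
with_null = [r[0] for r in conn.execute(
    "SELECT id FROM t WHERE white_level <> '3' OR white_level IS NULL ORDER BY id")]

print(plain)      # [1]
print(with_null)  # [1, 3]
```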

collect_list / collect_set
select
  n2.order_sn as order_id,
  collect_set(
    concat_ws('_', n3.item_for_people_name, n3.item_dan_id)
  ) as product_for_prople_dan
from
  firmus_dataphin_prd_ods.s_feihe_sdc_core_order_product n2 group by n2.order_sn
Using the query above as a subquery under select distinct raises an error; concat_ws can be used to turn the array into a string first.
collect_set removes duplicates; collect_list keeps them.
select
  a1.member_id as member_id,
  concat_ws('--->',
    sort_array(
      collect_list(
        concat_ws(':', to_char(a1.buy_time,'yyyyMMdd'), a2.product_for_prople_dan)
      )
    )) as near_order_info
from <table> group by <table>.member_id

if: if(condition, result1, result2) works like the ternary operator in Java, except that the two result expressions may have different types.
select if(null,1,2); -- 2
select if(null < 3000,'hello','world'); -- world
select if(null or 1 = 1,22,33); -- 22
select if(a=a,'bbbb',111) from lxw_dual; -- bbbb
select if(1<2,100,200) from lxw_dual; -- 100

select datetrunc(getdate(), 'dd'); -- 2020-03-30 00:00:00
select getdate(); -- 2020-03-30 13:40:08
dataphin 'yyyymmdd hh-mi-ss.ff3'
select datediff(to_date(),to_date(),'mi')
to_char(getdate(),'yyyymmdd')
select to_char(getdate(),'yyyymmdd'); -- 20200330
select to_char(getdate(),'yyyymm'); -- 202004
select dateadd(getdate(),2,'mm'); -- 2020-05-30 13:43:57
datepart(pay_time, 'hh') as pay_time_hour
'20200301' <= to_char(buy_time,'yyyymmdd')
to_date('20011201','yyyymmdd')
select to_date('202009', 'yyyymm'); -- 2020-09-01 00:00:00
dateadd(t1.buy_time,-1,'mm') -- a negative value subtracts from the time

-- with 'mm' the day of the month does not change the result: all three return 12
select datediff(to_date('20200220','yyyymmdd'),to_date('20190210','yyyymmdd'),'mm'); -- 12
select datediff(to_date('20200220','yyyymmdd'),to_date('20190225','yyyymmdd'),'mm'); -- 12
select datediff(to_date('20200220','yyyymmdd'),to_date('20190220','yyyymmdd'),'mm'); -- 12

select instr('asdf','as'); -- 1
select instr('asdfas','as'); -- 1
select instr('aasdfas','as'); -- 2
select instr('asdf','aw'); -- 0
select instr(null,'as'); -- NULL
select instr('asdf',null); -- NULL
select instr(null,null); -- NULL

count(*) over(partition by column)
select count(*) over(partition by col1) from tmp_test;

not like '%A%' does not return rows where the column is NULL!
In SQL, apart from is null and is not null, any comparison that involves NULL evaluates to NULL (unknown), which a where clause treats the same as false.

Common Hive date format conversions
Fixed date to Unix timestamp
select unix_timestamp('2016-08-16','yyyy-MM-dd') --1471276800 -- note that this function returns seconds; a front end may work in milliseconds, in which case multiply by 1000
select unix_timestamp('20160816','yyyyMMdd') --1471276800
select unix_timestamp('2016-08-16T10:02:41Z', "yyyy-MM-dd'T'HH:mm:ss'Z'") --1471312961
Convert 16/Mar/2017:12:25:01 +0800 to the usual format (yyyy-MM-dd HH:mm:ss)
select from_unixtime(to_unix_timestamp('16/Mar/2017:12:25:01 +0800', 'dd/MMM/yyy:HH:mm:ss Z'))
Unix timestamp to formatted date
select from_unixtime(1471276800,'yyyy-MM-dd') --2016-08-16
select from_unixtime(1471276800,'yyyyMMdd') --20160816
select from_unixtime(1471312961) --2016-08-16 10:02:41
select from_unixtime( unix_timestamp('20160816','yyyyMMdd'),'yyyy-MM-dd') --2016-08-16
select date_format('2016-08-16','yyyyMMdd') --20160816
Date part of a datetime value
select to_date('2016-08-16 10:03:01') --2016-08-16
Current time
select from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss')
select from_unixtime(unix_timestamp(),'yyyy-MM-dd')
Year of a date
select year('2016-08-16 10:03:01') --2016
Month of a date
select month('2016-08-16 10:03:01') --8
Day of a date
select day('2016-08-16 10:03:01') --16
Hour of a date
select hour('2016-08-16 10:03:01') --10
Minute of a date
select minute('2016-08-16 10:03:01') --3
Second of a date
select second('2016-08-16 10:03:01') --1
Week of the year for a date
select weekofyear('2016-08-16 10:03:01') --33
Number of days from the start date to the end date
select datediff('2016-08-16','2016-08-11')
Date that is days days after startdate
select date_add('2016-08-16',10)
Date that is days days before startdate
select date_sub('2016-08-16',10)
Three ways to get the current date or time
SELECT CURRENT_DATE;
--2017-06-15
SELECT CURRENT_TIMESTAMP; -- includes hours, minutes and seconds
--2017-06-15 19:54:44
SELECT from_unixtime(unix_timestamp());
--2017-06-15 19:55:04
Current timestamp
Select current_timestamp --2018-06-18 10:37:53.278
First day of the month
select trunc('2016-08-16','MM') --2016-08-01
First day of the year
select trunc('2016-08-16','YEAR') --2016-01-01
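The timestamp values recorded above only make sense for one time zone: 1471276800 is midnight on 2016-08-16 at UTC+8. The sketch below re-derives them with Python's datetime module. The helper names copy the Hive function names but are plain Python stand-ins, and the fixed UTC+8 offset is an assumption about the session time zone in which the notes were taken.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Hive session ran in UTC+8 (Asia/Shanghai).
CST = timezone(timedelta(hours=8))


def unix_timestamp(text, fmt):
    """Parse text as local (UTC+8) time and return epoch seconds."""
    return int(datetime.strptime(text, fmt).replace(tzinfo=CST).timestamp())


def from_unixtime(seconds, fmt="%Y-%m-%d %H:%M:%S"):
    """Format epoch seconds as local (UTC+8) time."""
    return datetime.fromtimestamp(seconds, CST).strftime(fmt)


seconds = unix_timestamp("2016-08-16", "%Y-%m-%d")
millis = seconds * 1000  # what a millisecond-based front end would expect

print(seconds)                               # 1471276800
print(millis)                                # 1471276800000
print(from_unixtime(1471312961))             # 2016-08-16 10:02:41
print(from_unixtime(1471276800, "%Y%m%d"))   # 20160816
```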

alter table s_etms_feihe_t_etms_mt_meeting_delta add columns (apply_oa STRING comment '是否OA特殊申请')
alter table s_etms_feihe_t_etms_mt_meeting_delta add columns (train_team_id STRING comment '培训团队id')

alter table s_etms_feihe_t_etms_mt_meeting add columns (apply_oa STRING comment '是否OA特殊申请')
alter table s_etms_feihe_t_etms_mt_meeting add columns (train_team_id STRING comment '培训团队id')

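nvl(null,2) coalesce(v0,v1,v2)
Be careful with sum and avg: the aggregates themselves skip NULL inputs, but they return NULL when every input is NULL or the window frame is empty, and an expression such as a + b is NULL if either operand is NULL. Wrapping values in coalesce (or nvl) avoids the surprise.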
sum(s1.this_year_month_count) over (partition by s1.flag order by year_month) as current_sum_number
nvl(sum(s1.this_year_month_count) over (partition by s1.flag order by year_month rows between unbounded preceding and 1 preceding),0) as huanbi_sum_number
nvl(sum(s1.this_year_month_count) over (partition by s1.flag order by year_month rows between unbounded preceding and 12 preceding),0) as tonhgbi_sum_number
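A quick way to see where NULL does and does not leak into sum and avg is to run the cases in a scratch database. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive (the aggregate rules shown are standard SQL); the table `s` and its values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s (flag TEXT, a INTEGER, b INTEGER);
    INSERT INTO s VALUES ('x', 1, 10), ('x', NULL, 20), ('x', 3, 30), ('y', NULL, 5);
""")

rows = conn.execute("""
    SELECT flag,
           SUM(a),                   -- skips NULL inputs
           AVG(a),                   -- averages the non-NULL inputs only
           SUM(a + b),               -- a + b is NULL when a is NULL, so that row is lost
           SUM(COALESCE(a, 0) + b),  -- coalesce keeps the row
           COALESCE(SUM(a), 0)       -- guards the all-NULL group
    FROM s GROUP BY flag ORDER BY flag
""").fetchall()

for row in rows:
    print(row)
# ('x', 4, 2.0, 44, 64, 4)
# ('y', None, None, None, 5, 0)
```

The nvl(..., 0) around the windowed sums above serves the same purpose as the last column: for the first row of a partition the frame "unbounded preceding to 1 preceding" is empty, so the sum is NULL.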

-- Check whether a column contains a value
-- Check whether the name column contains the string "aaa":
select * from temp where locate("aaa", name) > 0;

select split_part('abcd','c',2); -- d
select split_part('abcd','c',1); -- ab
select split_part('abcd','a',1); -- empty string
select split_part('abcdce','c',2); -- d
select split_part('abcd','c',4); -- empty string, not NULL
select split_part('abcd','h',2); -- empty string, not NULL
select coalesce(split_part('abcd','h',2),233); -- empty string, not 233
select if(split_part('abcd','h',2)=='',"result is ''",233); -- result is ''

-- Forget about this function; I tried it and it has too many restrictions
SELECT find_in_set(1,'1,2,3,6'); -- 1
select find_in_set('c','a,b,c,d'); -- 3
select find_in_set('f','a,b,c,d'); -- 0
select find_in_set('','a,b,c,d'); -- 0
select find_in_set('',',b,c,d'); -- 0
select find_in_set(null,'a,b,c,d'); -- NULL
select find_in_set(null,null); -- NULL
select find_in_set('c',null); -- NULL
row_number() over(partition by n1.member_id order by n1.order_at) as rn
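first_value(train_team_name) over(partition by meeting_date order by meeting_id desc) -- last_value() over seemed to give wrong results, so I am avoiding it for now; with an order by, the default window frame ends at the current row, so last_value returns the current row unless the frame is extended to unbounded following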
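The odd-looking last_value results are most likely the default window frame at work rather than a bug. The sketch below shows the effect with Python's built-in sqlite3 module standing in for Hive (it needs an SQLite build with window functions, 3.25 or later); the table `m` and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m (meeting_id INTEGER, train_team_name TEXT);
    INSERT INTO m VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")

rows = conn.execute("""
    SELECT meeting_id,
           -- default frame: unbounded preceding .. current row
           last_value(train_team_name) OVER (ORDER BY meeting_id) AS default_frame,
           -- explicit frame covering the whole partition
           last_value(train_team_name) OVER (
               ORDER BY meeting_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS full_frame
    FROM m ORDER BY meeting_id
""").fetchall()

for row in rows:
    print(row)
# (1, 'a', 'c')
# (2, 'b', 'c')
# (3, 'c', 'c')
```

Reversing the sort order and using first_value, as the note above does, gives the same answer as the full_frame column without having to spell out the frame.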
-- Window functions are evaluated after the where clause, so rn in the query below starts from 1
select
n1.bvdid
,row_number() over(order by n1.bvdid) as rn
from ods_orbis.ods_guoc_test_01 n1
where n1.bvdid >= '2323' and n1.bvdid <= '6464'
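To check the evaluation order without a Hive cluster, the same query shape can be run against a scratch table. This is a sketch using Python's built-in sqlite3 module (SQLite 3.25 or later for window functions); the table `n1` and its five rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE n1 (bvdid TEXT)")
conn.executemany(
    "INSERT INTO n1 VALUES (?)",
    [("1000",), ("2323",), ("3000",), ("6464",), ("7000",)],
)

rows = conn.execute("""
    SELECT bvdid, row_number() OVER (ORDER BY bvdid) AS rn
    FROM n1
    WHERE bvdid >= '2323' AND bvdid <= '6464'
    ORDER BY bvdid
""").fetchall()

print(rows)  # [('2323', 1), ('3000', 2), ('6464', 3)]
```

Because '1000' is filtered out before numbering, '2323' gets rn = 1 rather than 2. To number the rows first and filter afterwards, compute rn in a subquery and apply the where clause outside it.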

-- Set difference
select id from t1 except select id from t2

ds=case
when substr(${bizdate},7,2)>'10' and substr(${bizdate},7,2)<='29' then concat(substr(${bizdate},1,6),'10')
else ${bizdate}
end

select cast(date as char)

SELECT * FROM t_1 UNION SELECT * FROM t_2 -- deduplicates the whole combined result, so rows duplicated inside one of the source tables are collapsed as well

SELECT * FROM t_1 UNION all SELECT * FROM t_2 -- no deduplication
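The point that UNION also collapses duplicates that already exist inside a single table is easy to confirm. The sketch below runs both forms with Python's built-in sqlite3 module; the contents of `t_1` and `t_2` are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_1 (id INTEGER);
    CREATE TABLE t_2 (id INTEGER);
    INSERT INTO t_1 VALUES (1), (1), (2);  -- 1 is duplicated inside t_1
    INSERT INTO t_2 VALUES (2), (3);
""")

union_rows = [r[0] for r in conn.execute(
    "SELECT id FROM t_1 UNION SELECT id FROM t_2 ORDER BY id")]
union_all_rows = [r[0] for r in conn.execute(
    "SELECT id FROM t_1 UNION ALL SELECT id FROM t_2 ORDER BY id")]

print(union_rows)      # [1, 2, 3]
print(union_all_rows)  # [1, 1, 2, 2, 3]
```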

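Select decode(columnname, value1, result1, value2, result2, ... valueN, resultN, default) From tablename Where ... -- with a default the argument count is even; without one it is odd
sign() returns 0, 1 or -1 depending on whether a value is zero, positive or negative. The following statement uses it to take the smaller of two values:
select monthid,decode(sign(sale-6000),-1,sale,6000) from output;
select decode(null,1,1,2); -- 2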

select size(split('hello,world,spark',',')); -- 3
select size(split(' ',',')); -- 1
select size(split('',',')); -- 0
select size(split(null,',')); -- NULL

select ds, count(*) from ${firmus_dataphin_prd_ods}.s_etms_feihe_t_etms_mt_meeting where ds<>'' group by ds

-- When replace and regexp_replace operate on a string "XXX" that contains the '' character, it is replaced with '' by default; no fix for this yet
-- '\s' replaces whitespace such as spaces and line breaks
select regexp_replace('hello world gc 23','l','m')
select length(regexp_replace('hello world gc 23','[0-9]','')) > 0
select regexp_replace('A B   C','\s+','@') -- A@B@C
select regexp_replace('A B   C','\s','@') -- A@B@@@C
select regexp_replace('A B   C
D','\s','@') -- A@B@@@C@@D
regexp_replace(ret.pay_remark, '\n|\t|\r', '')

select replace('hello world gc 23','l','m')

select split('abcdef', 'c')[0]
select split('aa.bb','\.')[1]

-- Across all partitions, match the same activity and list the activity ids whose summary status has changed; this helps to check whether the upstream data has problems. Because incremental loads are involved, the data is sometimes not updated promptly
select * from (select meeting_id,count(distinct status) cnt from ${firmus_dataphin_prd_ods}.s_feihe_scrm_fh_meeting_summary where ds <>'' group by meeting_id) t88 where t88.cnt>1

select ${aaa} = 2 or ${aaa} is null
select ${aaa} = 2

[{"name":"王二狗","sex":"男","age":"25"},{"name":"李狗嗨","sex":"男","age":"47"}]
SELECT get_json_object(xjson,"$.[0]") FROM person; -- {"name":"王二狗","sex":"男","age":"25"}
SELECT get_json_object(xjson,"$.[0].age") FROM person; -- 25

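ARG_MAX(valueToMaximize, valueToReturn) returns the valueToReturn of the row where valueToMaximize is largest

Structured Query Language
=VLOOKUP(B3,第一时间分区!B:K,5,FALSE)
ctrl + c
ctrl + shift + down arrow, then ctrl + v

Shortcut keys for commenting out several lines
ctrl + /
ctrl + ;
Notepad++ case conversion:
1. Lowercase to uppercase: Ctrl + Shift + U
2. Uppercase to lowercase: Ctrl + U

In Dataphin the equivalent of the Notepad++ alt + ctrl key combination is alt + shift

For delete, the trailing where condition is not supported. MySQL (ADS in ADB) does not support if, so use case when instead; it does not support substr, so use substring instead

With num lock off:
home jumps to the start of the line
end jumps to the end of the line
With num lock on:
shift + home jumps to the start of the line
shift + end jumps to the end of the line

shift + up/down arrow can be used to select a line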

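--******************************************************************************
-- Subject: member scenario
-- Description: from orders (store orders, 星妈优选 orders, e-commerce orders, self-service scan orders), identify each member's infant-formula new-customer order and returning-customer order, then take the product stage, item, order source, creation time and other fields from those orders
-- Infant-formula new-customer order: the member's first order is an infant-formula order
-- Infant-formula returning-customer order: the member's second infant-formula order (note: for self-service scan orders, the new-customer order and the returning-customer order must not fall on the same day)
-- Redeemed at 星妈优选 before repurchase: decided by comparing the time of valid 星妈优选 orders with the new-customer and returning-customer order times
-- Infant formula (stage 0-4 milk powder)
-- Author: 樊克
-- Created: 20190511
-- Change date / changed by / change
-- 20200511 樊克 init
-- 20200515 樊克 added the second-level new-customer source; changed the stage enum values (to 孕婴粉, stage 1…)
--******************************************************************************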

-- How to check whether to match on order_id or on the external order number: the query below returns nothing
select * from ${firmus_dataphin_prd_ods}.s_pefeihe_datamodel_plt_fh_birthphoto where ds='20200531' and out_order_code not in (select other_order_on from ${firmus_dataphin_prd_ods}.s_pefeihe_datamodel_plt_fh_order_buyer where ds='20200531')

select * from ${firmus_dataphin_prd_ods}.s_feihe_sdc_core_order_info where ds='20200531' and other_order_sn not in (select other_order_on from ${firmus_dataphin_prd_ods}.s_pefeihe_datamodel_plt_fh_order_buyer where ds='20200531')

select t1.order_sn ,t1.other_order_sn,t2.other_order_on as t2_other_order_on from
${firmus_dataphin_prd_ods}.s_feihe_sdc_core_order_info t1
left join (select other_order_on from ${firmus_dataphin_prd_ods}.s_pefeihe_datamodel_plt_fh_order_buyer where ds='20200531') t2 on t2.other_order_on = t1.other_order_sn
where t1.ds='20200531' and t2.other_order_on is null
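One common reason a not in query comes back empty is a NULL among the values returned by the subquery; whether that was the cause here is an assumption, since the notes do not say. The sketch below reproduces the effect with Python's built-in sqlite3 module and shows that the left join form used above is not affected. The table names and rows are invented, and only the two key columns are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_info  (other_order_sn TEXT);
    CREATE TABLE order_buyer (other_order_on TEXT);
    INSERT INTO order_info  VALUES ('A'), ('B'), ('C');
    INSERT INTO order_buyer VALUES ('A'), (NULL);  -- one NULL key
""")


def keys(sql):
    return [row[0] for row in conn.execute(sql)]


# 'B' NOT IN ('A', NULL) evaluates to NULL, so no row survives.
not_in = keys("""
    SELECT other_order_sn FROM order_info
    WHERE other_order_sn NOT IN (SELECT other_order_on FROM order_buyer)
    ORDER BY other_order_sn""")

# Filtering NULLs out of the subquery restores the expected rows.
not_in_filtered = keys("""
    SELECT other_order_sn FROM order_info
    WHERE other_order_sn NOT IN (
        SELECT other_order_on FROM order_buyer WHERE other_order_on IS NOT NULL)
    ORDER BY other_order_sn""")

# The left join / is null form (anti join) ignores the NULL key.
anti_join = keys("""
    SELECT t1.other_order_sn
    FROM order_info t1
    LEFT JOIN order_buyer t2 ON t2.other_order_on = t1.other_order_sn
    WHERE t2.other_order_on IS NULL
    ORDER BY t1.other_order_sn""")

print(not_in)           # []
print(not_in_filtered)  # ['B', 'C']
print(anti_join)        # ['B', 'C']
```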

-- ALTER TABLE ${firmus_dataphin_prd_ads}.ads_bi_offline_order_detail_1d_df ADD COLUMNs ( coupon_newact_type STRING COMMENT '优惠券新客活动类型')
-- ALTER TABLE ads_bi_offline_order_detail_1d_df ADD COLUMNs ( coupon_newact_type STRING COMMENT '优惠券新客活动类型')
-- alter table ${firmus_dataphin_prd_ads}.ads_bi_offline_order_detail_1d_df CHANGE COLUMN heli_gift_item_name product_name string comment '商品名称';
-- alter table ads_bi_offline_order_detail_1d_df CHANGE COLUMN heli_gift_item_name product_name string comment '商品名称';

-- Linux operations ~/guoctest

To look at the files behind a Hive table from Linux, first run show create table table_name to find the storage path
hadoop fs -ls hdfs://master1:8020/warehouse/tablespace/external/hive/exp.db/exp_app_jg_jgxw_check_action
External table permissions: drwxrwxrwx+ - hive hadoop
show create table exp.exp_app_jg_jgxw_check_action ;
show create table app.app_jg_jgxw_check_action ;
hadoop fs -ls hdfs://master1:8020/warehouse/tablespace/managed/hive/app.db/app_jg_jgxw_check_action
drwxrwx---+ - hive hadoop
Find the path of a table from Linux
find / -name *sharemanaged/hive/app.db -type d -print
Check the size of a table from Linux
hdfs dfs -du -s -h hdfs://master1:8020/warehouse/tablespace/managed/hive/app.db/app_jg_jgxw_check_action
-- Total size of all tables in the entity database
hdfs dfs -du -s -h /user/hive/warehouse/entity.db/
-- Size of each table in the entity database (without -s, one line per table)
hdfs dfs -du -h /user/hive/warehouse/entity.db/
hdfs dfs -du -h /hbase/data/default
-- Total size of the Hive data
hdfs dfs -du -s -h /user/hive
-- Change the replication factor (set it to 2 for everything under /user/hive)
hdfs dfs -setrep -R 2 /user/hive

hadoop fs -get /user/hive/warehouse/entity.db/entity_basicinfo_add/ /bvddata/test/
hadoop fs -put -f abc.txt /user/hive/warehouse/entity.db/entity_basicinfo_add/
scp -r 10.241.130.24:/bvddata/guoc_test/entity_basicinfo_add/ /bvddata/202103/
scp -r qiye@10.241.130.24:/bvddata/guoc_test/entity_basicinfo_add/ /bvddata/202103/

List HDFS paths from Linux
hadoop fs -ls hdfs://hdfs-ha/warehouse/tablespace/
hadoop fs -ls hdfs://master1:8020/warehouse/tablespace/
hadoop fs -ls /user
Show exactly which characters a line contains
echo abc | od -c
Unzip a file
unzip hello.txt
mv hello.txt /home/etl/world.txt
-- owner, group, others
chmod 777 hello.txt
-- Give a script execute permission
chmod +x test.sh
su ynzw

7. Disk space used by each folder
du -h -d 1
df
-- Size of each file in the current folder: du -sh -m *
-- Size of every rar file in the current folder: du -sh -m *.rar
-- Amount of memory on the Linux machine: free -g

8. Kill a running Hive job
hadoop job -list lists the jobs Hadoop is currently running
hadoop job -kill job_1546932571227_0082 kills that job
-- Find a background process and kill it (the second process, 12029, is the grep itself)
$ ps -ef | grep firefox
smx 1827 1 4 11:38 ? 00:27:33 /usr/lib/firefox-3.6.18/firefox-bin
smx 12029 1824 0 21:54 pts/0 00:00:00 grep --color=auto firefox

$ kill -s 9 1827
-- learned from 狂神
ps -aux | grep mysql
kill -9 <pid>
pstree -pu

9. Get the number of rows in a table from Linux
temp_str=$(beeline -e "select count(*) from ori_orbis.ori_additional_company_info")
count=$(echo ${temp_str} | awk -F " " '{print $(NF-2)}')

10. Date arithmetic
date -d "20150416 12 3 hour" +"%Y%m%d%H"
2015041615
date -d "20150416 12 -1 hour" +"%Y%m%d%H"
2015041611
date -d "20210301 -1 month" +"%Y%m%d"
20210201

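11. nohup
-- nohup stands for "no hang up"; it runs a command so that exiting the terminal does not stop the program
-- By default (when output is not redirected) nohup writes to a file named nohup.out in the current directory; if that file is not writable, the output goes to $HOME/nohup.out instead
-- &: run the command in the background; combined with nohup it keeps running after the terminal exits
-- 2>&1: redirect standard error (2) to standard output (&1), which is in turn redirected to the runoob.log file
-- Run the script in the background and redirect its output to runoob.log:
nohup sh test40.sh > runoob.log 2>&1 &

12. Show the ports opened in the firewall
firewall-cmd --zone=public --list-all

13. Load environment variables in a Linux script -- at 信保, a crontab script that connected to Oracle through the sqlplus command failed, and the cause turned out to be that the environment variables had not been loaded
source ~/.bash_profile
source /etc/profile

14. Scheduling with crontab
What is the difference between 0 0/10 * * * and 0 */10 * * *?
First, the meaning of the characters: 0 is a literal value, * matches every value, and / introduces a step. The expressions are read here with a leading seconds field (Quartz style); in a five-field crontab the step would sit in the hour field.
The explanation I copied says that 0 0/10 * * * runs every 10 minutes starting from minute 0, while 0 */10 * * * runs every 10 minutes starting from the moment the job was set up, so a job set up at 5:07 would first run at 5:10 with the first form and at 5:17 with the second.
That is not how a cron field is evaluated. A step is counted from the start of the field's range, not from the time the job was installed, so */10 in the minute field matches minutes 0, 10, 20, 30, 40 and 50, and a job set up at 5:07 first runs at 5:10. Where the a/n form is accepted (not every crontab implementation accepts it), 0/10 expands to the same minutes.
Likewise 0 0/1 * * * and 0 */1 * * * describe the same schedule.
Test: */15 * * * * was first scheduled at 9:40; the runs recorded afterwards were 9:55, 10:00, 10:15, 10:30…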
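How a step expression expands can be checked with a small field expander. The sketch below is not taken from any cron implementation: `expand_field` and `next_minute` are hypothetical helpers, and treating `a/n` as "from a to the end of the range" follows the Quartz-style reading, which is an assumption for plain crontab.

```python
def expand_field(field, lo=0, hi=59):
    """Expand one cron field such as '*/10', '0/10', '5-30/5' or '7'."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # Assumption: 'a/n' runs from a to the end of the range;
            # a bare number without a step is just that single value.
            end = hi if step_text else start
        values.update(range(start, end + 1, step))
    return sorted(values)


def next_minute(field, now_minute):
    """First matching minute after now_minute within the same hour, else None."""
    return next((m for m in expand_field(field) if m > now_minute), None)


print(expand_field("*/10"))     # [0, 10, 20, 30, 40, 50]
print(expand_field("0/10"))     # [0, 10, 20, 30, 40, 50]
print(next_minute("*/10", 7))   # 10
print(expand_field("*/15"))     # [0, 15, 30, 45]
```

Both forms expand to the same minutes, so under this reading a job set up at minute 7 first fires at minute 10 in either case.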

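-- Hive concepts
1. The difference between reducer and reduce in Hadoop
A reduce task is a program that runs on a node and executes the reduce function of the Reducer class. A reduce task can be thought of as an instance of Reducer.
At the code level, Mapper and Reducer are two classes.
In MapReduce job scheduling, Mapper and Reducer are the first and second stages of data processing.
Another way to see it: a mapper or reducer is a slot of compute resources that can be used to run many map tasks or reduce tasks. A task is assigned to it, and once that finishes it can take on a new one.

-- Hive operations
1. List all partitions of a Hive table
show partitions t_test_order;

2. Check the storage format of a table
show create table table_name

3. Convert between managed and external tables
Managed to external: alter table xxx set tblproperties('EXTERNAL'='TRUE')
External to managed: alter table xxx set tblproperties('EXTERNAL'='FALSE')

4. Delete only the data in a table and keep its structure
truncate table sdfasdfasdfasdfa;

-- Hive configuration
1. Set the amount of input per reducer. The default was 1G in older Hive versions (256 MB since Hive 0.14); with 1G per reducer, a 10G input starts 10 reducers. 34359738368 is 2 to the power of 35 (32 GiB)
--hiveconf hive.exec.reducers.bytes.per.reducer=34359738368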
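The arithmetic behind the reducer count can be written out as below. This is a sketch of the usual estimate (input size divided by bytes per reducer, rounded up and capped), not Hive's actual code; `estimate_reducers` is a hypothetical helper, and the cap of 1009 stands in for hive.exec.reducers.max and is an assumption about its default.

```python
import math


def estimate_reducers(input_bytes, bytes_per_reducer, max_reducers=1009):
    """Estimated reducer count: ceil(input / bytes per reducer), capped."""
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


GIB = 2 ** 30

print(2 ** 35)                                  # 34359738368
print(estimate_reducers(10 * GIB, 1 * GIB))     # 10
print(estimate_reducers(10 * GIB, 2 ** 35))     # 1
```

With the value set above, anything up to 32 GiB of input is estimated to need a single reducer, which is worth keeping in mind before applying it to large jobs.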

2. Hive tables in ORC format (columnar storage) can fail during MR with errors of unknown cause, for example "error evaluating" some column value. The cause is a CDH bug; the data I saw at 建行 (China Construction Bank) is stored as orc;
Workaround:
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;

3. Enable dynamic partitions
# Parameters that enable dynamic partitioning
set hive.exec.dynamic.partition.mode=nonstrict; # default is strict
set hive.exec.dynamic.partition=true; # default is false before Hive 0.9.0, true since

4. View certain settings from the Hive session
set mapreduce.reduce.tasks;
set hive.enforce.bucketing;

5. Transaction support when creating a Hive table;
First, at present only bucketed tables stored as ORC can support transactions;
second, the create statement needs stored As ORC TBLPROPERTIES('transactional'='true')

6. Hive create table statements
create table mytest_tmp like FDM_SOR.mytest_deptaddr;
CREATE TABLE kylin.kylin_gov_org_line_info(
uniscid string COMMENT '',
dir_guid_code string COMMENT '',
name string COMMENT ''
)
comment '贴源层-政府信息'
partitioned by (ds STRING COMMENT '数据存储分区，格式yyyymmdd')
row format delimited fields terminated by '\001'
stored As ORC
TBLPROPERTIES('transactional'='false');

7. Some Hive settings
set;
set system:user.timezone=Asia/Shanghai;

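-- Computer basics
1. UTF-8 encoding
A Chinese character takes three bytes; an English letter, a digit and a newline each take one byte.
A Chinese character, an English letter, a digit and a newline each count as a single character.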
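The byte and character counts are easy to confirm in Python, where len on a str counts characters and len on the encoded bytes counts bytes. The sample strings are the ones used with length and lengthb earlier in these notes.

```python
samples = ["测试语句长度", "hello", "a", "1", "\n"]

for text in samples:
    chars = len(text)                       # number of characters
    utf8_bytes = len(text.encode("utf-8"))  # number of bytes in UTF-8
    print(repr(text), chars, utf8_bytes)
# '测试语句长度' 6 18
# 'hello' 5 5
# 'a' 1 1
# '1' 1 1
# '\n' 1 1
```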

18.04

-- LeetCode