一起重新开始学大数据-hadoop篇-day 51 数据类型 、DDL、DML |
SQL的小贴士(tips):
1、count(*)、count(1) 、count(‘字段名’) 区别
??count(*):所有行进行统计,包括NULL行 ??count(1):所有行进行统计,包括NULL行 ??count(‘字段名’):对字段中非Null进行统计 结果差距不大但是受到集群影响执行时间有差异, ? 执行时间: count(*)>count('字段名')count(1)
2、HQL 执行优先级:
from、where、 group by 、having、order by、join、select 、limit
3、where 条件里不支持不等式子查询,实际上是支持 in、not in、exists、not exists
select t1.EMPNO
,t1.ENAME
,t1.JOB
from emp t1
where t1.ENAME != "SCOTT" and t1.job in(
select job
from emp
where ENAME = "SCOTT");
7900,JAMES,CLERK,7698,1981-12-03,950,null,30
7902,FORD,ANALYST,7566,1981-12-03,3000,null,20
select t1.EMPNO
,t1.ENAME
,t1.JOB
from emp t1
where t1.ENAME != "SCOTT" and exists(
select job
from emp t2
where ENAME = "SCOTT"
and t1.job = t2.job
);
4、hive中大小写不敏感
5、在hive中,数据中如果有null字符串,加载到表中的时候会变成 null (不是字符串)
? 如果需要判断 null,使用 某个字段名 is null 这样的方式来判断
?或者使用 nvl() 函数,不能 直接 某个字段名 == null
6、使用explain查看SQL执行计划
explain select t1.EMPNO
,t1.ENAME
,t1.JOB
from emp t1
where t1.ENAME != "SCOTT" and t1.job in(
select job
from emp
where ENAME = "SCOTT");
explain extended select t1.EMPNO
,t1.ENAME
,t1.JOB
from emp t1
where t1.ENAME != "SCOTT" and t1.job in(
select job
from emp
where ENAME = "SCOTT");
Hive数据类型
整型:TINYINT、SMALLINT、INT、BIGINT
浮点:FLOAT、DOUBLE
布尔类型:BOOL (False/True)
字符串:STRING
时间类型:
create table testDate(
ts timestamp
,dt date
) row format delimited fields terminated by ',';
select from_unixtime(1630915221,'yyyy年MM月dd日 HH时mm分ss秒')
select unix_timestamp('2021年09月06日 16时00分21秒','yyyy年MM月dd日 HH时mm分ss秒');
select unix_timestamp('2021-01-14 14:24:57.200')
复杂数据类型:
create table foodArray(
food string,
dish_name array<string>
)row format delimited
fields terminated by '\t'
COLLECTION ITEMS terminated by ',';
鱼 红烧鱼,清蒸鱼,水煮鱼
鸡 白斩鸡、香酥鸡、黄焖鸡
LOAD DATA LOCAL INPATH '/usr/local/soft/data/dataArray.txt' INTO TABLE foodArray;
select food,dish_name[0] from foodArray;
-
map key:value,key2:v2,k3:v3
create table scoreMap(
name string,
score map<string,int>
)ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';
小明 语文:60,数学:0,英语:55
小红 语文:100,数学:130,英语:140
LOAD DATA LOCAL INPATH '/usr/local/soft/data/dataMap.txt' INTO TABLE scoreMap;
select name,score['语文'] from scoreMap;
create table scoreStruct(
name string,
score struct<course:string,score:int,course_id:int,tearcher:String>
)ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';
小明 数学,1,000001,体育老师
小红 数学,100,000002,陈老师
LOAD DATA LOCAL INPATH '/usr/local/soft/data/dataStruct.txt' INTO TABLE scoreStruct;
select name,score.course,score.score from scoreStruct;
create table scoreGroup(
id int
,a MAP<STRING,ARRAY<STRING>>
)row format delimited fields terminated by '\t'
collection items terminated by ','
MAP KEYS TERMINATED BY ':';
1 english:80,90,70
2 math:89,78,86
3 chinese:99,100,82
LOAD DATA LOCAL INPATH '/usr/local/soft/data/datagroup.txt' INTO TABLE scoreGroup;
Hive HQL
DDL
创建数据库 create database xxxxx; 查看数据库 show databases; 删除数据库 drop database tmp; 强制删除数据库:drop database tmp cascade; 查看表:SHOW TABLES; 查看表的元信息: ?desc test_table; ?describe extended test_table; ?describe formatted test_table; 查看建表语句:show create table table_XXX 重命名表: ?alter table test_table rename to new_table; 修改列数据类型:alter table lv_test change column colxx string; 增加、删除分区: ?alter table test_table add partition (pt=xxxx) ?alter table test_table drop if exists partition(...)
DML
select id,name from tb t where ... and .... group by xxx having xxxx order by xxx asc/desc limit n;
Sort by会对每一个Reduce输出的结果进行非序,只能保证在相同的Reduce任务中有序
Distribute by只是控制Map端的输出结果如何分布到Reduce中(分区),并不会对进入reduce中的数据进行排序,如果我们想让同一年的数据—起处理,那么就可以使用distribute by来保证具有相同年份的数据分发到同一个reducer中进行处理,然后使用sort by来安装我们的期望对数据进行排序:
Cluster by = distribute by + sort by 当distribute by 与sort by字段相同时,可以直接使用cluster by
| | | |
上一章-hadoop篇-day 50 hive语法及进阶 下一章-随缘更新 | | | | |
听说长按大拇指👍会发生神奇的事情呢!好像是下面的画面,听说点过的人🧑一个月内就找到了对象的💑💑💑,并且还中了大奖💴$$$,考试直接拿满分💯,颜值突然就提升了😎,虽然对你好像也不需要,是吧,吴彦祖🤵! |
|