窗口函数总结
1、连续登录问题:
1、查看每位用户连续 2天 登录的情况
DATE_ADD(logintime,INTERVAL 1 DAY) 计算本次登录的后一天
lead(logintime,1,0) over (PARTITION BY userid ORDER BY logintime) 本次登录后的下一条登录记录的时间,以 userid 分组
连续 N天: DATE_ADD(logintime,INTERVAL N-1 DAY) lead(logintime,N-1,0) over (PARTITION BY userid ORDER BY logintime)
SELECT userId,logintime,
DATE_ADD(logintime,INTERVAL 1 DAY) AS nextday,
lead(logintime,1,0) over (PARTITION BY userid ORDER BY logintime) AS nextlogin
FROM tb_login
WITH t1 AS(
SELECT userId,logintime,
DATE_ADD(logintime,INTERVAL 1 DAY) AS nextday,
lead(logintime,1,0) over (PARTITION BY userid ORDER BY logintime) AS nextlogin
FROM tb_login)
SELECT DISTINCT userId FROM t1 WHERE nextlogin = nextday;
2、查看每位用户连续登录的情况
3、查看每位用户最大连续登录的天数
4、查看在某个时间段里连续登录天数超过N天的用户
行转列(多行转多列、多行转单列)
多行转多列 (case when 语句)
建表语句
CREATE TABLE score(t_name VARCHAR(10),t_subject VARCHAR(10),t_score INT);
INSERT INTO score VALUES('张三','语文',74);
INSERT INTO score VALUES('张三','数学',83);
INSERT INTO score VALUES('张三','物理',93);
INSERT INTO score VALUES('李四','语文',74);
INSERT INTO score VALUES('李四','数学',84);
INSERT INTO score VALUES('李四','物理',94);
INSERT INTO score VALUES('王五','数学',100);
INSERT INTO score VALUES('王五','语文',82);
INSERT INTO score VALUES('王五','物理',88);
SELECT * FROM score;
SELECT t_name,
MAX(CASE t_subject WHEN '语文' THEN t_score END) '语文',
MAX(CASE t_subject WHEN '数学' THEN t_score END) '数学',
MAX(CASE t_subject WHEN '物理' THEN t_score END) '物理'
FROM score
GROUP BY t_name;
多行转单列(group_contact,hive中用 collect_list(col name) 再用 concat_ws 函数)
group_contact(a,b,c,… separator ‘XX’),speparator ‘XX’ 中XX是多行数据拼接后的分隔符,如果不写默认的分隔符是’,'逗号,最后用group by 聚合,不写group by 会拼接所有的行。
SELECT t_name,GROUP_CONCAT(t_subject,':',t_score SEPARATOR ',') all_score FROM score GROUP BY t_name
hive中示例: 建表语句
CREATE TABLE row2col2(col1 VARCHAR(10),col2 VARCHAR(10),col3 INT);
INSERT INTO row2col2 VALUES('a','b',1);
INSERT INTO row2col2 VALUES('a','b',2);
INSERT INTO row2col2 VALUES('a','b',3);
INSERT INTO row2col2 VALUES('c','d',4);
INSERT INTO row2col2 VALUES('c','d',5);
INSERT INTO row2col2 VALUES('c','d',6);
SELECT * FROM row2col2;
CONCAT_WS(SplitChar,element1,element2……):用于实现 字符串 拼接,可以指定分隔符 特点:任意一个元素不为null,结果就不为null collect_list(colName):用于将一列中的多行合并为一行,不进行去重 collect_set(colName):用于将一列中的多行合并为一行,并进行去重
SELECT col1,
col2,
CONCAT_WS(',',collect_list(CAST(col3 AS STRING))) col3
FROM row2col2
GROUP BY col1,col2
列转行(多列转多行、多列转单行)
多列转多行
将上面的行转列生成的表作为列转行数据
CREATE TABLE score2 AS
SELECT t_name,
SUM(CASE t_subject WHEN '数学' THEN t_score ELSE 0 END) 数学,
SUM(CASE t_subject WHEN '语文' THEN t_score ELSE 0 END) 语文,
MAX(CASE t_subject WHEN '英语' THEN t_score ELSE 0 END) 英语
FROM score GROUP BY t_name;
SELECT * FROM score2;
SELECT t_name,
'数学' AS t_subject,
`数学` AS t_score
FROM score2;
通过 union all 联合
SELECT t_name,
'数学' AS t_subject,
`数学` AS t_score
FROM score2
UNION ALL
SELECT t_name,
'语文' AS t_subject,
`语文` AS t_score
FROM score2
UNION ALL
SELECT t_name,
'物理' AS t_subject,
`物理` AS t_score
FROM score2
ORDER BY t_name;
多列转单行
hive 中用 explode 函数和 lateral view 视图配合 explode(Map | Array) 用于将一个集合或者数组中的每个元素展开,将每个元素变成一行
select explode(split(“a,b,c,d”,“,”)); 若有数据表 col2row2
select col1,col2,lv.col3 as col3
from col2row2
lateral view explode(split(col3, ',')) lv as col3;
参考文章: https://blog.csdn.net/Grateful_Dead424/article/details/122832277 https://blog.csdn.net/qq_28603127/article/details/106217276
|