1、开窗函数的格式
FUNCTION_NAME([argument_list])
OVER (
[PARTITION BY window_partition,…]
[ORDER BY window_ordering, … [ASC|DESC]])
[ { ROWS | RANGE } BETWEEN frame_start AND frame_end ] );
FUNCTION_NAME:函数名称。如row_number()、sum()、first_value()等。
argument_list:函数的参数列表。
PARTITION BY:根据window_partition(分区字段)进行分区,该子句也被称为查询分区子句。类似于group by,都是将数据按照边界值进行分组。而OVER之前的函数在每一个分组之内进行,如果超出了分组,则函数会重新计算。
ORDER BY:将各个分区内的数据,根据window_ordering(排序字段)进行排序。ORDER BY子句会对输入的数据强制排序(窗口函数是SQL语句最后执行的函数,因此可以把SQL结果集想象成输入数据)。ORDER BY子句对于诸如row_number(),lead(),lag()等函数是必须的。如果数据无序,这些函数的结果就没有意义。
ROWS和RANGE分别表示选择前后几行、选择数据范围。
2、窗口范围图例
注释: PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:起点
UNBOUNDED PRECEDING 表示从前面的起点
UNBOUNDED FOLLOWING:表示到后面的终点 常用的范围: 01:ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 02:ROWS BETWEEN 3 PRECEDING AND CURRENT ROW 03:ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING,n PRECEDING m FOLLOWING:表示窗口的范围是[(当前行的行数)- n, (当前行的行数)+ m] row。 04:ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
3、连续登录问题(补充):
3.1:数据准备
要求:登录日志,计算每个人连续登录的最大天数!(注意:断一天还算连续)
id | dt |
---|
1001 | 2021-08-01 | 1001 | 2021-08-02 | 1001 | 2021-08-03 | 1001 | 2021-08-05 | 1001 | 2021-08-06 | 1001 | 2021-08-07 | 1001 | 2021-08-10 | 1001 | 2021-08-12 | 1002 | 2021-08-01 | 1002 | 2021-08-02 | 1002 | 2021-08-03 | 1002 | 2021-08-07 | 1002 | 2021-08-09 | 1002 | 2021-08-11 | 1002 | 2021-08-13 | 1002 | 2021-08-15 |
3.2:代码实现
with tx as (
select '1001'as id ,'2021-08-01' as dt
union all
select '1001'as id ,'2021-08-02' as dt
union all
select '1001'as id ,'2021-08-03' as dt
union all
select '1001'as id ,'2021-08-05' as dt
union all
select '1001'as id ,'2021-08-06' as dt
union all
select '1001'as id ,'2021-08-07' as dt
union all
select '1001'as id ,'2021-08-10' as dt
union all
select '1001'as id ,'2021-08-12' as dt
union all
select '1001'as id ,'2021-08-01' as dt
union all
select '1002'as id ,'2021-08-01' as dt
union all
select '1002'as id ,'2021-08-01' as dt
union all
select '1002'as id ,'2021-08-02' as dt
union all
select '1002'as id ,'2021-08-03' as dt
union all
select '1002'as id ,'2021-08-07' as dt
union all
select '1002'as id ,'2021-08-09' as dt
union all
select '1002'as id ,'2021-08-11' as dt
union all
select '1002'as id ,'2021-08-13' as dt
union all
select '1002'as id ,'2021-08-15' as dt
)
select
id,
max(days) days
from (
select
id,
flag,
datediff(max(dt),min(dt)) +1 as days
from (select
id,
dt,
sum(if(dtDiff>2,1,0)) over(partition by id order by dt) flag
from(select
id,
dt,
datediff(dt,lagDt) dtDiff
from
(select
id,
dt,
lag(dt,1,'1970-01-01')over(partition by id order by dt)lagDt
from
tx)t1)t2)t3)
group by id,flag;
)t4
group by id
id days
1001 7
1002 9
3.3:总结
断n天还算连续,只需要修改sum(if(dtDiff>2,1,0)) over(partition by id order by dt) flag 把2替换为n+1即可;
|