OVER函数
1.含义:
窗口函数主要解决 分析整体数据时,多个数据和一个结果对应,为每一条数据开启指定行范围的窗口; over() 指定 分析函数 工作时的数据窗口大小,这个窗口大小可能会随行的改变而改变;
用于between … and … 表示范围概念: CURRENT ROW:当前行 n PRECEDING:往前n行数据 n FOLLOWING:往后n行数据 UNBOUNDED:起点, UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING表示到后面的终点
LAG(col,n,default_val):往前第n行数据 LEAD(col,n, default_val):往后第n行数据
NTILE(n):把有序窗口的行分发到指定数据的组中,各个组有编号,编号从1开始,对于每一行,NTILE返回此行所属的组的编号。注意:n必须为int类型。
常用格式: over() 会为每条数据开启一个窗口,默认的窗口大小就是当前数据集的大小
over(partition by 字段) 会按照指定的字段进行分区,把分区字段值相同的数据划分到同一个分区;分区中每条数据开启一个窗口,每条数据的默认的窗口大小就是当前分区数据集的大小
over(order by 字段) 会在窗口中按照指定的字段对数据进行排序;会为每条数据开启一个窗口,默认的窗口大小为从数据集开始UNBOUNDED PRECEDING到当前行CURRENT ROW
over(partition by 字段 order by 字段) 会按照指定的字段进行分区,把分区字段值相同的数据划分到同一个分区,会在每个分区中按照指定的字段进行排序;分区中每条数据开启一个窗口,每条数据的默认的窗口大小就是当前分区从数据集开始UNBOUNDED PRECEDING到当前行CURRENT ROW
over(partition by 字段 order by 字段 rows between … and …) 会按照指定的字段进行分区,把分区字段值相同的数据划分到同一个分区,会在每个分区中按照指定的字段进行排序;分区中每条数据开启一个窗口,每条数据的窗口大小为指定的窗口大小
2.示例:
business 表数据:
(1)查询在2017年4月份购买过的顾客及总人数
select name,count(*) over ()
from business
where substring(orderdate,1,7) = '2017-04'
group by name;
(2)查询顾客的购买明细及每个月所有顾客月购买总额
select name,orderdate,cost,sum(cost) over(partition by month(orderdate)) from
business;
查询顾客的购买明细及每个顾客月购买总额
select name,orderdate,cost,sum(cost) over(partition by name, substring(orderdate,0,7)) from business;
(3)将每个顾客的cost按照日期进行累加
select name,orderdate,cost,
sum(cost) over() as sample1,
sum(cost) over(partition by name) as sample2,
sum(cost) over(partition by name order by orderdate) as sample3,
sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row ) as sample4 ,
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5,
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING ) as sample6,
sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING ) as sample7
from business;
(4)查看顾客上次的购买时间以及下一次的购买时间
select name,orderdate,cost,
lag(orderdate,1,'1900-01-01') over(partition by name order by orderdate ) as time1, lag(orderdate,2) over (partition by name order by orderdate) as time2
from business;
(5)查询前20%时间的订单信息
select * from (
select name,orderdate,cost, ntile(5) over(order by orderdate) sorted
from business
) t
where sorted = 1;
|