开窗函数简介
普通的聚合函数聚合的行集是组,开窗函数聚合的行集是窗口。
因此,普通的聚合函数每组(Group by)只返回一个值,而开窗函数则可为窗口中的每行都返回一个值。
简单理解,就是对查询的结果多出一列,这一列可以是聚合值,也可以是排序值。 开窗函数一般分为两类,聚合开窗函数和排序开窗函数。
准备数据
建立学生成绩表
0: jdbc:hive2:
OK
No rows affected (1.61 seconds)
0: jdbc:hive2:
. . . . . . . . . . . . . . > id int,
. . . . . . . . . . . . . . > stu_id int,
. . . . . . . . . . . . . . > chinese int,
. . . . . . . . . . . . . . > math int,
. . . . . . . . . . . . . . > english int,
. . . . . . . . . . . . . . > class_id string,
. . . . . . . . . . . . . . > dept_Id string
. . . . . . . . . . . . . . > )row format delimited fields terminated by '/t';
OK
No rows affected (4.162 seconds)
0: jdbc:hive2:
OK
+
| tab_name |
+
| bucket |
| bucket_com |
| employee |
| mix_partition |
| salary_dynamic_partition |
| static_mix |
| stu |
| stu_scores |
+
8 rows selected (1.314 seconds)
0: jdbc:hive2:
插入数据,数据样例如下stu_scores_input.txt
1 10001 69 70 71 C1 D1
2 10002 66 71 56 C1 D1
3 10003 60 72 57 C1 D1
4 10004 56 78 58 C1 D1
5 10005 98 79 59 C1 D1
6 10006 56 80 60 C2 D1
7 10007 100 81 61 C2 D1
8 10008 78 82 76 C2 D1
9 10009 90 80 77 C2 D1
10 10010 87 81 78 C2 D1
11 10011 66 82 79 C1 D1
12 10012 98 83 80 C1 D2
13 10013 97 84 81 C1 D2
14 10014 87 85 85 C1 D2
15 10015 98 94 86 C1 D2
16 10016 100 95 87 C2 D2
17 10017 68 96 88 C2 D2
18 10018 50 97 89 C1 D2
19 10019 42 98 90 C1 D2
20 10020 59 99 91 C1 D2
上传到hdfs
[hadoop@master sql_script]$ hdfs dfs -put stu_scores_input.txt
导入数据
load data inpath '/user/hadoop/stu_scores_input.txt' into table stu_scores;
聚合开窗函数
count开窗函数
执行下面脚本
select
stu_id
,math
,dept_id
,count(math) over() as ct1
,count(math) over(partition by class_id) as ct2
,count(math) over(partition by class_id order by math) as ct3
,count(math) over(partition by class_id order by math rows between 1 preceding and 2 following) as ct4
from myhive.stu_scores
where dept_id = 'D1';
结果:
+
| stu_id | math | dept_id | ct1 | ct2 | ct3 | ct4 |
+
| 10001 | 70 | D1 | 11 | 6 | 1 | 3 |
| 10002 | 71 | D1 | 11 | 6 | 2 | 4 |
| 10003 | 72 | D1 | 11 | 6 | 3 | 4 |
| 10004 | 78 | D1 | 11 | 6 | 4 | 4 |
| 10005 | 79 | D1 | 11 | 6 | 5 | 3 |
| 10011 | 82 | D1 | 11 | 6 | 6 | 2 |
| 10009 | 80 | D1 | 11 | 5 | 2 | 3 |
| 10006 | 80 | D1 | 11 | 5 | 2 | 4 |
| 10010 | 81 | D1 | 11 | 5 | 4 | 4 |
| 10007 | 81 | D1 | 11 | 5 | 4 | 3 |
| 10008 | 82 | D1 | 11 | 5 | 5 | 2 |
+
解释:
在dept_id = 'D1'的条件下,对于stu_id = '10010'
ct1:统计所有行数位11
ct2:统计class_id = 'C2' 进行分组的个数5,
ct3:统计class_id = 'C2',math <= 81的个数4,
ct4:10010所在行,向前(上)+1行,向后(下)+2行,统计math个数4
sum开窗函数
示例代码
select
stu_id
,math
,class_id
,sum(math) over() as ct1
,sum(math) over(partition by class_id) as ct2
,sum(math) over(partition by class_id order by math) as ct3
,sum(math) over(partition by class_id order by math rows between 1 preceding and 2 following) as ct4
from myhive.stu_scores
where dept_id = 'D1';
结果:
+
| stu_id | math | class_id | ct1 | ct2 | ct3 | ct4 |
+
| 10001 | 70 | C1 | 856 | 452 | 70 | 213 |
| 10002 | 71 | C1 | 856 | 452 | 141 | 291 |
| 10003 | 72 | C1 | 856 | 452 | 213 | 300 |
| 10004 | 78 | C1 | 856 | 452 | 291 | 311 |
| 10005 | 79 | C1 | 856 | 452 | 370 | 239 |
| 10011 | 82 | C1 | 856 | 452 | 452 | 161 |
| 10009 | 80 | C2 | 856 | 404 | 160 | 241 |
| 10006 | 80 | C2 | 856 | 404 | 160 | 322 |
| 10010 | 81 | C2 | 856 | 404 | 322 | 324 |
| 10007 | 81 | C2 | 856 | 404 | 322 | 244 |
| 10008 | 82 | C2 | 856 | 404 | 404 | 163 |
+
min开窗函数
示例代码:
0: jdbc:hive2:
. . . . . . . . . . . . . . > stu_id
. . . . . . . . . . . . . . > ,math
. . . . . . . . . . . . . . > ,class_id
. . . . . . . . . . . . . . >
. . . . . . . . . . . . . . > ,MIN(math) OVER() AS ct1
. . . . . . . . . . . . . . >
. . . . . . . . . . . . . . > ,MIN(math) OVER(PARTITION BY class_id) AS ct2
. . . . . . . . . . . . . . >
. . . . . . . . . . . . . . > ,MIN(math) OVER(PARTITION BY class_id ORDER BY math) AS ct3
. . . . . . . . . . . . . . >
. . . . . . . . . . . . . . > ,MIN(math) OVER(PARTITION BY class_id ORDER BY math ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING) AS ct4
. . . . . . . . . . . . . . > FROM myhive.stu_scores
. . . . . . . . . . . . . . > WHERE dept_id = 'D1';
结果
+
| stu_id | math | class_id | ct1 | ct2 | ct3 | ct4 |
+
| 10001 | 70 | C1 | 70 | 70 | 70 | 70 |
| 10002 | 71 | C1 | 70 | 70 | 70 | 70 |
| 10003 | 72 | C1 | 70 | 70 | 70 | 71 |
| 10004 | 78 | C1 | 70 | 70 | 70 | 72 |
| 10005 | 79 | C1 | 70 | 70 | 70 | 78 |
| 10011 | 82 | C1 | 70 | 70 | 70 | 79 |
| 10006 | 80 | C2 | 70 | 80 | 80 | 80 |
| 10009 | 80 | C2 | 70 | 80 | 80 | 80 |
| 10007 | 81 | C2 | 70 | 80 | 80 | 80 |
| 10010 | 81 | C2 | 70 | 80 | 80 | 81 |
| 10008 | 82 | C2 | 70 | 80 | 80 | 81 |
+
max开窗函数
示例代码:
select
stu_id
,math
,class_id
,max(math) over() as ct1
,max(math) over(partition by class_id) as ct2
,max(math) over(partition by class_id order by math) as ct3
,max(math) over(partition by class_id order by math rows between 1 preceding and 2 following) as ct4
from myhive.stu_scores
where dept_id = 'D1';
结果:
+
| stu_id | math | class_id | ct1 | ct2 | ct3 | ct4 |
+
| 10001 | 70 | C1 | 82 | 82 | 70 | 72 |
| 10002 | 71 | C1 | 82 | 82 | 71 | 78 |
| 10003 | 72 | C1 | 82 | 82 | 72 | 79 |
| 10004 | 78 | C1 | 82 | 82 | 78 | 82 |
| 10005 | 79 | C1 | 82 | 82 | 79 | 82 |
| 10011 | 82 | C1 | 82 | 82 | 82 | 82 |
| 10006 | 80 | C2 | 82 | 82 | 80 | 81 |
| 10009 | 80 | C2 | 82 | 82 | 80 | 81 |
| 10007 | 81 | C2 | 82 | 82 | 81 | 82 |
| 10010 | 81 | C2 | 82 | 82 | 81 | 82 |
| 10008 | 82 | C2 | 82 | 82 | 82 | 82 |
+
avg开窗函数
示例代码:
select
stu_id
,math
,class_id
,avg(math) over() as ct1
,avg(math) over(partition by class_id) as ct2
,avg(math) over(partition by class_id order by math) as ct3
,avg(math) over(partition by class_id order by math rows between 1 preceding and 2 following) as ct4
from myhive.stu_scores
where dept_id = 'D1';
结果:
+
| stu_id | math | class_id | ct1 | ct2 | ct3 | ct4 |
+
| 10001 | 70 | C1 | 77.8182 | 75.3333 | 70.0000 | 71.0000 |
| 10002 | 71 | C1 | 77.8182 | 75.3333 | 70.5000 | 72.7500 |
| 10003 | 72 | C1 | 77.8182 | 75.3333 | 71.0000 | 75.0000 |
| 10004 | 78 | C1 | 77.8182 | 75.3333 | 72.7500 | 77.7500 |
| 10005 | 79 | C1 | 77.8182 | 75.3333 | 74.0000 | 79.6667 |
| 10011 | 82 | C1 | 77.8182 | 75.3333 | 75.3333 | 80.5000 |
| 10006 | 80 | C2 | 77.8182 | 80.8000 | 80.0000 | 80.3333 |
| 10009 | 80 | C2 | 77.8182 | 80.8000 | 80.0000 | 80.5000 |
| 10007 | 81 | C2 | 77.8182 | 80.8000 | 80.5000 | 81.0000 |
| 10010 | 81 | C2 | 77.8182 | 80.8000 | 80.5000 | 81.3333 |
| 10008 | 82 | C2 | 77.8182 | 80.8000 | 80.8000 | 81.5000 |
+
位置开窗函数
位置开窗函数,也是聚合函数的一种l类别。
first_value开窗函数
函数功能:
返回分区中的第一个值。
示例代码:
select
stu_id
,math
,class_id
,first_value(math) over() as ct1
,first_value(math) over(partition by class_id) as ct2
,first_value(math) over(partition by class_id order by math) as ct3
,first_value(math) over(partition by class_id order by math rows between 1 preceding and 2 following) as ct4
from myhive.stu_scores
where dept_id = 'D1';
结果:
+
|stu_id |math |class_id |ct1 |ct2 |ct3 |ct4 |
+
| 10001 | 70 |C1 | 70 | 70 | 70 | 70 |
| 10002 | 71 |C1 | 70 | 70 | 70 | 70 |
| 10003 | 72 |C1 | 70 | 70 | 70 | 71 |
| 10004 | 78 |C1 | 70 | 70 | 70 | 72 |
| 10005 | 79 |C1 | 70 | 70 | 70 | 78 |
| 10011 | 82 |C1 | 70 | 70 | 70 | 79 |
| 10006 | 80 |C2 | 70 | 80 | 80 | 80 |
| 10009 | 80 |C2 | 70 | 80 | 80 | 80 |
| 10007 | 81 |C2 | 70 | 80 | 80 | 80 |
| 10010 | 81 |C2 | 70 | 80 | 80 | 81 |
| 10008 | 82 |C2 | 70 | 80 | 80 | 81 |
+
last_value开窗函数
函数功能:
取分组内排序后,截止到当前行,最后一个值。
示例代码:
select
stu_id
,math
,class_id
,last_value(math) over() as ct1
,last_value(math) over(partition by class_id) as ct2
,last_value(math) over(partition by class_id order by math) as ct3
,last_value(math) over(partition by class_id order by math rows between 1 preceding and 2 following) as ct4
from myhive.stu_scores
where dept_id = 'D1';
结果:
+
|stu_id |math |class_id |ct1 |ct2 |ct3 |ct4 |
+
| 10001 | 70 |C1 | 82 | 82 | 70 | 72 |
| 10002 | 71 |C1 | 82 | 82 | 71 | 78 |
| 10003 | 72 |C1 | 82 | 82 | 72 | 79 |
| 10004 | 78 |C1 | 82 | 82 | 78 | 82 |
| 10005 | 79 |C1 | 82 | 82 | 79 | 82 |
| 10011 | 82 |C1 | 82 | 82 | 82 | 82 |
| 10006 | 80 |C2 | 82 | 81 | 80 | 81 |
| 10009 | 80 |C2 | 82 | 81 | 80 | 81 |
| 10007 | 81 |C2 | 82 | 81 | 81 | 82 |
| 10010 | 81 |C2 | 82 | 81 | 81 | 82 |
| 10008 | 82 |C2 | 82 | 81 | 82 | 82 |
+
分析:
看上面结果可以发现,ct1值和预期一样。
ct2的返回值,在C1分组是正确的,但在C2分组中返回值为81而不是82。
ct3的返回值,则等于当前行的值。
ct4的值为当前行向下偏移2行的值。
可以说,输出结果其实不符合我们预期的结果,即返回分组中最后一行的值,为什么会这样呢?
1、首先,分析函数不是聚合函数。聚合函数将多个输入行折叠为一个输出行,而分析函数为每个输入行仅计算一个输出行。因此,您需要确保正在考虑每个输入行的输出是什么。
2、其次,分析函数在行的“窗口”上操作,该窗口是该行所属的“分区”的子集。输入行的分区由PARTITION BY子句确定,或者如果您希望分区是整个输入行集,则可以将其忽略。该窗口由ROWS子句提供,但是如果您未指定窗口(用户通常不指定),则默认为整个分区(不应用任何顺序时)或该分区中行的集合。第一行到当前行(当存在ORDER BY时)。请注意,分区中每个输入行的窗口可能会有所不同!
3、回到LAST_VALUE。尽管上述默认窗口在许多情况下都是合理的(例如,计算累加和),但使用LAST_VALUE时,它的效果极差。 LAST_VALUE函数返回窗口中最后一行的值,默认情况下,窗口中的最后一行是当前行。
因此,要解决该问题,需要明确指定LAST_VALUE的窗口是整个分区,而不仅仅是整个当前行。
SELECT x, LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM table
select
stu_id
,math
,class_id
,last_value(math) over() as ct1
,last_value(math) over(partition by class_id order by math rows between unbounded preceding and unbounded following) as ct2
,last_value(math) over(partition by class_id order by math rows between 1 preceding and 2 following) as ct4
from myhive.stu_scores
where dept_id = 'D1';
+
|stu_id |math |class_id |ct1 |ct2 |ct4 |
+
| 10001 | 70 |C1 | 82 | 82 | 72 |
| 10002 | 71 |C1 | 82 | 82 | 78 |
| 10003 | 72 |C1 | 82 | 82 | 79 |
| 10004 | 78 |C1 | 82 | 82 | 82 |
| 10005 | 79 |C1 | 82 | 82 | 82 |
| 10011 | 82 |C1 | 82 | 82 | 82 |
| 10006 | 80 |C2 | 82 | 82 | 81 |
| 10009 | 80 |C2 | 82 | 82 | 81 |
| 10007 | 81 |C2 | 82 | 82 | 82 |
| 10010 | 81 |C2 | 82 | 82 | 82 |
| 10008 | 82 |C2 | 82 | 82 | 82 |
+
需要特别注意order by和rows的运用 ,不同字段,返回结果是不同的
select
x,
first_value(x) over (order by x asc) first_asc,
first_value(x) over (order by x desc) first_desc,
last_value(x) over (order by x asc) last_asc,
last_value(x) over (order by x desc) last_desc
from
(select 4 as x
union all
select 2 as x
union all
select 1 as x
union all
select 3 as x)t
+-+
|x|first_asc|first_desc|last_asc|last_desc|
+-+
|4| 1| 4| 4| 4|
|3| 1| 4| 3| 3|
|2| 1| 4| 2| 2|
|1| 1| 4| 1| 1|
+-+
select
x,
first_value(x) over (order by x asc) first_asc,
first_value(x) over (order by x desc) first_desc,
last_value(x) over (order by x asc rows between unbounded preceding and unbounded following) last_asc,
last_value(x) over (order by x desc rows between unbounded preceding and unbounded following) last_desc
from
(select 4 as x
union all
select 2 as x
union all
select 1 as x
union all
select 3 as x)t
+-+
|x|first_asc|first_desc|last_asc|last_desc|
+-+
|4| 1| 4| 4| 1|
|3| 1| 4| 4| 1|
|2| 1| 4| 4| 1|
|1| 1| 4| 4| 1|
+-+
注意上面两个结果中last_asc和last_desc的值
lag开窗函数
函数功能:
lag(col,n,default) 用于统计窗口内往上第n个值,即取每个分区内某列的前面的第n个值。
col:列名
n:往上第n行
default:往上第n行为NULL时候,取默认值,不指定则取NULL
示例代码:
select
stu_id
,math
,class_id
,lag(math,2,0) over(partition by class_id order by math rows between unbounded preceding and unbounded following) as lag_value
from myhive.stu_scores
where dept_id = 'D1'
结果:
+
|stu_id |math |class_id |lag_value |
| 10001 | 70 |C1 | 0 |
| 10002 | 71 |C1 | 0 |
| 10003 | 72 |C1 | 70 |
| 10004 | 78 |C1 | 71 |
| 10005 | 79 |C1 | 72 |
| 10011 | 82 |C1 | 78 |
| 10006 | 80 |C2 | 0 |
| 10009 | 80 |C2 | 0 |
| 10007 | 81 |C2 | 80 |
| 10010 | 81 |C2 | 80 |
| 10008 | 82 |C2 | 81 |
lead开窗函数
函数功能:
lead(col,n,default) 用于统计窗口内往下第n个值,即每个分区内某列的后n个值。
col:列名
n:往下第n行
default:往下第n行为NULL时候,取默认值,不指定则取NULL
示例代码:
select
stu_id
,math
,class_id
,lead(math,2,0) over(partition by class_id order by math rows between unbounded preceding and unbounded following) as lag_value
from myhive.stu_scores
where dept_id = 'D1'
结果:
+
|stu_id |math |class_id |lag_value |
+
| 10001 | 70 |C1 | 72 |
| 10002 | 71 |C1 | 78 |
| 10003 | 72 |C1 | 79 |
| 10004 | 78 |C1 | 82 |
| 10005 | 79 |C1 | 0 |
| 10011 | 82 |C1 | 0 |
| 10006 | 80 |C2 | 81 |
| 10009 | 80 |C2 | 81 |
| 10007 | 81 |C2 | 82 |
| 10010 | 81 |C2 | 0 |
| 10008 | 82 |C2 | 0 |
+
cume_dist开窗函数
函数功能:
cume_dist 返回小于等于当前值的行数/分组内总行数。
计算某个窗口或分区中某个值的累积分布。假定升序排序,则使用以下公式确定累积分布:
小于等于当前值x的行数 / 窗口或partition分区内的总行数。其中,x 等于 order by 子句中指定的列的当前行中的值。
示例代码:
select
stu_id
,math
,class_id
,round(cume_dist() over(order by math ),2) as cd1
,round(cume_dist() over(partition by class_id order by math),2) as cd2
from myhive.stu_scores
where dept_id = 'D1';
结果:
+
|stu_id |math |class_id |cd1 |cd2 |
+
| 10001 | 70 |C1 |0.09 |0.17 |
| 10002 | 71 |C1 |0.18 |0.33 |
| 10003 | 72 |C1 |0.27 | 0.5 |
| 10004 | 78 |C1 |0.36 |0.67 |
| 10005 | 79 |C1 |0.45 |0.83 |
| 10011 | 82 |C1 | 1.0 | 1.0 |
| 10006 | 80 |C2 |0.64 | 0.4 |
| 10009 | 80 |C2 |0.64 | 0.4 |
| 10007 | 81 |C2 |0.82 | 0.8 |
| 10010 | 81 |C2 |0.82 | 0.8 |
| 10008 | 82 |C2 | 1.0 | 1.0 |
+
分析:
1、cd1中由于没有指定partition,所以是针对整个结果集进行统计,比如id为10001的math=70,则小于等于70的只有1行数据,整个结果集为11行,即1/11 = 0.09,其余返回结果同理可得。
2、cd2中指定了partition,所以计算返回结果区分class_id = 'C1' 和class_id = 'C2'两组进行分别计算。
排序开窗函数
row_number开窗函数
函数功能:
从1开始对分区内的数据排序。
示例代码:
select
stu_id
,math
,class_id
,row_number() over(partition by class_id order by math) as rn
from myhive.stu_scores
where dept_id = 'D1';
结果:
+
|stu_id |math |class_id |rn |
+
| 10001 | 70 |C1 | 1 |
| 10002 | 71 |C1 | 2 |
| 10003 | 72 |C1 | 3 |
| 10004 | 78 |C1 | 4 |
| 10005 | 79 |C1 | 5 |
| 10011 | 82 |C1 | 6 |
| 10006 | 80 |C2 | 1 |
| 10009 | 80 |C2 | 2 |
| 10007 | 81 |C2 | 3 |
| 10010 | 81 |C2 | 4 |
| 10008 | 82 |C2 | 5 |
+
rank开窗函数
语法:
rank 开窗函数基于 over 子句中的 order by 确定一组值中一个值的排名。如果存在partition by ,则为每个分区组中的每个值排名。排名可能不是连续的。例如,如果两个行的排名为 1,则下一个排名为 3。
示例代码:
select
stu_id
,math
,class_id
,row_number() over(partition by class_id order by math) as rn
,rank() over(partition by class_id order by math) as rk
from myhive.stu_scores
where dept_id = 'D1';
结果:
+
|stu_id |math |class_id |rn |rk |
+
| 10001 | 70 |C1 | 1 | 1 |
| 10002 | 71 |C1 | 2 | 2 |
| 10003 | 72 |C1 | 3 | 3 |
| 10004 | 78 |C1 | 4 | 4 |
| 10005 | 79 |C1 | 5 | 5 |
| 10011 | 82 |C1 | 6 | 6 |
| 10006 | 80 |C2 | 1 | 1 |
| 10009 | 80 |C2 | 2 | 1 |
| 10007 | 81 |C2 | 3 | 3 |
| 10010 | 81 |C2 | 4 | 3 |
| 10008 | 82 |C2 | 5 | 5 |
+
dense_rank开窗函数
函数功能:
dense_rank与rank有一点不同,当排名一样的时候,接下来的行是连续的。如两个行的排名为 1,则下一个排名为 2。
示例代码:
select
stu_id
,math
,class_id
,row_number() over(partition by class_id order by math) as rn
,rank() over(partition by class_id order by math) as rk
,dense_rank() over(partition by class_id order by math) as drk
from myhive.stu_scores
where dept_id = 'D1';
结果:
+
|stu_id |math |class_id |rn |rk |drk |
+
| 10001 | 70 |C1 | 1 | 1 | 1 |
| 10002 | 71 |C1 | 2 | 2 | 2 |
| 10003 | 72 |C1 | 3 | 3 | 3 |
| 10004 | 78 |C1 | 4 | 4 | 4 |
| 10005 | 79 |C1 | 5 | 5 | 5 |
| 10011 | 82 |C1 | 6 | 6 | 6 |
| 10006 | 80 |C2 | 1 | 1 | 1 |
| 10009 | 80 |C2 | 2 | 1 | 1 |
| 10007 | 81 |C2 | 3 | 3 | 2 |
| 10010 | 81 |C2 | 4 | 3 | 2 |
| 10008 | 82 |C2 | 5 | 5 | 3 |
+
percent_rank开窗函数
函数功能:
计算给定行的百分比排名。可以用来计算超过了百分之多少的人。
即:(当前行的rank值-1)/(分组内的总行数-1)
示例代码:
select
stu_id
,math
,class_id
,row_number() over(partition by class_id order by math) as rn
,rank() over(partition by class_id order by math) as rk
,dense_rank() over(partition by class_id order by math) as drk
,percent_rank() over(partition by class_id order by math) as prk
from myhive.stu_scores
where dept_id = 'D1';
结果:
+
|stu_id |math |class_id |rn |rk |drk |prk |
+
| 10001 | 70 |C1 | 1 | 1 | 1 |0.0 |
| 10002 | 71 |C1 | 2 | 2 | 2 |0.2 |
| 10003 | 72 |C1 | 3 | 3 | 3 |0.4 |
| 10004 | 78 |C1 | 4 | 4 | 4 |0.6 |
| 10005 | 79 |C1 | 5 | 5 | 5 |0.8 |
| 10011 | 82 |C1 | 6 | 6 | 6 |1.0 |
| 10006 | 80 |C2 | 1 | 1 | 1 |0.0 |
| 10009 | 80 |C2 | 2 | 1 | 1 |0.0 |
| 10007 | 81 |C2 | 3 | 3 | 2 |0.5 |
| 10010 | 81 |C2 | 4 | 3 | 2 |0.5 |
| 10008 | 82 |C2 | 5 | 5 | 3 |1.0 |
+
ntile开窗函数
函数功能:
NTILE(n),将每个分区内排序后的结果均分成N份。本质是将每个分区拆分成更小的分区。
如果切片不均匀,默认增加第一个切片的分布。
NTILE不支持ROWS BETWEEN。
示例代码:
select
stu_id
,math
,class_id
,ntile(2) over(partition by class_id order by math) as nt1
,ntile(3) over(partition by class_id order by math) as nt2
from myhive.stu_scores
where dept_id = 'D1';
结果:
+
|stu_id |math |class_id |nt1 |nt2 |
+
| 10001 | 70 |C1 | 1 | 1 |
| 10002 | 71 |C1 | 1 | 1 |
| 10003 | 72 |C1 | 1 | 2 |
| 10004 | 78 |C1 | 2 | 2 |
| 10005 | 79 |C1 | 2 | 3 |
| 10011 | 82 |C1 | 2 | 3 |
| 10006 | 80 |C2 | 1 | 1 |
| 10009 | 80 |C2 | 1 | 1 |
| 10007 | 81 |C2 | 1 | 2 |
| 10010 | 81 |C2 | 2 | 2 |
| 10008 | 82 |C2 | 2 | 3 |
+
|