IT数码 购物 网址 头条 软件 日历 阅读 图书馆
TxT小说阅读器
↓语音阅读,小说下载,古典文学↓
图片批量下载器
↓批量下载图片,美女图库↓
图片自动播放器
↓图片自动播放器↓
一键清除垃圾
↓轻轻一点,清除系统垃圾↓
开发: C++知识库 Java知识库 JavaScript Python PHP知识库 人工智能 区块链 大数据 移动开发 嵌入式 开发工具 数据结构与算法 开发测试 游戏开发 网络协议 系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑 笔记本 显卡 显示器 固态硬盘 硬盘 耳机 手机 iphone vivo oppo 小米 华为 单反 装机 图拉丁
 
   -> 大数据 -> hive窗口函数及示例 -> 正文阅读

[大数据]hive窗口函数及示例

窗口函数

? 其实只有over()才是真正的窗口函数,只是over常于其他函数组合使用,采用强大的效果。

一、over()的语法

分析函数 over(partition by 列名 order by 列名 rows between 开始位置 and  结束位置)

1.1、窗口控制,既window子句

over(partition by col order by 排序字段 rows between 1 preceding and 1 fllowing)

请添加图片描述

二、常用窗口函数

  1. sum(col) over() : 分组对col累计求和
  2. count(col) over() : 分组对col累计
  3. min(col) over() : 分组对col求最小值
  4. max(col) over() : 分组求col的最大值
  5. avg(col) over() : 分组求col列的平均值
  6. first_value(col) over() : 某分组排序后的第一个col值
  7. last_value(col) over() : 某分组排序后的最后一个col值
  8. lag(col,n,DEFAULT) : 统计往前n行的col值,n可选,默认为1,DEFAULT当往上第n行为NULL时候,取默认值,如不指定,则为NULL
  9. lead(col,n,DEFAULT) : 统计往后n行的col值,n可选,默认为1,DEFAULT当往下第n行为NULL时候,取默认值,如不指定,则为NULL
  10. ntile(n) : 用于将分组数据按照顺序切分成n片,返回当前切片值。注意:n必须为int类型
  11. row_number() over() : 排名函数,不会重复,适合于生成主键或者不并列排名
  12. rank() over() : 排名函数,有并列名次,名次不连续。如:1,1,3
  13. dense_rank() over() : 排名函数,有并列名次,名次连续。如:1,1,2

测试数据

create table shop_data(
shop_id int,
stat_date string,
ordamt double
)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile;

-- 插入数据
insert into shop_data values 
(10026,'201901230030',5170),
(10026,'201901230100',5669),
(10026,'201901230130',2396),
(10026,'201901230200',1498),
(10026,'201901230230',1997),
(10026,'201901230300',1188),
(10026,'201901230330',598),
(10026,'201901230400',479),
(10026,'201901230430',1587),
(10026,'201901230530',799),
(10027,'201901230030',2170),
(10027,'201901230100',1623),
(10027,'201901230130',3397),
(10027,'201901230200',1434),
(10027,'201901230230',1001),
(10028,'201901230300',1687),
(10028,'201901230330',1298),
(10028,'201901230400',149),
(10029,'201901230430',2587),
(10029,'201901230530',589);

三、案例

count

select shop_id,stat_date,ordamt,
-- 以符合条件的所有行作为窗口
count(shop_id) over() as count1,
 -- 以按shop_id分组的所有行作为窗口
count(shop_id) over(partition by shop_id) as count2,
 -- 以按shop_id分组、按stat_date排序的所有行作为窗口
count(shop_id) over(partition by shop_id order by stat_date) as count3,
 -- 以按shop_id分组、按stat_date排序、按当前行+往前1行+往后2行的行作为窗口
count(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as count4,
 -- 以按shop_id分组、按stat_date排序、按从起点到末尾,默认从起点到末尾和count2结果相同
count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as count5,
 -- 以按shop_id分组、按stat_date排序、按从起点到当前行的前一行
count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as count6,
 -- 以按shop_id分组、按stat_date排序、按从起点到当前行
count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as count7,
 -- 以按shop_id分组、按stat_date排序、按从当前行到末尾
count(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as count8,
 -- 以按shop_id分组、按stat_date排序、按从当前行往后一行到末尾
count(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as count9,
 -- 以按shop_id分组、按stat_date排序、按从当前行往后一行到当前行往后2行
count(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as count10
from shop_data;
--查询结果如下
+----------+---------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+
| shop_id  |   stat_date   | ordamt  | count1  | count2  | count3  | count4  | count5  | count6  | count7  | count8  | count9  | count10  |
+----------+---------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+
| 10026    | 201901230030  | 5170.0  | 20      | 10      | 1       | 3       | 10      | 0       | 1       | 10      | 9       | 2        |
| 10026    | 201901230100  | 5669.0  | 20      | 10      | 2       | 4       | 10      | 1       | 2       | 9       | 8       | 2        |
| 10026    | 201901230130  | 2396.0  | 20      | 10      | 3       | 4       | 10      | 2       | 3       | 8       | 7       | 2        |
| 10026    | 201901230200  | 1498.0  | 20      | 10      | 4       | 4       | 10      | 3       | 4       | 7       | 6       | 2        |
| 10026    | 201901230230  | 1997.0  | 20      | 10      | 5       | 4       | 10      | 4       | 5       | 6       | 5       | 2        |
| 10026    | 201901230300  | 1188.0  | 20      | 10      | 6       | 4       | 10      | 5       | 6       | 5       | 4       | 2        |
| 10026    | 201901230330  | 598.0   | 20      | 10      | 7       | 4       | 10      | 6       | 7       | 4       | 3       | 2        |
| 10026    | 201901230400  | 479.0   | 20      | 10      | 8       | 4       | 10      | 7       | 8       | 3       | 2       | 2        |
| 10026    | 201901230430  | 1587.0  | 20      | 10      | 9       | 3       | 10      | 8       | 9       | 2       | 1       | 1        |
| 10026    | 201901230530  | 799.0   | 20      | 10      | 10      | 2       | 10      | 9       | 10      | 1       | 0       | 0        |
| 10027    | 201901230030  | 2170.0  | 20      | 5       | 1       | 3       | 5       | 0       | 1       | 5       | 4       | 2        |
| 10027    | 201901230100  | 1623.0  | 20      | 5       | 2       | 4       | 5       | 1       | 2       | 4       | 3       | 2        |
| 10027    | 201901230130  | 3397.0  | 20      | 5       | 3       | 4       | 5       | 2       | 3       | 3       | 2       | 2        |
| 10027    | 201901230200  | 1434.0  | 20      | 5       | 4       | 3       | 5       | 3       | 4       | 2       | 1       | 1        |
| 10027    | 201901230230  | 1001.0  | 20      | 5       | 5       | 2       | 5       | 4       | 5       | 1       | 0       | 0        |
| 10028    | 201901230300  | 1687.0  | 20      | 3       | 1       | 3       | 3       | 0       | 1       | 3       | 2       | 2        |
| 10028    | 201901230330  | 1298.0  | 20      | 3       | 2       | 3       | 3       | 1       | 2       | 2       | 1       | 1        |
| 10028    | 201901230400  | 149.0   | 20      | 3       | 3       | 2       | 3       | 2       | 3       | 1       | 0       | 0        |
| 10029    | 201901230430  | 2587.0  | 20      | 2       | 1       | 2       | 2       | 0       | 1       | 2       | 1       | 1        |
| 10029    | 201901230530  | 589.0   | 20      | 2       | 2       | 2       | 2       | 1       | 2       | 1       | 0       | 0        |
+----------+---------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----------+

sum

select
shop_id, stat_date, ordamt,
  -- 以按shop_id分组、按stat_date排序、统计每个商品截止到当前时间的销售总额,默认从起点到当前行
sum(ordamt) over(partition by shop_id order by stat_date) as sum_amt1,
  -- 以按shop_id分组、按stat_date排序、统计每个商品前半小时到后一小时的销售额(按当前行+往前1行+往后2行的行作为窗口)
sum(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as sum_amt2,
 -- 以按shop_id分组、按stat_date排序、统计每个商品的销售总额(从起点到末尾)
sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as sum_amt3,
 -- 以按shop_id分组、按stat_date排序、统计截止到前半小时的销售总额(从起点到当前行的前一行)
sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as sum_amt4,
 -- 以按shop_id分组、按stat_date排序、统计每个商品截止到当前时间的销售总额,默认从起点到当前行(从起点到当前行)
sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as sum_amt5,
 -- 以按shop_id分组、按stat_date排序、统计当前时间及之后的销售总额(从当前行的末尾)
sum(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as sum_amt6,
 -- 以按shop_id分组、按stat_date排序、统计当前时间的后半小时及之后的销售额(当前行后一行到末尾)
sum(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as sum_amt7,
 -- 以按shop_id分组、按stat_date排序、统计当前时间后半小时到后一小时之间的销售额(按从当前行往后一行到当前行往后2行)
sum(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as sum_amt8
from shop_data;

查询结果

+----------+---------------+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| shop_id  |   stat_date   | ordamt  | sum_amt1  | sum_amt2  | sum_amt3  | sum_amt4  | sum_amt5  | sum_amt6  | sum_amt7  | sum_amt8  |
+----------+---------------+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| 10026    | 201901230030  | 5170.0  | 5170.0    | 13235.0   | 21381.0   | NULL      | 5170.0    | 21381.0   | 16211.0   | 8065.0    |
| 10026    | 201901230100  | 5669.0  | 10839.0   | 14733.0   | 21381.0   | 5170.0    | 10839.0   | 16211.0   | 10542.0   | 3894.0    |
| 10026    | 201901230130  | 2396.0  | 13235.0   | 11560.0   | 21381.0   | 10839.0   | 13235.0   | 10542.0   | 8146.0    | 3495.0    |
| 10026    | 201901230200  | 1498.0  | 14733.0   | 7079.0    | 21381.0   | 13235.0   | 14733.0   | 8146.0    | 6648.0    | 3185.0    |
| 10026    | 201901230230  | 1997.0  | 16730.0   | 5281.0    | 21381.0   | 14733.0   | 16730.0   | 6648.0    | 4651.0    | 1786.0    |
| 10026    | 201901230300  | 1188.0  | 17918.0   | 4262.0    | 21381.0   | 16730.0   | 17918.0   | 4651.0    | 3463.0    | 1077.0    |
| 10026    | 201901230330  | 598.0   | 18516.0   | 3852.0    | 21381.0   | 17918.0   | 18516.0   | 3463.0    | 2865.0    | 2066.0    |
| 10026    | 201901230400  | 479.0   | 18995.0   | 3463.0    | 21381.0   | 18516.0   | 18995.0   | 2865.0    | 2386.0    | 2386.0    |
| 10026    | 201901230430  | 1587.0  | 20582.0   | 2865.0    | 21381.0   | 18995.0   | 20582.0   | 2386.0    | 799.0     | 799.0     |
| 10026    | 201901230530  | 799.0   | 21381.0   | 2386.0    | 21381.0   | 20582.0   | 21381.0   | 799.0     | NULL      | NULL      |
| 10027    | 201901230030  | 2170.0  | 2170.0    | 7190.0    | 9625.0    | NULL      | 2170.0    | 9625.0    | 7455.0    | 5020.0    |
| 10027    | 201901230100  | 1623.0  | 3793.0    | 8624.0    | 9625.0    | 2170.0    | 3793.0    | 7455.0    | 5832.0    | 4831.0    |
| 10027    | 201901230130  | 3397.0  | 7190.0    | 7455.0    | 9625.0    | 3793.0    | 7190.0    | 5832.0    | 2435.0    | 2435.0    |
| 10027    | 201901230200  | 1434.0  | 8624.0    | 5832.0    | 9625.0    | 7190.0    | 8624.0    | 2435.0    | 1001.0    | 1001.0    |
| 10027    | 201901230230  | 1001.0  | 9625.0    | 2435.0    | 9625.0    | 8624.0    | 9625.0    | 1001.0    | NULL      | NULL      |
| 10028    | 201901230300  | 1687.0  | 1687.0    | 3134.0    | 3134.0    | NULL      | 1687.0    | 3134.0    | 1447.0    | 1447.0    |
| 10028    | 201901230330  | 1298.0  | 2985.0    | 3134.0    | 3134.0    | 1687.0    | 2985.0    | 1447.0    | 149.0     | 149.0     |
| 10028    | 201901230400  | 149.0   | 3134.0    | 1447.0    | 3134.0    | 2985.0    | 3134.0    | 149.0     | NULL      | NULL      |
| 10029    | 201901230430  | 2587.0  | 2587.0    | 3176.0    | 3176.0    | NULL      | 2587.0    | 3176.0    | 589.0     | 589.0     |
| 10029    | 201901230530  | 589.0   | 3176.0    | 3176.0    | 3176.0    | 2587.0    | 3176.0    | 589.0     | NULL      | NULL      |
+----------+---------------+---------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+

  大数据 最新文章
实现Kafka至少消费一次
亚马逊云科技:还在苦于ETL?Zero ETL的时代
初探MapReduce
【SpringBoot框架篇】32.基于注解+redis实现
Elasticsearch:如何减少 Elasticsearch 集
Go redis操作
Redis面试题
专题五 Redis高并发场景
基于GBase8s和Calcite的多数据源查询
Redis——底层数据结构原理
上一篇文章      下一篇文章      查看所有文章
加:2021-08-11 12:28:56  更:2021-08-11 12:29:59 
 
开发: C++知识库 Java知识库 JavaScript Python PHP知识库 人工智能 区块链 大数据 移动开发 嵌入式 开发工具 数据结构与算法 开发测试 游戏开发 网络协议 系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑 笔记本 显卡 显示器 固态硬盘 硬盘 耳机 手机 iphone vivo oppo 小米 华为 单反 装机 图拉丁

360图书馆 购物 三丰科技 阅读网 日历 万年历 2024年5日历 -2024/5/17 16:43:28-

图片自动播放器
↓图片自动播放器↓
TxT小说阅读器
↓语音阅读,小说下载,古典文学↓
一键清除垃圾
↓轻轻一点,清除系统垃圾↓
图片批量下载器
↓批量下载图片,美女图库↓
  网站联系: qq:121756557 email:121756557@qq.com  IT数码