IT数码 购物 网址 头条 软件 日历 阅读 图书馆
TxT小说阅读器
↓语音阅读,小说下载,古典文学↓
图片批量下载器
↓批量下载图片,美女图库↓
图片自动播放器
↓图片自动播放器↓
一键清除垃圾
↓轻轻一点,清除系统垃圾↓
开发: C++知识库 Java知识库 JavaScript Python PHP知识库 人工智能 区块链 大数据 移动开发 嵌入式 开发工具 数据结构与算法 开发测试 游戏开发 网络协议 系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑 笔记本 显卡 显示器 固态硬盘 硬盘 耳机 手机 iphone vivo oppo 小米 华为 单反 装机 图拉丁
 
   -> 大数据 -> over 与lateral view 的hive、spark sql执行计划 -> 正文阅读

[大数据]over 与lateral view 的hive、spark sql执行计划

建表语句

create table test_over 
(
    user_id string,
    login_date string
)  COMMENT '测试函数使用,可以删除'
row format delimited
fields terminated by '\t';

over 执行计划

spark

spark-sql> explain select
         >   user_id
         >   ,login_date
         >   ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
         > from test_over;
22/03/10 10:55:50 INFO [main] CodeGenerator: Code generated in 9.641436 ms
== Physical Plan ==
Window [lag(login_date#34, 1, 0001-01-01) windowspecdefinition(user_id#33, login_date#34 ASC NULLS FIRST, specifiedwindowframe(RowFrame, -1, -1)) AS prev_date#30], [user_id#33], [login_date#34 ASC NULLS FIRST]
+- *(1) Sort [user_id#33 ASC NULLS FIRST, login_date#34 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#33, 200)
      +- Scan hive default.test_over [user_id#33, login_date#34], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#33, login_date#34]
Time taken: 0.098 seconds, Fetched 1 row(s)
22/03/10 10:55:50 INFO [main] SparkSQLCLIDriver: Time taken: 0.098 seconds, Fetched 1 row(s)
spark-sql> 
         > explain 
         > select
         >   user_id
         >   ,login_date
         >   ,first_value(login_date) over(partition by user_id ) prev_date
         > from test_over;
== Physical Plan ==
Window [first(login_date#39, false) windowspecdefinition(user_id#38, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#35], [user_id#38]
+- *(1) Sort [user_id#38 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#38, 200)
      +- Scan hive default.test_over [user_id#38, login_date#39], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#38, login_date#39]
Time taken: 0.077 seconds, Fetched 1 row(s)
22/03/10 10:57:34 INFO [main] SparkSQLCLIDriver: Time taken: 0.077 seconds, Fetched 1 row(s)
spark-sql> 
         > 
         > explain select
         >   user_id
         >   ,login_date
         >   ,max(login_date) over(partition by user_id ) prev_date
         > from test_over;
== Physical Plan ==
Window [max(login_date#45) windowspecdefinition(user_id#44, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#41], [user_id#44]
+- *(1) Sort [user_id#44 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#44, 200)
      +- Scan hive default.test_over [user_id#44, login_date#45], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#44, login_date#45]
Time taken: 0.081 seconds, Fetched 1 row(s)
22/03/10 10:58:15 INFO [main] SparkSQLCLIDriver: Time taken: 0.081 seconds, Fetched 1 row(s)

hive

hive> explain select
    >   user_id
    >   ,login_date
    >   ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
    > from test_over;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_over
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Reduce Output Operator
              key expressions: user_id (type: string), login_date (type: string)
              sort order: ++
              Map-reduce partition columns: user_id (type: string)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Execution mode: vectorized
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: string)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col1: string
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col1 ASC NULLS FIRST
                  partition by: _col0
                  raw input shape:
                  window functions:
                      window function definition
                        alias: lag_window_0
                        arguments: _col1, 1, '0001-01-01'
                        name: lag
                        window function: GenericUDAFLagEvaluator
                        window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
                        isPivotResult: true
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), lag_window_0 (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
Time taken: 0.211 seconds, Fetched: 61 row(s)
hive> explain select
    >   user_id
    >   ,login_date
    >   ,max(login_date) over(partition by user_id ) prev_date
    > from test_over;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_over
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Reduce Output Operator
              key expressions: user_id (type: string)
              sort order: +
              Map-reduce partition columns: user_id (type: string)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              value expressions: login_date (type: string)
      Execution mode: vectorized
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col1: string
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col0 ASC NULLS FIRST
                  partition by: _col0
                  raw input shape:
                  window functions:
                      window function definition
                        alias: max_window_0
                        arguments: _col1
                        name: max
                        window function: GenericUDAFMaxEvaluator
                        window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), max_window_0 (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 3.278 seconds, Fetched: 61 row(s)

lateral view 执行计划

spark

spark-sql> 
         > explain select
         >   user_id
         >   ,login_date
         >   ,single_num
         > from test_over
         > lateral view explode(split(login_date,'-')) tmp as single_num;
== Physical Plan ==
Generate explode(split(login_date#58, -)), [user_id#57, login_date#58], false, [single_num#59]
+- Scan hive default.test_over [user_id#57, login_date#58], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#57, login_date#58]
Time taken: 0.103 seconds, Fetched 1 row(s)
22/03/10 14:39:38 INFO [main] SparkSQLCLIDriver: Time taken: 0.103 seconds, Fetched 1 row(s)

hive

hive> 
    > explain select
    >   user_id
    >   ,login_date
    >   ,single_num
    > from test_over
    > lateral view explode(split(login_date,'-')) tmp as single_num;
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: test_over
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          Lateral View Forward
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: user_id (type: string), login_date (type: string)
              outputColumnNames: user_id, login_date
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Lateral View Join Operator
                outputColumnNames: _col0, _col1, _col5
                Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string)
                  outputColumnNames: _col0, _col1, _col2
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  ListSink
            Select Operator
              expressions: split(login_date, '-') (type: array<string>)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              UDTF Operator
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                function name: explode
                Lateral View Join Operator
                  outputColumnNames: _col0, _col1, _col5
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    ListSink

Time taken: 0.081 seconds, Fetched: 41 row(s)
  大数据 最新文章
实现Kafka至少消费一次
亚马逊云科技:还在苦于ETL?Zero ETL的时代
初探MapReduce
【SpringBoot框架篇】32.基于注解+redis实现
Elasticsearch:如何减少 Elasticsearch 集
Go redis操作
Redis面试题
专题五 Redis高并发场景
基于GBase8s和Calcite的多数据源查询
Redis——底层数据结构原理
上一篇文章      下一篇文章      查看所有文章
加:2022-03-11 22:17:06  更:2022-03-11 22:18:24 
 
开发: C++知识库 Java知识库 JavaScript Python PHP知识库 人工智能 区块链 大数据 移动开发 嵌入式 开发工具 数据结构与算法 开发测试 游戏开发 网络协议 系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑 笔记本 显卡 显示器 固态硬盘 硬盘 耳机 手机 iphone vivo oppo 小米 华为 单反 装机 图拉丁

360图书馆 购物 三丰科技 阅读网 日历 万年历 2024年11日历 -2024/11/24 9:20:47-

图片自动播放器
↓图片自动播放器↓
TxT小说阅读器
↓语音阅读,小说下载,古典文学↓
一键清除垃圾
↓轻轻一点,清除系统垃圾↓
图片批量下载器
↓批量下载图片,美女图库↓
  网站联系: qq:121756557 email:121756557@qq.com  IT数码