建表语句

create table test_over 
(
    user_id string,
    login_date string
)  COMMENT '测试函数使用，可以删除'
row format delimited
fields terminated by '\t';

over 执行计划

spark

spark-sql> explain select
         >   user_id
         >   ,login_date
         >   ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
         > from test_over;
22/03/10 10:55:50 INFO [main] CodeGenerator: Code generated in 9.641436 ms
== Physical Plan ==
Window [lag(login_date#34, 1, 0001-01-01) windowspecdefinition(user_id#33, login_date#34 ASC NULLS FIRST, specifiedwindowframe(RowFrame, -1, -1)) AS prev_date#30], [user_id#33], [login_date#34 ASC NULLS FIRST]
+- *(1) Sort [user_id#33 ASC NULLS FIRST, login_date#34 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#33, 200)
      +- Scan hive default.test_over [user_id#33, login_date#34], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#33, login_date#34]
Time taken: 0.098 seconds, Fetched 1 row(s)
22/03/10 10:55:50 INFO [main] SparkSQLCLIDriver: Time taken: 0.098 seconds, Fetched 1 row(s)

spark-sql> 
         > explain 
         > select
         >   user_id
         >   ,login_date
         >   ,first_value(login_date) over(partition by user_id ) prev_date
         > from test_over;
== Physical Plan ==
Window [first(login_date#39, false) windowspecdefinition(user_id#38, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#35], [user_id#38]
+- *(1) Sort [user_id#38 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#38, 200)
      +- Scan hive default.test_over [user_id#38, login_date#39], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#38, login_date#39]
Time taken: 0.077 seconds, Fetched 1 row(s)
22/03/10 10:57:34 INFO [main] SparkSQLCLIDriver: Time taken: 0.077 seconds, Fetched 1 row(s)

spark-sql> 
         > 
         > explain select
         >   user_id
         >   ,login_date
         >   ,max(login_date) over(partition by user_id ) prev_date
         > from test_over;
== Physical Plan ==
Window [max(login_date#45) windowspecdefinition(user_id#44, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#41], [user_id#44]
+- *(1) Sort [user_id#44 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#44, 200)
      +- Scan hive default.test_over [user_id#44, login_date#45], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#44, login_date#45]
Time taken: 0.081 seconds, Fetched 1 row(s)
22/03/10 10:58:15 INFO [main] SparkSQLCLIDriver: Time taken: 0.081 seconds, Fetched 1 row(s)

hive

hive> explain select
    >   user_id
    >   ,login_date
    >   ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
    > from test_over;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_over
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Reduce Output Operator
              key expressions: user_id (type: string), login_date (type: string)
              sort order: ++
              Map-reduce partition columns: user_id (type: string)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Execution mode: vectorized
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: string)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col1: string
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col1 ASC NULLS FIRST
                  partition by: _col0
                  raw input shape:
                  window functions:
                      window function definition
                        alias: lag_window_0
                        arguments: _col1, 1, '0001-01-01'
                        name: lag
                        window function: GenericUDAFLagEvaluator
                        window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
                        isPivotResult: true
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), lag_window_0 (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.211 seconds, Fetched: 61 row(s)
hive> explain select
    >   user_id
    >   ,login_date
    >   ,max(login_date) over(partition by user_id ) prev_date
    > from test_over;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_over
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Reduce Output Operator
              key expressions: user_id (type: string)
              sort order: +
              Map-reduce partition columns: user_id (type: string)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              value expressions: login_date (type: string)
      Execution mode: vectorized
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col1: string
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col0 ASC NULLS FIRST
                  partition by: _col0
                  raw input shape:
                  window functions:
                      window function definition
                        alias: max_window_0
                        arguments: _col1
                        name: max
                        window function: GenericUDAFMaxEvaluator
                        window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), max_window_0 (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 3.278 seconds, Fetched: 61 row(s)

lateral view 执行计划

spark

spark-sql> 
         > explain select
         >   user_id
         >   ,login_date
         >   ,single_num
         > from test_over
         > lateral view explode(split(login_date,'-')) tmp as single_num;
== Physical Plan ==
Generate explode(split(login_date#58, -)), [user_id#57, login_date#58], false, [single_num#59]
+- Scan hive default.test_over [user_id#57, login_date#58], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#57, login_date#58]
Time taken: 0.103 seconds, Fetched 1 row(s)
22/03/10 14:39:38 INFO [main] SparkSQLCLIDriver: Time taken: 0.103 seconds, Fetched 1 row(s)

hive

hive> 
    > explain select
    >   user_id
    >   ,login_date
    >   ,single_num
    > from test_over
    > lateral view explode(split(login_date,'-')) tmp as single_num;
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: test_over
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          Lateral View Forward
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: user_id (type: string), login_date (type: string)
              outputColumnNames: user_id, login_date
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Lateral View Join Operator
                outputColumnNames: _col0, _col1, _col5
                Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string)
                  outputColumnNames: _col0, _col1, _col2
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  ListSink
            Select Operator
              expressions: split(login_date, '-') (type: array<string>)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              UDTF Operator
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                function name: explode
                Lateral View Join Operator
                  outputColumnNames: _col0, _col1, _col5
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    ListSink

Time taken: 0.081 seconds, Fetched: 41 row(s)

大数据最新文章

实现Kafka至少消费一次

亚马逊云科技：还在苦于ETL？Zero ETL的时代

初探MapReduce

【SpringBoot框架篇】32.基于注解+redis实现

Elasticsearch：如何减少 Elasticsearch 集

Go redis操作

Redis面试题

专题五 Redis高并发场景

基于GBase8s和Calcite的多数据源查询

Redis——底层数据结构原理

加:2022-03-11 22:17:06 更:2022-03-11 22:18:24

360图书馆购物三丰科技阅读网日历万年历 2026年2日历

-2026/2/13 20:50:23-

图片自动播放器
↓图片自动播放器↓

TxT小说阅读器
↓语音阅读,小说下载,古典文学↓

一键清除垃圾
↓轻轻一点,清除系统垃圾↓

图片批量下载器
↓批量下载图片,美女图库↓

网站联系: qq:121756557 email:121756557@qq.com IT数码