建表语句
create table test_over
(
user_id string,
login_date string
) COMMENT '测试函数使用,可以删除'
row format delimited
fields terminated by '\t';
over 执行计划
spark
spark-sql> explain select
> user_id
> ,login_date
> ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
> from test_over;
22/03/10 10:55:50 INFO [main] CodeGenerator: Code generated in 9.641436 ms
== Physical Plan ==
Window [lag(login_date#34, 1, 0001-01-01) windowspecdefinition(user_id#33, login_date#34 ASC NULLS FIRST, specifiedwindowframe(RowFrame, -1, -1)) AS prev_date#30], [user_id#33], [login_date#34 ASC NULLS FIRST]
+- *(1) Sort [user_id#33 ASC NULLS FIRST, login_date#34 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(user_id#33, 200)
+- Scan hive default.test_over [user_id#33, login_date#34], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#33, login_date#34]
Time taken: 0.098 seconds, Fetched 1 row(s)
22/03/10 10:55:50 INFO [main] SparkSQLCLIDriver: Time taken: 0.098 seconds, Fetched 1 row(s)
spark-sql>
> explain
> select
> user_id
> ,login_date
> ,first_value(login_date) over(partition by user_id ) prev_date
> from test_over;
== Physical Plan ==
Window [first(login_date#39, false) windowspecdefinition(user_id#38, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#35], [user_id#38]
+- *(1) Sort [user_id#38 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(user_id#38, 200)
+- Scan hive default.test_over [user_id#38, login_date#39], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#38, login_date#39]
Time taken: 0.077 seconds, Fetched 1 row(s)
22/03/10 10:57:34 INFO [main] SparkSQLCLIDriver: Time taken: 0.077 seconds, Fetched 1 row(s)
spark-sql>
>
> explain select
> user_id
> ,login_date
> ,max(login_date) over(partition by user_id ) prev_date
> from test_over;
== Physical Plan ==
Window [max(login_date#45) windowspecdefinition(user_id#44, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#41], [user_id#44]
+- *(1) Sort [user_id#44 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(user_id#44, 200)
+- Scan hive default.test_over [user_id#44, login_date#45], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#44, login_date#45]
Time taken: 0.081 seconds, Fetched 1 row(s)
22/03/10 10:58:15 INFO [main] SparkSQLCLIDriver: Time taken: 0.081 seconds, Fetched 1 row(s)
hive
hive> explain select
> user_id
> ,login_date
> ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
> from test_over;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: test_over
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: user_id (type: string), login_date (type: string)
sort order: ++
Map-reduce partition columns: user_id (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Execution mode: vectorized
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
PTF Operator
Function definitions:
Input definition
input alias: ptf_0
output shape: _col0: string, _col1: string
type: WINDOWING
Windowing table definition
input alias: ptf_1
name: windowingtablefunction
order by: _col1 ASC NULLS FIRST
partition by: _col0
raw input shape:
window functions:
window function definition
alias: lag_window_0
arguments: _col1, 1, '0001-01-01'
name: lag
window function: GenericUDAFLagEvaluator
window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
isPivotResult: true
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: string), lag_window_0 (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.211 seconds, Fetched: 61 row(s)
hive> explain select
> user_id
> ,login_date
> ,max(login_date) over(partition by user_id ) prev_date
> from test_over;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: test_over
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: user_id (type: string)
sort order: +
Map-reduce partition columns: user_id (type: string)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
value expressions: login_date (type: string)
Execution mode: vectorized
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
PTF Operator
Function definitions:
Input definition
input alias: ptf_0
output shape: _col0: string, _col1: string
type: WINDOWING
Windowing table definition
input alias: ptf_1
name: windowingtablefunction
order by: _col0 ASC NULLS FIRST
partition by: _col0
raw input shape:
window functions:
window function definition
alias: max_window_0
arguments: _col1
name: max
window function: GenericUDAFMaxEvaluator
window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: string), max_window_0 (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 3.278 seconds, Fetched: 61 row(s)
lateral view 执行计划
spark
spark-sql>
> explain select
> user_id
> ,login_date
> ,single_num
> from test_over
> lateral view explode(split(login_date,'-')) tmp as single_num;
== Physical Plan ==
Generate explode(split(login_date#58, -)), [user_id#57, login_date#58], false, [single_num#59]
+- Scan hive default.test_over [user_id#57, login_date#58], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#57, login_date#58]
Time taken: 0.103 seconds, Fetched 1 row(s)
22/03/10 14:39:38 INFO [main] SparkSQLCLIDriver: Time taken: 0.103 seconds, Fetched 1 row(s)
hive
hive>
> explain select
> user_id
> ,login_date
> ,single_num
> from test_over
> lateral view explode(split(login_date,'-')) tmp as single_num;
OK
STAGE DEPENDENCIES:
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: test_over
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Lateral View Forward
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: user_id (type: string), login_date (type: string)
outputColumnNames: user_id, login_date
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Lateral View Join Operator
outputColumnNames: _col0, _col1, _col5
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
ListSink
Select Operator
expressions: split(login_date, '-') (type: array<string>)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
UDTF Operator
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
function name: explode
Lateral View Join Operator
outputColumnNames: _col0, _col1, _col5
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
ListSink
Time taken: 0.081 seconds, Fetched: 41 row(s)
|