[oracle@MaxwellDBA admin]$ sqlplus sys/sys as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Tue Oct 18 07:13:55 2022
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle. All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
sys@cdb$root:orclcdb>
sys@cdb$root:orclcdb> select * from v$version where rownum=1;
BANNER BANNER_FULL
-------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------
BANNER_LEGACY CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production 0
1 row selected.
sys@cdb$root:orclcdb> show user
USER is "SYS"
sys@cdb$root:orclcdb> grant dba to scott;
grant dba to scott
*
ERROR at line 1:
ORA-01917: user or role 'SCOTT' does not exist
sys@cdb$root:orclcdb> ALTER SESSION SET CONTAINER=ORCLPDB1;
Session altered.
sys@cdb$root:orclcdb> grant dba to SCOTT;
Grant succeeded.
sys@cdb$root:orclcdb> conn SCOTT/TIGER@ORCLPDB1;
Connected.
scott@orclpdb1:orclcdb> create table test as select * from dba_objects;
Table created.
scott@orclpdb1:orclcdb>
概念1.基数(CARDINALITY)
某个列唯一键(Distinct Keys)的数量叫作基数。
主键列的基数等于表的总行数。基数的高低影响列的数据分布。
返回表中5%以内的数据走索引,返回超过5%的数据走全表扫描。
scott@orclpdb1:orclcdb> select count(*) from test where owner = 'SYS';
COUNT(*)
----------
52430
1 row selected.
scott@orclpdb1:orclcdb> select 52430/73516*100 "percent" from dual;
percent
----------
71.3178084
1 row selected.
scott@orclpdb1:orclcdb> select count(*) from test where owner = 'SCOTT';
COUNT(*)
----------
30
1 row selected.
scott@orclpdb1:orclcdb> select 30/73516*100 "percent" from dual;
percent
----------
.040807443
1 row selected.
scott@orclpdb1:orclcdb>
如何查看数据分布:
select column_name, count(*) from table_name group by column_name order by 2 desc;
概念2: 选择性(Selectivity)
基数/总函数 * 100% 就是某个列的选择性
为了查看选择性,必须先收集统计信息。
以下为收集统计信息的脚本:
BEGIN
DBMS_STATS.gather_table_stats(ownname => 'SCOTT',
tabname => 'TEST',
estimate_percent => 100,
method_opt => 'for all columns size 1',
no_invalidate => FALSE,
degree => 1,
cascade => TRUE);
END;
/
查看表中每个列的基数与选择性:
/*查看表中列的基数与选择性*/
select a.column_name,
b.num_rows,
a.num_distinct Cardinality,
round(a.num_distinct/ b.num_rows * 100, 2) selectivity,
a.histogram,
a.num_buckets
from dba_tab_col_statistics a,dba_tables b
where a.owner = b.owner
and a.table_name = b.table_name
and a.owner = 'SCOTT'
and a.table_name = 'TEST';

优化思想1: 什么样的列必须要创建索引呢? 当一个列出现在where 条件中,该列没有创建索引并且选择性大于20%,那么该列就必须创建索引,从而优化SQL查询性能。
SQL优化的核心观点1: 只有大表才会产生性能问题。
抓住必须创建索引的列(调整后,可用于prod environment)
抓出表中where条件中的列可通过以下两种方法:
select r.name owner,
o.name table_name,
c.name column_name,
equality_preds, ---等值过滤
equijoin_preds, ---等值JOIN 比如 where a.id = b.id
nonequijoin_preds, ---不等JOIN
range_preds, --- 范围过滤次数 > >= < <= between and
like_preds, --- LIKE 过滤
null_preds, --- NULL 过滤
timestamp
from sys.col_usage$ u,sys.obj$ o, sys.col$ c, sys.user$ r
where o.obj# = u.obj#
and c.obj# = u.obj#
and c.col# = u.intcol#
and r.name = 'SCOTT'
and o.name = 'TEST';
自动化脚本抓取必须要建索引的列:
select owner,
column_name,
num_rows,
Cardinality,
Selectivity,
'Need index' as notice
from (select b.owner,
a.column_name,
b.num_rows,
a.num_distinct Cardinality,
round(a.num_distinct / b.num_rows * 100, 2) selectivity
from dba_tab_col_statistics a, dba_tables b
where a.owner = b.owner
and a.table_name = b.table_name
and a.owner = 'SCOTT'
and a.table_name = 'TEST')
where selectivity >= 20
and column_name not in (select column_name
from dba_ind_columns
where table_owner = 'SCOTT'
and table_name = 'TEST')
and column_name in
(select c.name
from sys.col_usage$ u, sys.obj$ o, sys.col$ c, sys.user$ r
where o.obj# = u.obj#
and c.obj# = u.obj#
and c.col# = u.intcol#
and r.name = 'SCOTT'
and o.name = 'TEST')
概念3 直方图(Histogram)
如果没有对基数低的列收集直方图统计信息,基于成本的优化器(CBO)会认为该列数据分布是均衡的。
/*收集统计信息 for all columns size 1 表示对所有列都不收集直方图*/
BEGIN
DBMS_STATS.gather_table_stats(ownname => 'SCOTT',
tabname => 'TEST',
estimate_percent => 100,
method_opt => 'for all columns size 1',
no_invalidate => FALSE,
degree => 1,
cascade => TRUE);
END;
/
Histogram 为none表示没有收集直方图
-------------查看表中每个列的基数和选择性---------------------------------
select a.column_name,
b.num_rows,
a.num_distinct Cardinality,
round(a.num_distinct/ b.num_rows * 100, 2) selectivity,
a.histogram,
a.num_buckets
from dba_tab_col_statistics a,dba_tables b
where a.owner = b.owner
and a.table_name = b.table_name
and a.owner = 'SCOTT'
and a.table_name = 'TEST';

?
sys@cdb$root:orclcdb> conn SCOTT/TIGER@ORCLPDB1;
Connected.
scott@orclpdb1:orclcdb> set autot trace
scott@orclpdb1:orclcdb>
scott@orclpdb1:orclcdb> select * from test where owner='SCOTT';
30 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1935 | 249K| 398 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 1935 | 249K| 398 (1)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OWNER"='SCOTT')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1432 consistent gets
0 physical reads
0 redo size
4947 bytes sent via SQL*Net to client
411 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
30 rows processed
scott@orclpdb1:orclcdb> create index idx_owner on test(owner);
Index created.
scott@orclpdb1:orclcdb> select * from test where owner='SCOTT';
30 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3011311971
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1935 | 249K| 57 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TEST | 1935 | 249K| 57 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_OWNER | 1935 | | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SCOTT')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
10 consistent gets
1 physical reads
0 redo size
6244 bytes sent via SQL*Net to client
411 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
30 rows processed
scott@orclpdb1:orclcdb> select * from test where owner='SYS';
52430 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3011311971
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1935 | 249K| 57 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TEST | 1935 | 249K| 57 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_OWNER | 1935 | | 5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SYS')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
8218 consistent gets
111 physical reads
0 redo size
8096719 bytes sent via SQL*Net to client
38843 bytes received via SQL*Net from client
3497 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
52430 rows processed
scott@orclpdb1:orclcdb> set autot off
scott@orclpdb1:orclcdb> select round(73516/38) from dual;
ROUND(73516/38)
---------------
1935
1 row selected.
scott@orclpdb1:orclcdb>
执行计划里面的Rows是假的。在做SQL优化的时候,经常需要做的工作就是帮助CBO计算比较准确的Rows.
如果CBO每次都能计算得到精确的Rows,那么我们只需要关心业务逻辑、表设计、SQL写法以及如果建立索引了。不用担心SQL会走错执行计划了。
scott@orclpdb1:orclcdb>
scott@orclpdb1:orclcdb>
scott@orclpdb1:orclcdb> BEGIN
2 DBMS_STATS.gather_table_stats(ownname => 'SCOTT',
tabname => 'TEST',
4 estimate_percent => 100,
5 method_opt => 'for all columns size skewonly',
6 no_invalidate => FALSE,
degree => 1,
8 cascade => TRUE);
9 END;
10 /
PL/SQL procedure successfully completed.
scott@orclpdb1:orclcdb>
scott@orclpdb1:orclcdb> select a.column_name,
2 b.num_rows,
a.num_distinct Cardinality,
round(a.num_distinct / b.num_rows * 100, 2) selectivity,
5 a.histogram,
6 a.num_buckets
from dba_tab_col_statistics a, dba_tables b
8 where a.owner = b.owner
9 and a.table_name = b.table_name
10 and a.owner = 'SCOTT'
11 and a.table_name = 'TEST';
COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
-------------------------------------------------------------------------------------------------------------------------------- ---------- ----------- ----------- --------------- -----------
OWNER 73516 38 .05 FREQUENCY 38
OBJECT_NAME 73516 61420 83.55 HEIGHT BALANCED 254
SUBOBJECT_NAME 73516 417 .57 HEIGHT BALANCED 254
OBJECT_ID 73516 73514 100 NONE 1
DATA_OBJECT_ID 73516 6585 8.96 HEIGHT BALANCED 254
OBJECT_TYPE 73516 47 .06 FREQUENCY 47
CREATED 73516 990 1.35 HEIGHT BALANCED 254
LAST_DDL_TIME 73516 1260 1.71 HEIGHT BALANCED 254
TIMESTAMP 73516 1178 1.6 HEIGHT BALANCED 254
STATUS 73516 2 0 FREQUENCY 2
TEMPORARY 73516 2 0 FREQUENCY 2
COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
-------------------------------------------------------------------------------------------------------------------------------- ---------- ----------- ----------- --------------- -----------
GENERATED 73516 2 0 FREQUENCY 2
SECONDARY 73516 2 0 FREQUENCY 2
NAMESPACE 73516 23 .03 FREQUENCY 23
EDITION_NAME 73516 0 0 NONE 0
SHARING 73516 4 .01 FREQUENCY 4
EDITIONABLE 73516 2 0 FREQUENCY 2
ORACLE_MAINTAINED 73516 2 0 FREQUENCY 2
APPLICATION 73516 1 0 FREQUENCY 1
DEFAULT_COLLATION 73516 1 0 FREQUENCY 1
DUPLICATED 73516 1 0 FREQUENCY 1
SHARDED 73516 1 0 FREQUENCY 1
COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
-------------------------------------------------------------------------------------------------------------------------------- ---------- ----------- ----------- --------------- -----------
CREATED_APPID 73516 0 0 NONE 0
CREATED_VSNID 73516 0 0 NONE 0
MODIFIED_APPID 73516 0 0 NONE 0
MODIFIED_VSNID 73516 0 0 NONE 0
26 rows selected.
scott@orclpdb1:orclcdb> set autot trace
scott@orclpdb1:orclcdb> select * from test where owner='SCOTT';
30 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3011311971
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 30 | 3960 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TEST | 30 | 3960 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_OWNER | 30 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SCOTT')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
10 consistent gets
0 physical reads
0 redo size
6244 bytes sent via SQL*Net to client
411 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
30 rows processed
scott@orclpdb1:orclcdb> select * from test where owner='SYS';
52430 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 52430 | 6758K| 399 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 52430 | 6758K| 399 (1)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OWNER"='SYS')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4856 consistent gets
0 physical reads
0 redo size
3453492 bytes sent via SQL*Net to client
38843 bytes received via SQL*Net from client
3497 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
52430 rows processed
scott@orclpdb1:orclcdb>
直方图是用来帮助CBO在对基数很低、数据分布不均衡的列进行Rows估算的时候,可以得到更精确的Rows就够了。
什么样的列需要收集直方图呢?当列出现在where条件中,列的选择性小于1%并且该列没有收集过直方图,这样的列就应该收集直方图。
注意:千万不能对没有出现在where条件中的列收集直方图。对没有出现在where条件中的列收集直方图完全是做无用功,浪费数据库资源。
抓出必须创建直方图的列:
select a.owner,
a.table_name,
a.column_name,
b.num_rows,
a.num_distinct Cardinality,
trunc(a.num_distinct / b.num_rows * 100, 2) selectivity,
'Need Gather Histogram' notice
from dba_tab_col_statistics a, dba_tables b
where a.owner = b.owner
and a.table_name = b.table_name
and a.owner = 'SCOTT'
and a.table_name = 'TEST'
and a.table_name = b.table_name
and num_distinct / num_rows < 0.01
and (a.owner, a.table_name, a.column_name) in
(select r.name owner, o.name table_name, c.name column_name
from sys.col_usage$ u, sys.obj$ o, sys.col$ c, sys.user$ r
where o.obj# = u.obj#
and c.obj# = u.obj#
and c.col# = u.intcol#
and r.name = 'SCOTT'
and o.name = 'TEST')
and a.histogram = 'NONE';
概念4 回表(TABLE ACCESS BY INDEX ROWID)
当对一个列创建索引之后,索引会包含该列的键值以及键值所对应行所在的rowid.
回表:通过索引中记录的rowid访问回表中的数据
回表属于单块读,回表次数太多会严重影响SQL性能,如果回表次数太多,就不应该走索引扫描,应该走全表扫描。
在进行SQL优化的时候,一定要注意回表次数,特别是要注意回表的物理I/O次数。
scott@orclpdb1:orclcdb> select * from test where owner='SYS';
52430 rows selected.
Execution Plan ---------------------------------------------------------- Plan hash value: 3011311971
------------------------------------------------------------------------------------------------- | Id? | Operation?????????????????????????? | Name????? | Rows? | Bytes | Cost (%CPU)| Time???? | ------------------------------------------------------------------------------------------------- |?? 0 | SELECT STATEMENT??????????????????? |?????????? |? 1935 |?? 249K|??? 57?? (0)| 00:00:01 | |?? 1 |? TABLE ACCESS BY INDEX ROWID BATCHED| TEST????? |? 1935 |?? 249K|??? 57?? (0)| 00:00:01 | |*? 2 |?? INDEX RANGE SCAN????????????????? | IDX_OWNER |? 1935 |?????? |???? 5?? (0)| 00:00:01 | -------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id): ---------------------------------------------------
?? 2 - access("OWNER"='SYS')
Statistics ---------------------------------------------------------- ????????? 1? recursive calls ????????? 0? db block gets ?????? 8218? consistent gets ??????? 111? physical reads ????????? 0? redo size ??? 8096719? bytes sent via SQL*Net to client ????? 38843? bytes received via SQL*Net from client ?????? 3497? SQL*Net roundtrips to/from client ????????? 0? sorts (memory) ????????? 0? sorts (disk) ????? 52430? rows processed
scott@orclpdb1:orclcdb>
?TABLE ACCESS BY INDEX ROWID BATCHED 就是回表,索引返回多少行数据,回表就要回多少次,每次回表都是单块读。该SQL语句返回了52430行数据,那么回表就回了52430行数据。
?SQL的性能确实大部分损失在回表中,是否能回答返回表中5%内的数据走索引,超过表中5%的数据走全表扫描?根本原因在于回表。
如何消除回表?
通过建立组合索引
概念5 集群因子(CLUSTER FACTOR)
集群因子用于判断索引回表需要消耗的物理I/O次数。
当索引范围扫描,索引全扫描不回表或者返回数据量很少的时候,不管集群因子多大,对SQL查询性能几乎没有任何影响。再次强调一遍,在进行SQL优化的时候,往往回建立合适的组合索引消除回表。或者建立组合索引尽量减少回表次数。
|