dbms_stats
NinGoo's blog

如何收集及删除列的统计信息

本文只涉及使用dbm_stats来收集或删除列的统计信息的一些命令,以备查询。

测试表如下(实验环境为10.2.0.4):

SQL> create table test(i int,a varchar2(30));
Table created.

SQL> insert into test select rownum,object_name from all_objects;
9907 rows created.

简单的说,列的统计信息,主要包括两种类型:

  • 只有基本信息:收集的统计信息只有1个桶(bucket)
  • 包含柱状图信息:收集的统计信息包含2到254个桶

也就是说,如果想收集列的基本信息,同时不希望收集柱状图,则需要指定bucket的size为1:

SQL> exec dbms_stats.gather_table_stats(user, 'TEST',
cascade=>false, method_opt=>'for columns i size 1');

PL/SQL procedure successfully completed.                                                                             

SQL> select column_name,num_distinct,num_buckets,low_value,high_value,density,avg_col_len,histogram
  2  from user_tab_columns where table_name='TEST';

COLUMN_NAM NUM_DISTINCT NUM_BUCKETS LOW_VALUE  HIGH_VALUE    DENSITY AVG_COL_LEN HISTOGRAM
---------- ------------ ----------- ---------- ---------- ---------- ----------- ---------------
I                  9907           1 C102       C26408     .000100939           4 NONE
A                                                                                NONE

如果要收集列的柱状图信息,则bucket的个数必须大于等于2(最多不超过254)

SQL> exec dbms_stats.gather_table_stats(user, 'TEST',
cascade=>false, method_opt=>'for columns i size 2');

PL/SQL procedure successfully completed.

SQL> select column_name,num_distinct,num_buckets,low_value,high_value,density,avg_col_len,histogram
  2  from user_tab_columns where table_name='TEST';

COLUMN_NAM NUM_DISTINCT NUM_BUCKETS LOW_VALUE  HIGH_VALUE    DENSITY AVG_COL_LEN HISTOGRAM
---------- ------------ ----------- ---------- ---------- ---------- ----------- ---------------
I                  9907           2 C102       C26408     .000100939           4 HEIGHT BALANCED
A                                                                                NONE

如果要删除列已有的柱状图信息而保留列的基本统计信息,则需要重新收集bucket为1的统计信息

SQL> exec dbms_stats.gather_table_stats(user, 'TEST',
cascade=>false, method_opt=>'for columns i size 1');

PL/SQL procedure successfully completed.

SQL> select column_name,num_distinct,num_buckets,low_value,high_value,density,avg_col_len,histogram
  2  from user_tab_columns where table_name='TEST';

COLUMN_NAM NUM_DISTINCT NUM_BUCKETS LOW_VALUE  HIGH_VALUE    DENSITY AVG_COL_LEN HISTOGRAM
---------- ------------ ----------- ---------- ---------- ---------- ----------- ---------------
I                  9907           1 C102       C26408     .000100939           4 NONE
A                                                                                NONE

这个操作明显不太合理,重新收集统计信息的代价有时候是很大的,所以Oracle11g对此做出了改进,允许只删除柱状图而保留基本统计信息,命令语法如下:

exec dbms_stats.delete_column_stats('', '
', '
', col_stat_type=>'HISTOGRAM');

而要彻底删除整个列的统计信息,则需要调用delete_column_stats过程

SQL> exec dbms_stats.delete_column_stats(user, 'TEST', 'I');

PL/SQL procedure successfully completed.

SQL> select column_name,num_distinct,num_buckets,low_value,high_value,density,avg_col_len,histogram
  2  from user_tab_columns where table_name='TEST';

COLUMN_NAM NUM_DISTINCT NUM_BUCKETS LOW_VALUE  HIGH_VALUE    DENSITY  AVG_COL_LEN HISTOGRAM
---------- ------------ ----------- ---------- ---------- ----------  ----------- ---------------
I                                                                                 NONE
A                                                                                 NONE

可以在同一个过程中收集多个列的统计信息,并且可以为不同的列指定不同的bucket个数:

SQL> exec dbms_stats.gather_table_stats(user, 'TEST',
cascade=>false, method_opt => 'for columns size 1 T for columns size 2 A');

PL/SQL procedure successfully completed.

SQL> select column_name,num_distinct,num_buckets,low_value,high_value,density,avg_col_len,histogram
  2  from user_tab_columns where table_name='TEST';

COLUMN_NAM NUM_DISTINCT NUM_BUCKETS LOW_VALUE  HIGH_VALUE              DENSITY AVG_COL_LEN HISTOGRAM
---------- ------------ ----------- ---------- -------------------- ---------- ----------- ---------------
I                  9907           1 C102       C26408               .000100939           4 NONE
A                  7376           2 41         5F75746C245F6C6E635F .000185239          18 HEIGHT BALANCED
                                               696E645F7061727473

值得注意的是,从上一篇文章我们知道,9i的dbms_stats中,method_opt的默认值是FOR ALL COLUMNS SIZE 1,也就是收集列的基本统计信息而不收集柱状图信息,而10g的默认值则变成了FOR ALL COLUMNS SIZE AUTO,则Oracle在收集列的基本信息之外,还会根据情况收集某些列的柱状图。

dbms_stats收集统计信息时的默认参数

利用dbms_stats来收集统计信息的时候,其procedure有很多参数是有默认值的。9i的默认值是直接写死在procedure的定义中,而10g的默认值则是一组可以由用户设置的参数值,为此,10g的dbms_stats引入了两个新的procedure:get_param和set_param。

$ORACLE_HOME/rdbms/admin/dbmsstat.sql里可以看到gather_table_stats过程的具体参数默认值如下:

9i:

procedure gather_table_stats
    (ownname varchar2,
     tabname varchar2,
     partname varchar2 default null,
     estimate_percent number default null,
     block_sample boolean default FALSE,
     method_opt varchar2 default 'FOR ALL COLUMNS SIZE 1',
     degree number default null,
     granularity varchar2 default 'DEFAULT',
     cascade boolean default FALSE,
     stattab varchar2 default null,
     statid varchar2 default null,
     statown varchar2 default null,
     no_invalidate boolean default FALSE);

10g:

procedure gather_table_stats
    (ownname varchar2,
     tabname varchar2,
     partname varchar2 default null,
     estimate_percent number default to_estimate_percent_type(get_param('ESTIMATE_PERCENT')),
     block_sample boolean default FALSE,
     method_opt varchar2 default get_param('METHOD_OPT'),
     degree number default to_degree_type(get_param('DEGREE')),
     granularity varchar2 default get_param('GRANULARITY'),
     cascade boolean default to_cascade_type(get_param('CASCADE')),
     stattab varchar2 default null, statid varchar2 default null,
     statown varchar2 default null,
     no_invalidate boolean default to_no_invalidate_type(get_param('NO_INVALIDATE')),
     stattype varchar2 default 'DATA',
     force boolean default FALSE);

可以看到,10g的procedure定义里,有些default值是通过get_param来获取的,这些默认值可以通过set_param进行修改。

SQL> select 'cascade:' as param,dbms_stats.get_param('cascade') as value from dual
  2  union all
  3  select 'degree:',dbms_stats.get_param('degree') from dual
  4  union all
  5  select 'estimate_percent:',dbms_stats.get_param('estimate_percent') from dual
  6  union all
  7  select 'method_opt:',dbms_stats.get_param('method_opt') from dual
  8  union all
  9  select 'no_invalidate:',dbms_stats.get_param('no_invalidate') from dual
 10  union all
 11  select 'granularity:',dbms_stats.get_param('granularity') from dual;

PARAM             VALUE
----------------- ------------------------------
cascade:          DBMS_STATS.AUTO_CASCADE
degree:           NULL
estimate_percent: DBMS_STATS.AUTO_SAMPLE_SIZE
method_opt:       FOR ALL COLUMNS SIZE AUTO
no_invalidate:    DBMS_STATS.AUTO_INVALIDATE
granularity:      AUTO

6 rows selected.

这些可以修改的参数可取的值,可以参考dbmsstat.sql里的注释和Metalink文档:Doc ID:725845.1