Tuesday, June 14, 2011

Quickly Improve SQL Performance with dbms_stats

Oracle Tips by Burleson Consulting

Updated for 11g on September 17, 2010

The old fashioned “analyze table” and dbms_utility methods for generating CBO statistics are obsolete and somewhat dangerous to SQL performance. This is because the cost-based SQL Optimizer (CBO) relies on the quality of the statistics to choose the best execution plan for all SQL statements. The dbms_stats utility does a far better job in estimating statistics, especially for large partitioned tables, and the better stats results in faster SQL execution plans.

Let’s see how dbms_stats works. It’s easy! Here is a sample execution of dbms_stats with the options clause:

exec dbms_stats.gather_schema_stats( -
ownname => 'SCOTT', -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt => 'for all columns size repeat', -
degree => 34 -
)
When the options clause is specified you may specify GATHER options. When GATHER AUTO is specified, the only additional valid parameters are ownname, stattab, statid, objlist and statown; all other parameter settings are ignored.

exec dbms_stats.gather_schema_stats( -
ownname => 'SCOTT', -
options => 'GATHER AUTO'
)
There are several values for the options parameter that we need to know about:

gather – re-analyzes the whole schema.

gather empty – Only analyze tables that have no existing statistics.

gather stale – Only re-analyze tables with more than 10% modifications (inserts, updates, deletes).

gather auto – This will re-analyze objects which currently have no statistics and objects with stale statistics. Using gather auto is like combining gather stale and gather empty.

exec dbms_stats.gather_schema_stats( ownname => 'BILIING', options => 'gather auto' );

Note that both gather stale and gather auto require monitoring. If you issue the “alter table xxx monitoring” command, Oracle tracks changed tables with the dba_tab_modifications view. Below we see that the exact number of inserts, updates and deletes are tracked since the last analysis of statistics.

SQL> desc dba_tab_modifications;

Name Type
--------------------------------
TABLE_OWNER VARCHAR2(30)
TABLE_NAME VARCHAR2(30)
PARTITION_NAME VARCHAR2(30)
SUBPARTITION_NAME VARCHAR2(30)
INSERTS NUMBER
UPDATES NUMBER
DELETES NUMBER
TIMESTAMP DATE
TRUNCATED VARCHAR2(3)
The most interesting of these options is the gather stale option. Because all statistics will become stale quickly in a robust OLTP database, we must remember the rule for gather stale is > 10% row change (based on num_rows at statistics collection time).


Hence, almost every table except read-only tables will be re-analyzed with the gather stale option. Hence, the gather stale option is best for systems that are largely read-only. For example, if only 5% of the database tables get significant updates, then only 5% of the tables will be re-analyzed with the “gather stale” option.

The CASCADE Option

When analyzing specific tables, the cascade option can be used to analyze all related indexes.

The Oracle documentation notes that using the cascade option gathers statistics on the table, plus all indexes for the target table. Using this option is equivalent to running gather_table_stats plus running gather_index_stats for each index on the table. If you always want indexes analyzed when running gather_table_stats you can use the set_database_prefs, set_global_prefs, or set_table_prefs, to always include indexes when gather_table_stats is executed.

exec dbms_stats.gather_table_stats( -
ownname => 'PERFSTAT', -
tabname => ’STATS$SNAPSHOT’ -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt => 'for all columns size skewonly', -
cascade => true, -
degree => 7 -
)
The DEGREE Option

Note that you can also parallelize the collection of statistics because the CBO does full-table and full-index scans. When you set degree=x, Oracle will invoke parallel query slave processes to speed up table access. Degree is usually about equal to the number of CPUs, minus 1 (for the OPQ query coordinator).

Automating Sample Size with dbms_stats

Now that we see how the dbms_stats options works, get see how to specify the sample size for dbms_stats. The following estimate_percent argument is a new way to allow Oracle’s dbms_stats to automatically estimate the “best” percentage of a segment to sample when gathering statistics:

estimate_percent => dbms_stats.auto_sample_size

You can verify the accuracy of the automatic statistics sampling by looking at the dba_tables sample_size column. It is interesting to note that Oracle chooses between 5% to 20% for a sample_size when using automatic sampling.

In our next installment we will look at automatics the collection of histogram data from dbms_stats.

11g Update: Oracle guru Guy Harrison also offers this advice for 11g statistics collection on function-based index columns.

In 11g, I think there are two other ways to get statistics collected for indexed expressions:

1) Collect extended statistics directly on the expression. So for instance, if we had a function SALES_CATEGORY, we might do this:
DBMS_STATS.gather_table_stats
(ownname => USER,
tabname => ‘SALES’,
method_opt => ‘FOR ALL COLUMNS FOR COLUMNS
(sale_category(amount_sold))’ );

2) Create a virtual column on the expression, then index that column. So for the same example as above we might create the following virtual column, then index the column and collect stats as usual:
ALTER TABLE
SALES
ADD
sales_category
GENERATED ALWAYS AS
(sale_category(amount_sold));

I think I like the first method better, because the statistics will still exist even if the index is dropped and – unlike the second approach – it doesn’t change the logical structure of the table.
Arup Nanda has a great article on extended statistics with dbms_stats, specialty histogram analysis using function-based columnar data:

Next, re-gather statistics on the table and collect the extended statistics on the expression upper(cust_name).

begin
dbms_stats.gather_table_stats (
ownname => 'ARUP',
tabname => 'CUSTOMERS',
method_opt => 'for all columns size skewonly for columns (upper(cust_name))'
);
end;

Alternatively you can define the column group as part of the gather statistics command.
You do that by placing these columns in the method_opt parameter of the gather_table_stats procedure in dbms_stats as shown below:

begin
dbms_stats.gather_table_stats (
ownname => 'ARUP',
tabname => 'BOOKINGS',
estimate_percent=> 100,
method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS(HOTEL_ID,RATE_CATEGORY)',
cascade => true
);
end;

No comments: