Oracle and DB2, Comparison and Compatibility/Process Model/Optimizer/Oracle
The Oracle Cost Based Optimizer (CBO) uses previously gathered statistics to create the most efficient (optimal) SQL execution plan. These statistics are stored in the data dictionary in the SYSTEM tablespace. You will recall that the SYSTEM tablespace cannot be taken offline, and one of the reasons is that it contains the data dictionary, which is a central component of the DBMS’s operation. It is possible, however, to store statistics outside the data dictionary, in which case they do not affect the optimizer. It is also possible to copy statistics between databases to seed the optimizer.
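Copying statistics between databases is done with the DBMS_STATS export and import procedures. A minimal sketch, assuming an HR schema and a hypothetical statistics table named MY_STATS:

```sql
-- Create a user table to hold statistics outside the data dictionary.
-- Statistics stored here are inert: the optimizer ignores them.
BEGIN
  DBMS_STATS.CREATE_STAT_TABLE(ownname => 'HR', stattab => 'MY_STATS');
  -- Copy the HR schema's dictionary statistics into the user table.
  DBMS_STATS.EXPORT_SCHEMA_STATS(ownname => 'HR', stattab => 'MY_STATS');
END;
/

-- After moving MY_STATS to another database (for example with Data Pump),
-- load its contents into that database's data dictionary.
BEGIN
  DBMS_STATS.IMPORT_SCHEMA_STATS(ownname => 'HR', stattab => 'MY_STATS');
END;
/
```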
The Oracle optimizer (CBO) is responsible for deciding how to execute SQL statements, with the objective of executing them in the most efficient way. It works by weighing all the information at its disposal and producing an execution plan based on that information. Some of this information comes from the environment – characteristics of the data, the indexes, and the machine resources available. Other information is supplied by you, based on business need: how you want to utilize machine resources and how quickly you want data returned. Many of these preferences can be set at the database level, at the session level, or even at the level of an individual SQL statement.
We will look at four areas that affect the optimizer: the parameters you supply it with, the statistics generated by the database, the SQL itself, and optimizer hints.
There are many optimizer parameters that you can tune, but the most influential is OPTIMIZER_MODE. As mentioned earlier, Oracle has b-tree and bitmap indexes. Indexes are performance structures, and index access is generally quicker than a table scan, especially if the result set is small and the data is clustered (see Clustering).
For a large result set that needs to be sorted, index access may require more I/O than a table scan. It is less efficient overall, but it returns rows to the user as soon as they are available. Later on, in Statistics, we will see the type of information available to the optimizer from the operation of the database (statistics) and from the environment (machine resources, data distribution, and the SQL itself). These are things the optimizer ‘knows’; what it does not ‘know’ is your preference for how you would like data returned to you. OPTIMIZER_MODE is where you specify that preference.
The all_rows optimizer mode favors table scans over index scans. It is designed to optimize use of machine resources. With all_rows, tables are scanned and result sets are written to intermediate temporary tables; only when the operation has finished are the results returned to the end user. This mode of execution minimizes I/O, and is best suited to OLAP and DSS systems, where large amounts of data typically need to be processed but it is not important to return data to the end user as quickly as possible – the long-running analytical queries of OLAP / DSS systems are typically run in a batch-like environment.
The first_rows optimizer mode favors index scans over table scans. Because it reads the index to locate each row, and then reads the row itself, it is I/O heavy; however, as soon as data is read, it is returned to the originating query. This mode favors OLTP applications, where typically less data is being processed but the user is sitting at a terminal waiting for a response.
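The mode can be set at any of the levels mentioned above: instance, session, or individual statement. A brief sketch (the employees table and column names are assumptions):

```sql
-- Instance level: make ALL_ROWS the default for every session.
ALTER SYSTEM SET optimizer_mode = ALL_ROWS;

-- Session level: this session favors returning the first rows quickly.
ALTER SESSION SET optimizer_mode = FIRST_ROWS_10;

-- Statement level: a hint overrides the session setting for one query.
SELECT /*+ ALL_ROWS */ department_id, COUNT(*)
FROM   employees
GROUP  BY department_id;
```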
- Other parameters
There are other parameters that you can alter to give the optimizer cues about your preferences and how your data is laid out. In general these don’t need altering, but the option is available should you need it. These parameters let you influence the cost weighting of such things as:
- Sort area size (a big sort area means that it’s more likely that the sort will fit in memory)
- Access paths involving indexes
- Whether to parallelize table scans
- Hash joins vs. nested loop and sort-merge table joins
- How much of the index is likely to be in the buffer cache
- Weighting of scattered reads over sequential reads
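Several of the bullets above map onto specific initialization parameters. A hedged sketch (the values are illustrative, not recommendations):

```sql
-- Make index access paths look cheaper or dearer (default 100 = no bias).
ALTER SESSION SET optimizer_index_cost_adj = 50;

-- Tell the optimizer what fraction of index blocks to assume is cached.
ALTER SESSION SET optimizer_index_caching = 90;

-- A larger sort area makes an in-memory sort more likely.
ALTER SESSION SET sort_area_size = 10485760;  -- 10 MB

-- Blocks read per I/O during a full scan; affects the weighting of
-- scattered (multiblock) reads against sequential single-block reads.
ALTER SESSION SET db_file_multiblock_read_count = 32;
```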
You use the DBMS_STATS package or the ANALYZE statement to gather statistics about the physical storage characteristics of a table, index, or cluster. The DBMS_STATS package is recommended over ANALYZE for gathering optimizer statistics, but for statistics unrelated to the optimizer, such as empty blocks and average free space, you must use ANALYZE.
Optimizer statistics are collected with the following DBMS_STATS procedures:
- GATHER_TABLE_STATS
- GATHER_INDEX_STATS
- GATHER_SCHEMA_STATS
- GATHER_DATABASE_STATS
- GATHER_SYSTEM_STATS
When you use ANALYZE, it takes an exclusive lock on the object being analyzed, so analysis of tables should be done during off-peak hours. There are two methods of gathering statistics: estimate and exact. With the estimate method, the optimizer samples random rows from a table and uses this information to estimate statistics for the whole table. This is the quickest method of analysis and is useful for particularly large tables. With the exact method, the optimizer analyzes every row in the table; while this is more accurate, it takes longer. Since the gathered statistics are stored in the data dictionary (and the data dictionary is always available), a number of data dictionary views (prefixed with DBA_) can be used to verify their accuracy.
Another advantage of DBMS_STATS is that it can be used to identify skewed data distributions.
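Both points can be sketched with DBMS_STATS.GATHER_TABLE_STATS; the schema, table, and column names below are assumptions:

```sql
-- Estimate: sample roughly 10% of the rows (fast, good for very large tables).
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'HR',
    tabname          => 'ORDERS',
    estimate_percent => 10);
END;
/

-- Exact: read every row, and build a histogram on a column suspected
-- of data skew so the optimizer can see the uneven distribution.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'HR',
    tabname          => 'ORDERS',
    estimate_percent => 100,
    method_opt       => 'FOR COLUMNS SIZE 254 STATUS');
END;
/
```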
A great deal of the information that the optimizer needs to do its job comes from the operating system and the operation of the database itself (the environment, if you will), such as:
- Size and partitioning of a table
- Number of distinct values within each table column
- Distribution of column values (data skew)
- Availability of parallel query servers
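Much of this environmental information surfaces in the DBA_ data dictionary views mentioned earlier. A sketch, assuming a hypothetical HR.ORDERS table:

```sql
-- Table-level statistics: size, row count, when last analyzed.
SELECT num_rows, blocks, last_analyzed
FROM   dba_tables
WHERE  owner = 'HR' AND table_name = 'ORDERS';

-- Column-level statistics: distinct values and any histogram
-- (the optimizer's view of data skew).
SELECT column_name, num_distinct, num_nulls, histogram
FROM   dba_tab_col_statistics
WHERE  owner = 'HR' AND table_name = 'ORDERS';
```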
To override the optimizer you place ‘hints’ in your SQL and/or PL/SQL. Hints can be placed in any block of SQL or PL/SQL – anonymous blocks, stored procedures, and triggers. There are various methods for fine-tuning the optimizer, useful for instances such as when the data in a table is changing faster than the statistics reflect. One method of tuning is hints; another is to lock the statistics when they look optimal. This is one of those areas of ‘optimal’ vs. ‘acceptable’: if your optimizer performance is acceptable, there is no need to tune it. If you do need to tune it, locking statistics or hints can help, but the goal is usually to get performance to an acceptable level – striving for optimal performance can quickly lead to diminishing returns. There are over 130 different hints that you can use to influence the optimizer, giving you a fine degree of control. You can influence such things as join order, access paths, join operations, and parallelism.
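Both techniques can be sketched as follows; the table, index, and schema names are assumptions:

```sql
-- Hint: force use of a particular index for this statement only.
SELECT /*+ INDEX(o orders_status_ix) */ order_id, status
FROM   orders o
WHERE  status = 'SHIPPED';

-- Hint: request a parallel full table scan instead.
SELECT /*+ FULL(o) PARALLEL(o, 4) */ COUNT(*)
FROM   orders o;

-- Lock statistics that look optimal, so routine gathering jobs
-- cannot overwrite them while the table's data is in flux.
BEGIN
  DBMS_STATS.LOCK_TABLE_STATS(ownname => 'HR', tabname => 'ORDERS');
END;
/
```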