How reliable is DBMS_STATS.AUTO_SAMPLE_SIZE for gathering table statistics?

Statistics collection in Oracle is managed through the PL/SQL package DBMS_STATS. It provides several procedures to gather statistics for a table, a schema, or the whole database. The GATHER_TABLE_STATS procedure has an estimate_percent parameter, which specifies the sampling percentage, in other words the portion of the object's data that Oracle has to scan to arrive at its estimates. Users can specify any number between 0 and 100 for this parameter.
For example, if you have a table ABCD and want to specify a 10% sampling percentage, you can do this:

exec dbms_stats.gather_table_stats(null, 'ABCD', estimate_percent => 10);

For more details on this procedure: http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_stats.htm

It is not always easy for users to pick the appropriate sampling percentage. If the specified sampling percentage is too high, gathering statistics can take a long time (I've seen hours on some really big tables). On the other hand, if the data is skewed and the specified sampling percentage is too low, the resulting statistics can be inaccurate, which is an even bigger problem.

For this reason, Oracle introduced the AUTO value for the estimate_percent parameter.

For example, you can gather statistics on ABCD as follows:

exec dbms_stats.gather_table_stats(null, 'ABCD', estimate_percent => dbms_stats.auto_sample_size);
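To verify what was actually used, you can check the data dictionary afterwards; a quick look (assuming you have access to DBA_TAB_STATISTICS; USER_TAB_STATISTICS works the same way for your own tables) would be:

select table_name, sample_size, last_analyzed
from   dba_tab_statistics
where  table_name = 'ABCD';

SAMPLE_SIZE shows how many rows were actually sampled for the table statistics, and LAST_ANALYZED shows when they were gathered.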

There are two primary advantages of using AUTO sample size over a fixed number.

  1. When AUTO sample size is specified, the system automatically determines the appropriate sampling percentage.
  2. AUTO sample size is more flexible than a fixed sampling percentage. A fixed sampling percentage that was good at some point in time may not be appropriate after the data distribution in the table has changed. On the other hand, when the AUTO value is used, Oracle adjusts the sample size as the data distribution changes.

When AUTO is used, Oracle picks a sample size at which the statistics quality is good enough. However, it does not work very well in cases of extreme skew in the data. To illustrate the accuracy of AUTO sample size, I compared fixed sampling percentages against AUTO sample size in an Oracle 11g database.

There's a table called CUSTOMERSCSM which contains 23 columns and 6,216,000 rows. This is a table I've built from Oracle's sample tables in the SH schema (the tables that come with the installation).

SQL> exec dbms_stats.gather_table_stats(null, 'CUSTOMERSCSM', estimate_percent => 1);

PL/SQL procedure successfully completed.

Elapsed: 00:00:11.25

and

SQL> exec dbms_stats.gather_table_stats(null, 'CUSTOMERSCSM', estimate_percent => 100);

PL/SQL procedure successfully completed.

Elapsed: 00:02:52.37

and

SQL> exec dbms_stats.gather_table_stats(null, 'CUSTOMERSCSM', estimate_percent => dbms_stats.auto_sample_size);

PL/SQL procedure successfully completed.

Elapsed: 00:00:29.47

 

You can see there's a huge difference between the 1% and 100% samples, whereas the 1% and AUTO sample size runs finished in almost the same time (an 18-second difference, to be accurate).

So timing-wise we have significant gains with AUTO sample size; now let's look at the accuracy part. Among all the statistics of a column, the number of distinct values (NDV) is the one whose accuracy has traditionally been an issue.

The actual number of distinct values in the column CUST_MAIN_PHONE_NUMBER of the CUSTOMERSCSM table is:

SQL> select count(distinct(CUST_MAIN_PHONE_NUMBER)) from CUSTOMERSCSM;

COUNT(DISTINCT(CUST_MAIN_PHONE_NUMBER))
---------------------------------------
                                  51000

 

Case 1: With 1% sample size

SQL> select column_name, NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_MAIN_PHONE_NUMBER';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
CUST_MAIN_PHONE_NUMBER                49362

Case 2: With 100% sample size

SQL> select column_name, NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_MAIN_PHONE_NUMBER';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
CUST_MAIN_PHONE_NUMBER                51000

Case 3: With AUTO sample size

SQL> select column_name, NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_MAIN_PHONE_NUMBER';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
CUST_MAIN_PHONE_NUMBER                51344

 

Estimate Percentage   Actual value   Value from stats   % deviation
1%                    51000          49362              3.21%
100%                  51000          51000              0%
AUTO                  51000          51344              0.67%
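For reference, the deviation column is just the relative error of the estimated NDV against the actual value:

% deviation = |NDV from stats - actual NDV| / actual NDV * 100
            = |49362 - 51000| / 51000 * 100 = 3.21%   (for the 1% sample)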

So the percentage deviation is much smaller with AUTO sample size than with the 1% sample.

Now we’ll do the same exercise on another column called CUST_ID.

SQL> select count(distinct(CUST_ID)) from CUSTOMERSCSM;

COUNT(DISTINCT(CUST_ID))
------------------------
                   55500

Case 1: With 1% sample size

SQL> select column_name, NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_ID';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
CUST_ID                               55704

Case 2: With 100% sample size

SQL> select column_name, NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_ID';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
CUST_ID                               55500

 

Case 3: With AUTO sample size

SQL> select column_name, NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_ID';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
CUST_ID                               55992

Estimate Percentage   Actual value   Value from stats   % deviation
1%                    55500          55704              0.37%
100%                  55500          55500              0%
AUTO                  55500          55992              0.89%

Here, interestingly, the AUTO sample size deviates more from the actual value than the 1% sample does.

So we've just seen two different scenarios in which the 1% and AUTO sample sizes give opposite results. Is it reliable, then, to depend on auto sampling?

I'm still trying to find the answers and will post my observations soon in another article. I'll be happy if anyone can clear the clouds even before I do it for myself 🙂

Have a good day guys. Keep Contemplating.

 

regards,

CSM

Tracing DATAPUMP jobs

Hi All,

Nowadays Data Pump has become like the 'select * from ...' of DBAs, as we come across many situations where we have to move data here and there and between heterogeneous environments. So having good knowledge of the Data Pump utility and its features is quite commendable.

At times we all come across some weird problems using Data Pump. I'm one of the millions who thought 'God!! How do I trace my impdp/expdp?', so that I could get information about the internal processing and debug the issues quickly.

God answered my prayers and I came to know about an undocumented parameter called TRACE which helps us generate detailed, system-level traces for Data Pump operations.

Let’s see how it works

The TRACE parameter takes its value as a 7-digit hexadecimal number, and specifying the value follows some rules.
Out of the 7 hexadecimal digits,

– The first 3 digits enable tracing for a specific Data Pump component, and the last 4 digits are usually 0300.

– Specifying more than 7 hexadecimal digits is not allowed. Doing so results in UDE-00014: invalid value for parameter, 'trace'.

– Leading zeros can be omitted, so the value may be shorter than 7 digits, and the value is not case sensitive.

Before starting the tracing, make sure the MAX_DUMP_FILE_SIZE initialization parameter is set to an appropriate value, because this parameter decides how big a trace file can grow. The default value is UNLIMITED and it's fine to have that.

SQL> show parameter max_dump_file

NAME                                 TYPE        VALUE
------------------------------------ ----------- -------------------
max_dump_file_size                   string      UNLIMITED

If it's not unlimited, you can set it with the following command:

alter system set max_dump_file_size=UNLIMITED;

The majority of errors that occur during a Data Pump job can be diagnosed by creating trace files for the Master Control Process (MCP) and the Worker process(es) only.

In case of standard tracing, the trace files are generated in BACKGROUND_DUMP_DEST:

– Master Control Process trace file format: <SID>_dm<number>_<process_id>.trc

– Worker Process trace file format: <SID>_dw<number>_<process_id>.trc

In case of full tracing, the same two trace files are generated in BACKGROUND_DUMP_DEST, and an additional trace file is generated in USER_DUMP_DEST:

– Shadow Process trace file format: <SID>_ora_<process_id>.trc

In databases where the ADR is configured, the files are located like this:

$ADR_HOME/trace:
– Data Pump Master Control Processes files have format: SID_dm_process_id.trc
– Data Pump Worker Process trace files have format: SID_dw_process_id.trc
– Data Pump Shadow Process trace files have format: SID_ora_process_id.trc
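If you're not sure where that trace directory is on your system, a quick way to find it (assuming 11g or later, where V$DIAG_INFO is available) is:

SQL> select value from v$diag_info where name = 'Diag Trace';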

The list of trace levels in Data Pump is shown below.

 Trace   DM   DW  ORA  Lines
  level  trc  trc  trc     in
  (hex) file file file  trace                                         Purpose
------- ---- ---- ---- ------ -----------------------------------------------
  10300    x    x    x  SHDW: To trace the Shadow process (API) (expdp/impdp)
  20300    x    x    x  KUPV: To trace Fixed table
  40300    x    x    x  'div' To trace Process services
  80300    x            KUPM: To trace Master Control Process (MCP)      (DM)
 100300    x    x       KUPF: To trace File Manager
 200300    x    x    x  KUPC: To trace Queue services
 400300         x       KUPW: To trace Worker process(es)                (DW)
 800300         x       KUPD: To trace Data Package
1000300         x       META: To trace Metadata Package
--- +
1FF0300    x    x    x  'all' To trace all components          (full tracing)

You can use the individual values listed in the above table or a combination of them. If you combine all the individual component values you get 1FF0300, which is full tracing.

The Data Pump command for full tracing while doing an export is:
expdp DUMPFILE=expdp.dmp LOGFILE=expdp.log TRACE=1FF0300

The Data Pump command for full tracing while doing an import is:
impdp DUMPFILE=expdp.dmp LOGFILE=impdp.log TRACE=1FF0300

In most scenarios full tracing is not required, as you'll be bombarded with a wealth of information that takes significant effort and time to analyze. Since trace 400300 traces the Worker process(es) and trace 80300 traces the Master Control Process (MCP), combining them as trace 480300 traces both the Master Control Process and the Worker process(es). We add only the leading component digits and keep the trailing 300, which is why the result is 480300 and not 480600.
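For instance, a combined MCP + Worker tracing run could look like this (the directory, schema, and file names here are only placeholders for illustration, not from the original example):

expdp system DIRECTORY=DATA_PUMP_DIR SCHEMAS=SCOTT DUMPFILE=expdp_scott.dmp LOGFILE=expdp_scott.log TRACE=480300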

Hope this helps. In the next articles in this series we'll see how to enable tracing on an already running Data Pump process (I'm not kidding) and how to analyze the trace files generated through the above method.

Enable BCT (Block Change Tracking)

Hi,

In my previous post (https://querycsm.wordpress.com/2013/01/02/why-rman-daily-incremental-backups-are-taking-so-long/) I’ve talked about investigating the potential problems and stated that one of the improvement initiatives we could go ahead with is ‘Enabling BCT’.

Now we'll see how to do that, and whether there is anything else we need to do after enabling it.

First, I recommend you go through the document prepared by my favorite author Alexander Gorbachev to get an idea of what BCT is and how it works (at least go through the first few pages).

There are two ways to enable BCT.

The first is specifying the BCT file location manually, like this:

SQL> ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '/dbfile/bct_file.ora';

The other way is:

SQL> ALTER DATABASE ENABLE BLOCK CHANGE TRACKING;

but in this case you need to set db_create_file_dest to the location where you want the BCT file to be created before enabling it:

SQL> alter system set db_create_file_dest='location' SCOPE=BOTH;

To check whether BCT is enabled, run the query below:

SQL> select * from v$block_change_tracking;

STATUS     FILENAME                                          BYTES
---------- --------------------------------------------- --------
ENABLED    /ora11dbdata2/oracle/NIKU/blk_chg_trckng.dbf   11599872

I did these steps but was not seeing any improvement in PCT_READ (the percentage of blocks read) from the query below.

SQL> select file#, avg(datafile_blocks) blocks,
            avg(blocks_read) blocks_read,
            avg(blocks_read/datafile_blocks)*100 pct_read,
            avg(blocks) blocks_backed_up,
            (avg(blocks)/avg(datafile_blocks))*100 pct_backed_up
     from v$backup_datafile
     where incremental_level = 1
     group by file#
     order by file#;

So after doing a lot of research, I found that we need to take a FULL backup (an incremental level 0 backup) of the database after enabling BCT in order to leverage its benefits. We should do this because when we take that backup, the database creates a baseline bitmap, and from there on it keeps updating the bitmap based on the blocks that are changed.
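As a minimal sketch (assuming a plain single-instance RMAN setup), the sequence after enabling BCT would look like this:

RMAN> BACKUP INCREMENTAL LEVEL 0 DATABASE;
RMAN> BACKUP INCREMENTAL LEVEL 1 DATABASE;

The level 0 backup establishes the baseline; the subsequent level 1 backups can then use the change tracking file instead of scanning every block.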

I've later observed that a lot of people face similar statistics, i.e. no improvement after enabling BCT, and this is the answer to that.

By the way, the performance improvement after enabling BCT in our production database is humongous; we're now seeing only 2%-3% of the blocks being read by RMAN. Thanks to BCT!!!

Your queries and questions are always welcome.

Regards,

CSM

What if you lost both the PFILE and SPFILE???

It's been two months since I wrote any articles. It's been a continuous festive season here in India, and a job change for me 🙂

I hope I'll keep up my interest at the same pace as earlier.

Losing both the SPFILE and the PFILE is a typical scenario faced by a lot of DBAs (mostly junior DBAs).

If the database is already running and you lose these files, you won't face any problems until you come across a situation where you have to bounce the database 😉

If the Database is down and your task is to bring it up, you’re already dead 🙂

Don't panic in these kinds of situations, as I do 🙂

I'm going to tell you a trick by which you can bring your instance back to life with at least some of the settings of your running instance recovered.

You can find the various parameters with which the instance was started in the alert log. Yes, the alert log records the parameters each time the instance is started. So you can use them to construct a pfile and start your instance with that pfile, like below:

startup pfile='<path to PFILE>'

after that you can generate the SPFILE as well using

create spfile from pfile
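Putting the whole thing together, a minimal sketch of the recovery would be (the path and file name below are only placeholders):

-- build /tmp/initORCL.ora by hand from the parameter listing in the alert log, then:
SQL> startup pfile='/tmp/initORCL.ora';
SQL> create spfile from pfile='/tmp/initORCL.ora';
SQL> shutdown immediate;
SQL> startup;   -- the instance now comes up with the newly created SPFILE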

One caveat: you can't recover the settings you made 'on the fly', i.e. the settings you made directly to the SPFILE while the instance was running.

If you’re smart and already configured SPFILE auto backup, you don’t need all the things mentioned above. You could just restore the SPFILE from the auto backup.

See the instructions below to do that (courtesy of the Oracle documentation):

To restore the server parameter file:

  1. If the database is up at the time of the loss of the SPFILE, connect to the target database. For example, run:
    % rman TARGET /

    If the database is not up when the SPFILE is lost, and you are not using a recovery catalog, then you must set the DBID of the target database. See “Determining your DBID” for details on determining your DBID.

  2. Shut down the instance and restart it without mounting. When the SPFILE is not available, RMAN starts the instance with a dummy parameter file. For example:
    RMAN> STARTUP FORCE NOMOUNT;
  3. Restore the server parameter file. If restoring to the default location, then run:
      RMAN> RESTORE SPFILE FROM AUTOBACKUP;

If restoring to a nondefault location, then you could run commands as in the following example:

      RMAN> RESTORE SPFILE TO '/tmp/spfileTEMP.ora' FROM AUTOBACKUP;
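Once the SPFILE is restored to the default location, the documentation has you restart the instance so it comes up with the restored file, for example:

      RMAN> STARTUP FORCE;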

Hope you enjoyed it.

I always welcome your feedback and suggestions.

Thanks,

CSM

Can we create indexes on views?

Hi friends,

Recently one of the members of the OTN forums asked this question: Can we create indexes on views?

I told him absolutely not, because a view is nothing but a stored SQL query; it's a window through which you can see the data of your choice. As the ultimate data container is the table, you impose all the required conditions (constraints, indexes, etc.) on the table itself, and they'll be reflected in the view.
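As a quick illustration, trying it directly fails (the view and index names here are made up, assuming the classic SCOTT.EMP demo table):

SQL> create view emp_v as select * from emp;
SQL> create index emp_v_idx on emp_v(ename);
-- raises ORA-01702: a view is not appropriate here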

So it's true that we can't impose constraints or indexes on regular views. OK, then what's the case with MVs (materialized views)? Can't we create indexes on them either?

If you say we can't, that's wrong. We can create constraints and indexes on MVs, because MVs contain the actual data, i.e. the result of the query as of the instant it was executed.

create materialized view abc as select col1, col2 from t1;

ALTER MATERIALIZED VIEW abc add constraint 
abc_constraint check ( col2 < 10 ) deferrable;
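And likewise an index can be created on it (abc_idx here is just an illustrative name):

create index abc_idx on abc(col1);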

So when you design applications, keep in mind that we can apply specific constraints and create indexes to attain performance gains on a selected set of data (through MVs, of course).

There's a caveat to this, which I noticed while doing some time-pass testing 🙂

See that exercise here

SQL> create table t1 as select rownum rnm,'row_number:'||rownum rnmstr 
     from user_objects where rownum<=5;

Table created.  

SQL> alter table t1 add primary key(rnm);

Table altered.

SQL> create unique index t1_uk on t1(rnmstr);

Index created.

SQL> create materialized view mvt1 refresh fast with primary key 
     as select * from t1;

Materialized view created.

SQL> create unique index mvt1_uk on mvt1(rnmstr);

Index created.

SQL> select * from t1;

         R RNMSTR
---------- -------------------------------------------
         1 row_number:1
         2 row_number:2
         3 row_number:3
         4 row_number:4
         5 row_number:5

Now, we'll interchange the values of two of the rows.

SQL> update t1 set RNMSTR='TEMP' where rnm=1;
1 row updated. 

SQL> update t1 set RNMSTR='row_number:1' where rnm=2; 

1 row updated. 

SQL> update t1 set RNMSTR='row_number:2' where rnm=1;

 1 row updated. 

SQL> commit; 

Commit complete. 

SQL> select * from t1;

         R RNMSTR
---------- -------------------------------------------
         1 row_number:2
         2 row_number:1
         3 row_number:3
         4 row_number:4
         5 row_number:5

Now when we try to refresh the mview we get the error.

SQL> exec dbms_mview.refresh('MVT1');
BEGIN dbms_mview.refresh('MVT1'); END;

*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-00001: unique constraint (YAS.MVT1_UK) violated
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 820
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 877
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 858
ORA-06512: at line 1

If we drop the unique index and recreate it as nonunique we can refresh the mview without errors.

SQL> drop index MVT1_UK;

Index dropped.

SQL> create index MVT1_AK on MVT1(rnmstr);

Index created.

SQL> exec dbms_mview.refresh('MVT1');

PL/SQL procedure successfully completed.

SQL> select * from MVT1;
         R RNMSTR
---------- -------------------------------------------
         1 row_number:2
         2 row_number:1
         3 row_number:3
         4 row_number:4
         5 row_number:5

The reason for this error is explained in Metalink note 284101.1. According to that note, the refresh mechanism may temporarily leave the data in an inconsistent state during the refresh; the modifications are not applied in the same order as the modifications to the base table. Because of this, creating unique indexes, foreign keys, or primary keys on mviews can cause problems in the refresh process. An mview is just a selective copy of the base table, so if we define the integrity constraints on the base table, the mview will definitely comply with the expected constraints because the data in the base table complies with them. But you can create (non-unique) indexes without any fear if you see scope for performance gains.

Hope you liked it. Suggestions and comments are always welcome.

Regards,

CSM