How reliable is it to use DBMS_STATS.AUTO_SAMPLE_SIZE for gathering table statistics?

Statistics collection in Oracle is managed through the PL/SQL package DBMS_STATS. It provides several procedures to gather statistics for a table, a schema, or the whole database. These procedures take an estimate_percent parameter, which specifies the sampling percentage, in other words the fraction of the object’s data Oracle has to scan to arrive at an estimate for the various statistics. Users can specify any number between 0 and 100 for this parameter.
For example, suppose you have a table ABCD and you want to specify a 10% sampling percentage; you can do this:

exec dbms_stats.gather_table_stats(null, 'ABCD', estimate_percent => 10);

For more details on this procedure: http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_stats.htm

It is not always easy for users to pick an appropriate sampling percentage. If the specified sampling percentage is too high, gathering statistics can take a long time (I’ve seen hours on some really big tables). On the contrary, if the data is skewed and the specified sampling percentage is too low, the resulting statistics can be inaccurate, which is an even bigger problem.

For this reason, Oracle introduced the AUTO value for the estimate_percent parameter.

For example, you can gather statistics on ABCD as follows:

exec dbms_stats.gather_table_stats(null, 'ABCD', estimate_percent => dbms_stats.auto_sample_size);

There are two primary advantages of using AUTO sample size over a fixed number.

  1. When AUTO sample size is specified, the system automatically determines the appropriate sampling percentage.
  2. AUTO sample size is more flexible than a fixed sampling percentage. A fixed sampling percentage that was good at some point in time may not be appropriate after the data distribution in the table has changed. On the other hand, when the AUTO value is used, Oracle adjusts the sample size as the data distribution changes.

When AUTO is used, Oracle picks a sample size at which the statistics quality is good enough. However, it does not work very well in cases of extreme skew in the data. To illustrate the accuracy of AUTO sample size, I compare fixed sampling percentages against AUTO sample size in an Oracle 11g database.

There’s a table called CUSTOMERSCSM which contains 23 columns and 6,216,000 rows. This is a table I’ve taken from Oracle’s sample tables in the SH schema (the tables that come with the installation).

SQL> exec dbms_stats.gather_table_stats(null, 'CUSTOMERSCSM', estimate_percent => 1);

PL/SQL procedure successfully completed.

Elapsed: 00:00:11.25

and

SQL> exec dbms_stats.gather_table_stats(null, 'CUSTOMERSCSM', estimate_percent => 100);

PL/SQL procedure successfully completed.

Elapsed: 00:02:52.37

and

SQL> exec dbms_stats.gather_table_stats(null, 'CUSTOMERSCSM', estimate_percent => dbms_stats.auto_sample_size);

PL/SQL procedure successfully completed.

Elapsed: 00:00:29.47

 

You can see there’s a huge difference between the 1% and 100% samples, whereas the 1% and AUTO sample size runs finished in almost the same time (an 18-second difference, to be accurate).

Since we get significant timing gains with AUTO sample size, now we’ll look at the accuracy part. Among all the statistics of a column, the number of distinct values (NDV) is the one whose accuracy has traditionally been an issue.

The actual number of distinct values in the column CUST_MAIN_PHONE_NUMBER of the CUSTOMERSCSM table is

SQL> select count(distinct(CUST_MAIN_PHONE_NUMBER)) from CUSTOMERSCSM;

COUNT(DISTINCT(CUST_MAIN_PHONE_NUMBER))

—————————————

                                 51000

 

Case 1: With 1% sample size

SQL> select column_name, NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_MAIN_PHONE_NUMBER';

COLUMN_NAME                                              NUM_DISTINCT

——————————                                    ————

CUST_MAIN_PHONE_NUMBER                  49362

Case 2: With 100% sample size

SQL> select column_name, NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_MAIN_PHONE_NUMBER';

COLUMN_NAME                                               NUM_DISTINCT

——————————                                      ————

CUST_MAIN_PHONE_NUMBER                    51000

Case 3: With AUTO sample size

SQL> select column_name, NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_MAIN_PHONE_NUMBER';

COLUMN_NAME                                           NUM_DISTINCT

——————————                                  ————

CUST_MAIN_PHONE_NUMBER                51344

 

Estimate Percentage   Actual value   Value from stats   % deviation
-------------------   ------------   ----------------   -----------
1%                           51000              49362         3.21%
100%                         51000              51000            0%
AUTO                         51000              51344         0.67%

So the percentage deviation is much smaller with AUTO sample size than with the 1% sample.
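
(For reference, the deviation column above is just |value from stats - actual| / actual; the 1% row, for example, can be reproduced with a one-liner, which returns 3.21:)

SQL> select round(abs(49362 - 51000) / 51000 * 100, 2) as pct_deviation from dual;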

Now we’ll do the same exercise on another column called CUST_ID.

SQL> select count(distinct(CUST_ID)) from CUSTOMERSCSM;

COUNT(DISTINCT(CUST_ID))

————————

55500

Case 1: With 1% sample size

SQL> select column_name, NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_ID';

COLUMN_NAME                    NUM_DISTINCT

—————————— ————

CUST_ID                               55704

Case 2: With 100% sample size

SQL> select column_name, NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_ID';

COLUMN_NAME                    NUM_DISTINCT

—————————— ————

CUST_ID                               55500

 

Case 3: With AUTO sample size

SQL> select column_name, NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_ID';

COLUMN_NAME                    NUM_DISTINCT

—————————— ————

CUST_ID                               55992

Estimate Percentage   Actual value   Value from stats   % deviation
-------------------   ------------   ----------------   -----------
1%                           55500              55704         0.37%
100%                         55500              55500            0%
AUTO                         55500              55992         0.89%

Here, if you look closely, the AUTO sample size deviates more from the actual value than the 1% sample does.

So we’ve just seen two different scenarios with 1% and AUTO sample sizes. Is it reliable, then, to depend on auto sampling?

I’m still trying to find the answers and will post my observations soon in another article. I’ll be happy if anyone can clear the clouds even before I do it for myself 🙂

Have a good day guys. Keep Contemplating.

 

regards,

CSM

DBA_DATA_FILES, DBA_FREE_SPACE and DBA_SEGMENTS–A Myth Buster

Hi Friends,

I’ve been encountering a lot of questions related to free space and used space details in an Oracle database in various forums and discussions.

Is it really difficult to understand the Oracle space management system? No, it’s not. I’m listing my observations here, which could help you discern between the different data dictionary views related to space management and bust the myths around them.

The three main data dictionary views frequently used for this are DBA_DATA_FILES, DBA_SEGMENTS and DBA_FREE_SPACE.

A lot of the queries available on the internet for checking space utilization details are based on DBA_DATA_FILES & DBA_FREE_SPACE, and some on DBA_DATA_FILES & DBA_SEGMENTS.

Some free space reporting queries use DBA_DATA_FILES - DBA_SEGMENTS as free space and some use DBA_FREE_SPACE directly. Which one should we use?

See the below results

SQL> select sum(bytes)/1024/1024 from dba_data_files where tablespace_name='USERS_SMALL';

SUM(BYTES)/1024/1024
——————–
266016

SQL> select sum(bytes)/1024/1024 from dba_segments where tablespace_name='USERS_SMALL';

SUM(BYTES)/1024/1024
——————–
241157

SQL> select sum(bytes)/1024/1024 from dba_free_space where tablespace_name='USERS_SMALL';

SUM(BYTES)/1024/1024
——————–
24856

So DBA_DATA_FILES - DBA_SEGMENTS = 266016 - 241157 = 24859. There’s a 3 MB difference between the two results, and in some cases it’s even more. So which one to believe? What’s the actual free space I can use for my objects?

The answer is DBA_FREE_SPACE, which actually contains the free space that is available to the objects we’re going to create.

Then why am I saying that DBA_DATA_FILES - DBA_SEGMENTS is incorrect? In theory it’s perfectly correct, but in reality the difference is the LMT headers/space management blocks plus the last “few” blocks at the end of the file, which aren’t enough to make up a whole extent according to the extent allocation and segment space definitions of the tablespace. Thanks to Jonathan Lewis for pointing that out. Please go through Jonathan’s blog http://jonathanlewis.wordpress.com/2012/05/31/lmt-headers/ to get a clear idea of what these LMT headers are.

So, DBA_DATA_FILES=DBA_SEGMENTS+DBA_FREE_SPACE+X

where X = the first few blocks of the file that are taken up by space management blocks + the last “few” blocks at the end of the file which aren’t enough to make up a whole extent according to the extent allocation and segment space definitions of the tablespace.

How can I get the ‘actual’ free space details without using the DBA_FREE_SPACE view? Is that the only way, or does an alternative exist?

There is an alternative through which you can get the actual free space information using just DBA_DATA_FILES and DBA_SEGMENTS/DBA_EXTENTS.

See the query below

SQL> select sum(user_bytes)/1024/1024 from dba_data_files where tablespace_name='USERS_SMALL';

SUM(USER_BYTES)/1024/1024
————————-
266013

There’s a column called USER_BYTES in DBA_DATA_FILES which contains the actual usable space, excluding the LMT headers and the leftover blocks mentioned earlier.

So DBA_DATA_FILES[USER_BYTES] - DBA_SEGMENTS = 266013 - 241157 = 24856, which is exactly the value we see in DBA_FREE_SPACE.
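
(If you want this check for the whole database rather than a single tablespace, a rough sketch like the one below should do; it uses only the standard dictionary views, and the column aliases are my own:)

select d.tablespace_name,
       round(d.user_mb, 2)                    as usable_mb,
       round(nvl(s.seg_mb, 0), 2)             as used_mb,
       round(d.user_mb - nvl(s.seg_mb, 0), 2) as free_mb   -- should match DBA_FREE_SPACE
from  (select tablespace_name, sum(user_bytes)/1024/1024 as user_mb
       from dba_data_files
       group by tablespace_name) d
left join
      (select tablespace_name, sum(bytes)/1024/1024 as seg_mb
       from dba_segments
       group by tablespace_name) s
  on s.tablespace_name = d.tablespace_name
order by d.tablespace_name;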

Don’t be calm and quiet if someone argues that DBA_DATA_FILES[BYTES] - DBA_SEGMENTS is the actual free space available in the database. Just ask them to read this, or an even better solution is to ask them to fill the database with data equivalent to what they’re claiming as free space 😉

CSM

Tracing DATAPUMP jobs

Hi All,

Nowadays Data Pump has become like ‘select * from…’ to DBAs, as we come across many situations where we have to move data here and there and between heterogeneous environments. So, having good knowledge of the Data Pump utility and its features is quite commendable.

At times we all come across some weird problems using Data Pump. I’m one of the millions who thought ‘God!! how do I trace my impdp/expdp?’, so that I can get information about its internal processing and debug issues quickly.

God answered my prayers and I came to know about an undocumented parameter called ‘TRACE’ which helps us generate detailed, system-level traces for Data Pump operations.

Let’s see how it works

This parameter, TRACE, takes a 7-digit hexadecimal number as its value, and specifying the value follows some rules.
Out of the 7-digit hexadecimal number:

– The first 3 digits enable tracing for a specific Data Pump component, and the last 4 digits are usually 0300.

– Specifying more than 7 hexadecimal digits is not allowed. Doing so will result in UDE-00014: invalid value for parameter, ‘trace’.

– Leading zeros can be omitted, so the value may be fewer than 7 characters/digits, and the value is not case sensitive.

Before starting the tracing, make sure you have an appropriate value set for the MAX_DUMP_FILE_SIZE initialization parameter, because this parameter decides how big a trace file can grow. The default value is UNLIMITED and it’s fine to leave it at that.

SQL> show parameter max_dump_file

NAME                                 TYPE        VALUE
------------------------------------ ----------- -------------------
max_dump_file_size                   string      UNLIMITED

If it’s not unlimited, try setting it with the following command:

alter system set max_dump_file_size='UNLIMITED';

The majority of errors that occur during a Data Pump job can be diagnosed by creating a trace file for the Master Control Process (MCP) and the Worker Process(es) only.

In case of standard tracing, trace files are generated in BACKGROUND_DUMP_DEST:

– Master Process trace file format:  <SID>_dm<number>_<process_id>.trc

– Worker Process trace file format: <SID>_dw<number>_<process_id>.trc

In case of full tracing, two trace files are generated in BACKGROUND_DUMP_DEST (the same ones generated by standard tracing) and an additional trace file is generated in USER_DUMP_DEST.

Shadow Process trace file format: <SID>_ora_<process_id>.trc

In databases where the ADR is configured, the locations of the files are as follows:

$ADR_HOME/trace:
– Data Pump Master Control Processes files have format: SID_dm_process_id.trc
– Data Pump Worker Process trace files have format: SID_dw_process_id.trc
– Data Pump Shadow Process trace files have format: SID_ora_process_id.trc
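
In an ADR-enabled database (11g onwards) you can confirm the exact trace directory with a quick query against V$DIAG_INFO:

SQL> select value from v$diag_info where name = 'Diag Trace';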

The list of trace levels in Data Pump is shown below.

 Trace   DM   DW  ORA  Lines
  level  trc  trc  trc     in
  (hex) file file file  trace                                         Purpose
------- ---- ---- ---- ------ -----------------------------------------------
  10300    x    x    x  SHDW: To trace the Shadow process (API) (expdp/impdp)
  20300    x    x    x  KUPV: To trace Fixed table
  40300    x    x    x  'div' To trace Process services
  80300    x            KUPM: To trace Master Control Process (MCP)      (DM)
 100300    x    x       KUPF: To trace File Manager
 200300    x    x    x  KUPC: To trace Queue services
 400300         x       KUPW: To trace Worker process(es)                (DW)
 800300         x       KUPD: To trace Data Package
1000300         x       META: To trace Metadata Package
--- +
1FF0300    x    x    x  'all' To trace all components          (full tracing)

You can use the individual values listed in the table above or a combination of them. If you sum all the individual component values you get 1FF0300, which is full tracing.

The Data Pump command for full tracing while doing an export is:
expdp DUMPFILE=expdp.dmp LOGFILE=expdp.log TRACE=1FF0300

The Data Pump command for full tracing while doing an import is:
impdp DUMPFILE=expdp.dmp LOGFILE=impdp.log TRACE=1FF0300

In most scenarios full tracing is not required, as you’ll be bombarded with a wealth of information that takes significant effort/time to analyze. As trace 400300 traces the Worker process(es) and trace 80300 traces the Master Control Process (MCP), combining them we can use trace 480300 to trace both the Master Control Process (MCP) and the Worker process(es). Note that when adding the values we keep only one copy of the trailing 0300 (it is common to every component value), which is why the result is 480300 and not 480600.
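
If you want to double-check a combined value, the component masks can simply be OR-ed together. Here is a small SQL sketch of that arithmetic (BITAND is built in; the OR is derived from it), using the MCP and Worker values as an example:

select trim(to_char(a + b - bitand(a, b), 'FMXXXXXXX')) as combined_trace
  from (select to_number('80300',  'XXXXXXX') as a,   -- Master Control Process (MCP)
               to_number('400300', 'XXXXXXX') as b    -- Worker process(es)
          from dual);

This returns 480300, confirming the value used above.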

Hope this helps. In the next articles in this series we’ll see how to enable tracing on an already running Data Pump process (I’m not kidding) and how to analyze the trace files generated through the above method.

ORA-39142 “incompatible version”

Hi,

Many people might have seen the error “ORA-39142: incompatible version” while importing a Data Pump dump file generated from a higher-version database into a lower-version database.

Many people have suggested that I upgrade the target to the source version as well, and some said it’s not at all possible to import a higher-version dump into a lower-version database.

Thanks to the Data Pump team at Oracle, they’ve given an excellent option for the expdp command. It’s called ‘version‘.

Below is the compatibility matrix for EXPDP.

    Data Pump dumpfile compatibility.

    Export              Use Export Data Pump parameter VERSION=...
      From                 if dumpfile needs to be imported into a
    Source                Target Database with compatibility level
  Database        (value of init.ora/spfile parameter COMPATIBLE):
      With
COMPATIBLE     9.2.0.x.0    10.1.0.x.0    10.2.0.x.0    11.1.0.x.0    11.2.0.x.0
---------- ------------- ------------- ------------- ------------- -------------
10.1.0.x.0   VERSION=9.2             -             -             -             -
---------- ------------- ------------- ------------- ------------- -------------
10.2.0.x.0   VERSION=9.2  VERSION=10.1             -             -             -
---------- ------------- ------------- ------------- ------------- -------------
11.1.0.x.0   VERSION=9.2  VERSION=10.1  VERSION=10.2             -             -
---------- ------------- ------------- ------------- ------------- -------------
11.2.0.x.0   VERSION=9.2  VERSION=10.1  VERSION=10.2  VERSION=11.1             -
---------- ------------- ------------- ------------- ------------- -------------
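
For example, if the dump is taken on an 11.2 source and has to go into a 10.2 target, the export could look like this (the dump file and log file names here are just illustrative):

expdp DUMPFILE=expdp_for_10gR2.dmp LOGFILE=expdp_for_10gR2.log VERSION=10.2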

So what is not possible then?

Connecting to a database of a higher RDBMS version through a client (the expdp/impdp binary) of a lower version. See below for the compatibility matrix for that.

Data Pump client compatibility.
===============================

   expdp and  Connecting to Database version
impdp client     10gR1      10gR2      11gR1      11gR2
     version  10.1.0.x   10.2.0.x   11.1.0.x   11.2.0.x
----------- ---------- ---------- ---------- ----------
   10.1.0.x  supported  supported  supported  supported
   10.2.0.x         no  supported  supported  supported
   11.1.0.x         no         no  supported  supported
   11.2.0.x         no         no         no  supported

 

Hope this helps.

Regards,

CSM

 

What if you lose both the PFILE and the SPFILE??

It’s been two months since I wrote any articles. It’s been a continuous festive season here in India, and there’s been a job change for me 🙂

Hope I’ll continue my interest at the same pace as earlier.

Losing both the SPFILE and the PFILE is a typical scenario faced by a lot of DBAs (mostly the junior DBAs).

If the Database is already running and you lose these files, you won’t face any problems until you come across a situation where you have to bounce the Database 😉

If the Database is down and your task is to bring it up, you’re already dead 🙂

Don’t panic in this kind of situation, as I do 🙂

I’m going to tell you a trick by which you can bring your instance back to life with at least some of the settings of your running instance recovered.

You can find the details of the various parameters with which the instance was started in the alert log. Yes, the alert log records the non-default parameters every time the instance starts. So you can use them to construct a PFILE and start your instance with that PFILE like below:

startup pfile='<path to PFILE>'

after that you can generate the SPFILE as well using

create spfile from pfile
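
(In case you’re wondering how to fish that parameter listing out of the alert log quickly, a rough sketch on a plain text alert log would be a grep like the one below; the alert log name and the line count are just illustrative, and the -A option assumes GNU grep:)

grep -A 60 "System parameters with non-default values" alert_ORCL.log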

One caveat to this is that you can’t recover the settings you made ‘on the fly’, i.e. the changes made directly to the SPFILE while the instance was running.

If you’re smart and have already configured SPFILE autobackup, you don’t need any of the things mentioned above. You can just restore the SPFILE from the autobackup.
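
(For reference, enabling it is a one-line RMAN configuration; with autobackup enabled, the SPFILE is backed up automatically along with the control file:)

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;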

See the instructions below on how to do that (courtesy of the Oracle documentation):

To restore the server parameter file:

  1. If the database is up at the time of the loss of the SPFILE, connect to the target database. For example, run:
    % rman TARGET /

    If the database is not up when the SPFILE is lost, and you are not using a recovery catalog, then you must set the DBID of the target database. See “Determining your DBID” for details on determining your DBID.

  2. Shut down the instance and restart it without mounting. When the SPFILE is not available, RMAN starts the instance with a dummy parameter file. For example:
    RMAN> STARTUP FORCE NOMOUNT;
  3. Restore the server parameter file. If restoring to the default location, then run:
      RMAN> RESTORE SPFILE FROM AUTOBACKUP;

If restoring to a nondefault location, then you could run commands as in the following example:

      RMAN> RESTORE SPFILE TO '/tmp/spfileTEMP.ora' FROM AUTOBACKUP;

Hope you enjoyed it.

I always welcome your feedback and suggestions.

Thanks,

CSM

Can we create indexes on views?

Hi friends,

Recently one of the members of the OTN forums asked this question: can we create indexes on views?

I told him absolutely not, because a view is nothing but a stored SQL query. It’s the window through which you can see the data of your choice. As the ultimate data container is the table, you impose all the required conditions (constraints, indexes, etc.) on the table itself; they’ll be reflected in the view.

So it’s true that we can’t impose any constraints on views. OK, then what’s the case with MVs (materialized views)? Can’t we create indexes on them either?

If you say we can’t, that’s wrong. We can create constraints and indexes on MVs, as MVs contain the actual data, i.e. the result of the query as of the instant the query was executed.

create materialized view abc as select col1, col2 from t1;

ALTER MATERIALIZED VIEW abc add constraint 
abc_constraint check ( col2 < 10 ) deferrable;

So when you design applications, keep in mind that we can apply specific constraints and create indexes to attain performance gains on a selected set of data (of course, through MVs).

There’s a caveat for this, which I’ve noticed while doing some time-pass testing 🙂

See that exercise here

SQL> create table t1 as select rownum rnm,'row_number:'||rownum rnmstr 
     from user_objects where rownum<=5;

Table created.  

SQL> alter table t1 add primary key(rnm);

Table altered.

SQL> create unique index t1_uk on t1(rnmstr);

Index created.

SQL> create materialized view mvt1 refresh fast with primary key 
     as select * from t1;

Materialized view created.

SQL> create unique index mvt1_uk on mvt1(rnmstr);

Index created.

SQL> select * from t1;

         R RNMSTR
---------- -------------------------------------------
         1 row_number:1
         2 row_number:2
         3 row_number:3
         4 row_number:4
         5 row_number:5

Now we’ll interchange the values of two of the rows.

SQL> update t1 set RNMSTR='TEMP' where rnm=1;
1 row updated. 

SQL> update t1 set RNMSTR='row_number:1' where rnm=2; 

1 row updated. 

SQL> update t1 set RNMSTR='row_number:2' where rnm=1;

 1 row updated. 

SQL> commit; 

Commit complete. 

SQL> select * from t1;

         R RNMSTR
---------- -------------------------------------------
         1 row_number:2
         2 row_number:1
         3 row_number:3
         4 row_number:4
         5 row_number:5

Now when we try to refresh the mview we get an error.

SQL> exec dbms_mview.refresh('MVT1');
BEGIN dbms_mview.refresh('MVT1'); END;

*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-00001: unique constraint (YAS.MVT1_UK) violated
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 820
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 877
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 858
ORA-06512: at line 1

If we drop the unique index and recreate it as nonunique we can refresh the mview without errors.

SQL> drop index MVT1_UK;

Index dropped.

SQL> create index MVT1_AK on MVT1(rnmstr);

Index created.

SQL> exec dbms_mview.refresh('MVT1');

PL/SQL procedure successfully completed.

SQL> select * from MVT1;
         R RNMSTR
---------- -------------------------------------------
         1 row_number:2
         2 row_number:1
         3 row_number:3
         4 row_number:4
         5 row_number:5

The reason for this error is explained in Metalink note 284101.1. According to that note, the refresh mechanism may temporarily leave the data in an inconsistent state during the refresh: the modifications are not made in the same order as the modifications to the base table. Because of this, creating unique indexes, foreign keys or primary keys on mviews can cause problems in the refresh process. An mview is just a selective copy of the base table, so if we define the integrity constraints on the base table, the mview will definitely comply with the constraints you expect, as the data in the base table complies with them. But you can create non-unique indexes without any fear if you see scope for performance gains.

Hope you liked it. I always welcome suggestions/comments.

Regards,

CSM

How to change the default tablespace of your Oracle Database?

Hi Guys,

This week I came across a requirement to change the default tablespace of one of my users. I can do that using the “ALTER USER” command (http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_4003.htm).

Then the requirement turned into doing it for all the users in the Database.

I thought of doing it by altering every user individually, but noticed that my Database contains more than a hundred users… Then I thought of writing a small PL/SQL anonymous block and running that, something along the lines of the sketch below.
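
(A rough sketch only; NEW_TBS is just a placeholder tablespace name, and in practice you’d want to exclude Oracle-maintained accounts such as SYS and SYSTEM:)

begin
  for u in (select username
              from dba_users
             where username not in ('SYS', 'SYSTEM')) loop   -- placeholder filter
    execute immediate 'alter user ' || u.username ||
                      ' default tablespace NEW_TBS';
  end loop;
end;
/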

Isn’t there any other, simpler, solution for this?

There wouldn’t be if my Database were running on a version lower than 10g. But in 10g Oracle introduced a new feature which helps in this kind of scenario.

A database property called ‘DEFAULT_PERMANENT_TABLESPACE’ has been introduced, and with it the default tablespace for all users in the Database can be changed with a single command.

Please see the demo below,

SQL> SELECT PROPERTY_VALUE FROM DATABASE_PROPERTIES WHERE PROPERTY_NAME = 'DEFAULT_PERMANENT_TABLESPACE';

PROPERTY_VALUE
——————————————————————————–
USERS

Now we’ll create a table without specifying any tablespace clause.

SQL> create table t1(col1 number(10));

Table created.

Now observe the tablespace in which the table, t1, is created.

SQL> select tablespace_name from user_segments where segment_name='T1';

TABLESPACE_NAME
——————————
USERS

Now we’ll change the default tablespace of the Database to CSMTEST

SQL> alter database default tablespace csmtest;

Database altered.

So, CSMTEST is our new default permanent tablespace. Try creating a new table again, the same way we did earlier.

SQL> create table t2(col1 number(10));

Table created.

Now see what tablespace the object, T2, has gone into.

SQL> select tablespace_name from user_segments where segment_name='T2';

TABLESPACE_NAME
——————————
CSMTEST

This is how we can change the default tablespace of an Oracle Database. The good thing is that we can do this online, without bouncing the instance or doing anything extra… just issue a one-line command and Oracle will take care of everything else.

And a bonus for you: we can use the same strategy for changing the default temporary tablespace as well.

The property for that iiiiis… yes, your guess is correct 🙂 it’s DEFAULT_TEMP_TABLESPACE, and the statement to change the default temporary tablespace is

alter database default temporary tablespace <temp_tablespace_of_your_choice>;

Hope you like it and that this trick is helpful to you guys.

I’m always happy to receive your suggestions/ feedback. Have a great day ahead 🙂

Regards,

CSM