How reliable is it to use DBMS_STATS.AUTO_SAMPLE_SIZE for gathering table statistics??

Statistics collection in Oracle is managed through the PL/SQL package DBMS_STATS. It provides several procedures to gather statistics for a table, a schema, or the whole database. These procedures have an estimate_percent parameter, which specifies the sampling percentage, in other words how much of the object's data Oracle has to scan to arrive at an estimate for the different statistics. Users can specify any value between 0 and 100 for this parameter.
For example, suppose you have a table ABCD and you want a 10% sampling percentage; you can do this:

exec dbms_stats.gather_table_stats(null, 'ABCD', estimate_percent => 10);

For more details on this package, see: http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_stats.htm

It is not always easy for users to pick an appropriate sampling percentage. If the specified percentage is too high, gathering statistics can take a long time (I've seen it run for hours on some really big tables). Conversely, if the data is skewed and the specified percentage is too low, the resulting statistics can be inaccurate, which is an even bigger problem.

For this reason, Oracle introduced the AUTO value for the estimate_percent parameter.

For example, you can gather statistics on ABCD as follows:

exec dbms_stats.gather_table_stats(null, 'ABCD', estimate_percent => dbms_stats.auto_sample_size);

There are two primary advantages of using AUTO sample size over a fixed number.

  1. When AUTO sample size is specified, the system automatically determines the appropriate sampling percentage.
  2. AUTO sample size is more flexible than a fixed sampling percentage. A fixed percentage that was good at some point in time may no longer be appropriate after the data distribution in the table has changed. When the AUTO value is used, Oracle adjusts the sample size as the data distribution changes.

When AUTO is used, Oracle picks a sample size at which the statistics quality is good enough; however, it does not always work well when the data is extremely skewed. To illustrate the accuracy of AUTO sample size, I compared fixed sampling percentages against AUTO sample size in an Oracle 11g database.

There's a table called CUSTOMERSCSM which contains 23 columns and 6,216,000 rows. It is based on one of Oracle's sample tables in the SH schema (the demo tables that come with the installation).

SQL> exec dbms_stats.gather_table_stats(null, 'CUSTOMERSCSM', estimate_percent => 1);

PL/SQL procedure successfully completed.

Elapsed: 00:00:11.25

and

SQL> exec dbms_stats.gather_table_stats(null, 'CUSTOMERSCSM', estimate_percent => 100);

PL/SQL procedure successfully completed.

Elapsed: 00:02:52.37

and

SQL> exec dbms_stats.gather_table_stats(null, 'CUSTOMERSCSM', estimate_percent => dbms_stats.auto_sample_size);

PL/SQL procedure successfully completed.

Elapsed: 00:00:29.47

 

You can see there's a huge difference between the 1% and 100% samples, whereas the 1% and AUTO sample size runs finished in almost the same time (an 18-second difference, to be precise).
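Incidentally, you can also check how many rows each run actually sampled by querying the dictionary after the gather; this is just a quick sanity check against the standard views:

SQL> select num_rows, sample_size, last_analyzed
       from dba_tables
      where table_name = 'CUSTOMERSCSM';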

Since we have significant timing gains with AUTO sample size, let's now look at the accuracy part. Among all the statistics of a column, the number of distinct values (NDV) is the one whose accuracy has traditionally been an issue.

The actual number of distinct values in the column CUST_MAIN_PHONE_NUMBER of the CUSTOMERSCSM table is:

SQL> select count(distinct(CUST_MAIN_PHONE_NUMBER)) from CUSTOMERSCSM;

COUNT(DISTINCT(CUST_MAIN_PHONE_NUMBER))
---------------------------------------
                                  51000

 

Case 1: With 1% sample size

SQL> select column_name,NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_MAIN_PHONE_NUMBER';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
CUST_MAIN_PHONE_NUMBER                49362

Case 2: With 100% sample size

SQL> select column_name,NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_MAIN_PHONE_NUMBER';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
CUST_MAIN_PHONE_NUMBER                51000

Case 3: With AUTO sample size

SQL> select column_name,NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_MAIN_PHONE_NUMBER';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
CUST_MAIN_PHONE_NUMBER                51344

 

Estimate Percentage   Actual value   Value from stats   % deviation
-------------------   ------------   ----------------   -----------
1%                           51000              49362         3.21%
100%                         51000              51000            0%
AUTO                         51000              51344         0.67%

So the percentage deviation with AUTO sample size is much smaller than with the 1% sample.

Now we’ll do the same exercise on another column called CUST_ID.

SQL> select count(distinct(CUST_ID)) from CUSTOMERSCSM;

COUNT(DISTINCT(CUST_ID))
------------------------
                   55500

Case 1: With 1% sample size

SQL> select column_name,NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_ID';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
CUST_ID                               55704

Case 2: With 100% sample size

SQL> select column_name,NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_ID';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
CUST_ID                               55500

 

Case 3: With AUTO sample size

SQL> select column_name,NUM_DISTINCT from DBA_TAB_COL_STATISTICS where TABLE_NAME='CUSTOMERSCSM' and COLUMN_NAME='CUST_ID';

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
CUST_ID                               55992

Estimate Percentage   Actual value   Value from stats   % deviation
-------------------   ------------   ----------------   -----------
1%                           55500              55704         0.37%
100%                         55500              55500            0%
AUTO                         55500              55992         0.89%

Here, if you look closely, the AUTO sample size deviates more than the 1% sample does.

So we've just seen two different scenarios with the 1% and AUTO sample sizes. Is it really reliable to depend on auto sampling??

I'm still trying to find the answers and will post my observations soon in another article. I'll be happy if anyone can clear the clouds even before I do it for myself :)
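Until then, a simple sanity check you can run yourself is to compare the NDV stored in the dictionary against the real value for any column you're suspicious about (just an illustrative query; the count(distinct) part is obviously expensive on big tables):

select s.num_distinct as stats_ndv,
       (select count(distinct CUST_MAIN_PHONE_NUMBER) from CUSTOMERSCSM) as actual_ndv
  from dba_tab_col_statistics s
 where s.table_name  = 'CUSTOMERSCSM'
   and s.column_name = 'CUST_MAIN_PHONE_NUMBER';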

Have a good day guys. Keep Contemplating.

 

regards,

CSM

DBA_DATA_FILES, DBA_FREE_SPACE and DBA_SEGMENTS–A Myth Buster

Hi Friends,

I've been encountering a lot of questions related to free space and used space details in an Oracle database in various forums and discussions.

Is it really difficult to understand the Oracle space management system? No, it's not. I'm listing my observations here, which should help you discern between the different data dictionary views related to space management and bust the myths around them.

The three main data dictionary views frequently used are DBA_DATA_FILES, DBA_SEGMENTS and DBA_FREE_SPACE.

Many of the space-utilization queries available on the internet are based on DBA_DATA_FILES & DBA_FREE_SPACE, and some on DBA_DATA_FILES & DBA_SEGMENTS.

Some free-space reporting queries use DBA_DATA_FILES minus DBA_SEGMENTS as the free space, and some use DBA_FREE_SPACE directly. Which one should you use?

See the results below:

SQL> select sum(bytes)/1024/1024 from dba_data_files where tablespace_name='USERS_SMALL';

SUM(BYTES)/1024/1024
--------------------
              266016

SQL> select sum(bytes)/1024/1024 from dba_segments where tablespace_name='USERS_SMALL';

SUM(BYTES)/1024/1024
--------------------
              241157

SQL> select sum(bytes)/1024/1024 from dba_free_space where tablespace_name='USERS_SMALL';

SUM(BYTES)/1024/1024
--------------------
               24856

So DBA_DATA_FILES - DBA_SEGMENTS = 266016 - 241157 = 24859. That's a 3 MB difference from the DBA_FREE_SPACE result, and in some cases it's even more. So which one should you believe? What's the actual free space I can use for my objects?

The answer is DBA_FREE_SPACE, which actually contains the free space that is available to the objects we're going to create.
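For instance, a simple per-tablespace free-space report based on DBA_FREE_SPACE looks like this (nothing special assumed here, just the standard view):

select tablespace_name, sum(bytes)/1024/1024 free_mb
  from dba_free_space
 group by tablespace_name
 order by tablespace_name;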

Then why am I saying DBA_DATA_FILES - DBA_SEGMENTS is incorrect? In theory it's perfectly correct, but in reality the difference is the LMT headers/space management blocks, plus the last "few" blocks at the end of the file which aren't enough to make up a whole extent according to the extent allocation and segment space definitions of the tablespace. Thanks to Jonathan Lewis for pointing that out. Please go through Jonathan's blog http://jonathanlewis.wordpress.com/2012/05/31/lmt-headers/ to get a clear idea of what these LMT headers are.

So, DBA_DATA_FILES = DBA_SEGMENTS + DBA_FREE_SPACE + X

where X = the first few blocks of the file that are taken up by space management blocks, plus the last "few" blocks at the end of the file which aren't enough to make up a whole extent according to the extent allocation and segment space definitions of the tablespace.

How can I get the 'actual' free space details without using the DBA_FREE_SPACE view? Is it the only way, or does an alternative exist?

There is an alternative through which you can get the actual free space information just by using DBA_DATA_FILES and DBA_SEGMENTS/DBA_EXTENTS.

See the query below

SQL> select sum(user_bytes)/1024/1024 from dba_data_files where tablespace_name='USERS_SMALL';

SUM(USER_BYTES)/1024/1024
-------------------------
                   266013

There's a column called USER_BYTES in DBA_DATA_FILES which contains the actual usable space, excluding the LMT headers and leftover blocks mentioned earlier.

So DBA_DATA_FILES[USER_BYTES] - DBA_SEGMENTS = 266013 - 241157 = 24856, which is exactly the value we're seeing in DBA_FREE_SPACE.
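Putting that together, here's a sketch of a per-tablespace report that derives free space from USER_BYTES and DBA_SEGMENTS alone; the outer join is only there so tablespaces with no segments still show up:

select df.tablespace_name,
       df.user_bytes/1024/1024                     usable_mb,
       nvl(sg.bytes,0)/1024/1024                   used_mb,
       (df.user_bytes - nvl(sg.bytes,0))/1024/1024 free_mb
  from (select tablespace_name, sum(user_bytes) user_bytes
          from dba_data_files
         group by tablespace_name) df,
       (select tablespace_name, sum(bytes) bytes
          from dba_segments
         group by tablespace_name) sg
 where df.tablespace_name = sg.tablespace_name (+)
 order by df.tablespace_name;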

Don't stay calm and quiet if someone argues that DBA_DATA_FILES[BYTES] - DBA_SEGMENTS is the actual free space available in the database. Just ask them to read this, or, an even better solution, ask them to fill the database with data equivalent to what they're claiming as free space ;)

CSM

Enabling tracing on a running DATAPUMP job

Hi guys,

In the last article I mentioned how to enable tracing on a Data Pump (export/import) job.

What if I've already triggered my Data Pump job and it's in the middle of doing its task? Should I kill the entire job midway and restart it with tracing enabled? That's really a bad idea if we're dealing with GBs/TBs of data.

Thanks to the Data Pump team at Oracle; they made it much simpler for users.

Please follow the procedure below to enable tracing on an existing Data Pump job.

% expdp system/manager DIRECTORY=my_dir DUMPFILE=expdp_f.dmp \
LOGFILE=expdp_f.log FULL=Y
Export: Release 10.2.0.3.0 - Production on Thursday, 18 October, 2007 17:11:08
Copyright (c) 2003, 2005, Oracle.  All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options
FLASHBACK automatically enabled to preserve database integrity.
Starting "SYSTEM"."SYS_EXPORT_FULL_01":  system/******** DIRECTORY=my_dir 
DUMPFILE=expdp_f.dmp LOGFILE=expdp_f.log FULL=y
Estimate in progress using BLOCKS method...
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
--

press Control-C to go to the Interactive Command mode, and temporarily stop the job with the STOP_JOB command:

Export> stop
Are you sure you wish to stop this job ([yes]/no): yes
-- use the system generated job-name SYS_EXPORT_FULL_01 to re-attach to the job
-- and specify the TRACE parameter with a tracing level:


% expdp system/manager ATTACH=sys_export_full_01 TRACE=480300
Export: Release 10.2.0.3.0 - Production on Thursday, 18 October, 2007 17:23:48
Copyright (c) 2003, 2005, Oracle.  All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options
Job: SYS_EXPORT_FULL_01
  Owner: SYSTEM
  Operation: EXPORT
  Creator Privs: FALSE
  GUID: F4E6BF997DFA46378D543F998E82653E
  Start Time: Thursday, 18 October, 2007 17:23:49 
  Mode: FULL
  Instance: m10203wa
  Max Parallelism: 1
  EXPORT Job Parameters:
  Parameter Name      Parameter Value:
     CLIENT_COMMAND        system/******** DIRECTORY=my_dir DUMPFILE=expdp_f.dmp 
LOGFILE=expdp_f.log FULL=y
  State: IDLING
  Bytes Processed: 0
  Current Parallelism: 1
  Job Error Count: 0
  Dump File: /usr/DataPump/expdp_f.dmp
    bytes written: 4,096
Worker 1 Status:
  State: UNDEFINED
-- restart the job and change back from Interactive Command mode to Logging mode
-- with CONTINUE_CLIENT (note that tracing with level 480300 is now active):
Export> cont
Restarting "SYSTEM"."SYS_EXPORT_FULL_01":  system/******** DIRECTORY=my_dir 
DUMPFILE=expdp_f.dmp LOGFILE=expdp_s.log FULL=y
Processing object type DATABASE_EXPORT/TABLESPACE
Processing object type DATABASE_EXPORT/PROFILE
...


Alternatively, use a database init.ora/spfile event to trace Data Pump (not applicable to a job that's already running).

Data Pump tracing can also be started by setting event 39089 in the initialization parameter file. This method should only be used to trace the Data Pump calls in an early state, e.g. if details are needed about the DBMS_DATAPUMP.OPEN API call. Trace level 0x300 will trace all Data Pump client processes. Note that this event cannot be enabled through an ALTER SYSTEM command alone; only a restart of the database will enable the event.
Example when using init.ora initialization parameter file:

- add the following line to init.ora parameter file: 
EVENT="39089 trace name context forever,level 0x300" 
- Restart the database. 
- Start the Export Data Pump or Import Data Pump job.

Example when using spfile initialization parameter file:

-- when using spfile parameter file:
CONNECT / as sysdba 
SHOW PARAMETER event
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
event                                string
ALTER SYSTEM SET EVENT = '39089 trace name context forever, level 0x300' 
SCOPE = spfile; 
SHUTDOWN immediate
STARTUP
SHOW PARAMETER event
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
event                                string      39089 trace name context forev
                                                 er, level 0x300
- Start the Export Data Pump or Import Data Pump job.
--
to remove the event(s) again:
ALTER SYSTEM RESET EVENT SCOPE = spfile SID='*';
  
SHUTDOWN immediate
STARTUP

Source: Oracle Metalink.

Enable BCT (Block Change Tracking)

Hi,

In my previous post (https://querycsm.wordpress.com/2013/01/02/why-rman-daily-incremental-backups-are-taking-so-long/) I’ve talked about investigating the potential problems and stated that one of the improvement initiatives we could go ahead with is ‘Enabling BCT’.

Now we'll see how to do that, and whether there's anything else we need to do after enabling it.

First, I recommend you go through the document prepared by my favorite author, Alexander Gorbachev, to get an idea of what BCT is and how it works (at least go through the first few pages).

To enable BCT we have two ways:

Specify the BCT file location manually, like this:

SQL> ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '/dbfile/bct_file.ora';

or the other way is:

SQL> ALTER DATABASE ENABLE BLOCK CHANGE TRACKING;

but you need to set db_create_file_dest to the location where you want the BCT file to be created before enabling it:

SQL> alter system set db_create_file_dest='location' SCOPE=BOTH;

To check whether BCT is enabled, run the query below:

SQL> select * from v$block_change_tracking;

STATUS     FILENAME                                          BYTES
---------- ---------------------------------------------- --------
ENABLED    /ora11dbdata2/oracle/NIKU/blk_chg_trckng.dbf    11599872

I did these steps but was not seeing any improvement in PCT_READ (the percentage of blocks read) from the query below.

SQL> select file#, avg(datafile_blocks) blocks,
            avg(blocks_read) blocks_read,
            avg(blocks_read/datafile_blocks)*100 pct_read,
            avg(blocks) blocks_backed_up,
            (avg(blocks)/avg(datafile_blocks))*100 pct_backed_up
       from v$backup_datafile
      where incremental_level = 1
      group by file#
      order by file#;

After doing a lot of research, I found that we need to take a FULL backup of the database after enabling BCT in order to leverage its benefits. We should do this because when we take a FULL backup, the database creates a baseline bitmap, and from there it starts updating the bitmap based on the blocks that are changed.
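In RMAN terms, that means running something like the following once, right after enabling BCT (a minimal sketch; fold it into your own backup scripts, tags and retention settings):

# the level 0 backup establishes the baseline for the change tracking bitmap
RMAN> BACKUP INCREMENTAL LEVEL 0 DATABASE;

# from now on, level 1 incrementals can read just the changed blocks via the BCT file
RMAN> BACKUP INCREMENTAL LEVEL 1 DATABASE;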

I've later observed that a lot of people see the same thing, i.e. no improvement after enabling BCT, and this is the answer for that.

BTW, the performance improvement after enabling BCT in our production database is humongous; we're now seeing only 2%-3% of the blocks being read by RMAN. Thanks to BCT!!!

As always, your queries and questions are welcome.

Regards,

CSM

Why RMAN daily incremental backups are taking so long???

Hi,

Have you ever come across the weird problem of RMAN taking too long for even daily level 1 incremental backups and backing up a huge amount of data?

I've seen such behavior with some of our production databases: even a daily incremental backup was more than 80% of the size of a full backup and took longer than the full backup itself.

We'll now break down the problem and see what the reasons could be and where to start the investigation.

1. As we all know, incremental backups are backups of the blocks that have changed since the last successful incremental backup. If the backup is huge, then a large number of blocks are changing daily and Oracle has no option but to back those blocks up.

So, where do we start our investigation into which objects are generating 'huge deltas', i.e. being changed the most?

Use the query below to see the number of blocks being changed, file by file:

SQL> select file#, avg(datafile_blocks) blocks,
            avg(blocks_read) blocks_read,
            avg(blocks_read/datafile_blocks)*100 pct_read,
            avg(blocks) blocks_backed_up,
            (avg(blocks)/avg(datafile_blocks))*100 pct_backed_up
       from v$backup_datafile
      where incremental_level = 1
      group by file#
      order by file#;

From the above query you'll get the percentage of blocks being backed up (pct_backed_up) for each file, and from there you can dig into which objects reside in that file, via its tablespace_name, DBA_EXTENTS and so on, as shown in the sketch below.
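For example, something along these lines lists the biggest segments stored in a suspect datafile (the file_id = 5 here is just a placeholder for whichever file# showed a high pct_backed_up):

select e.owner, e.segment_name, e.segment_type,
       sum(e.bytes)/1024/1024 mb
  from dba_extents e
 where e.file_id = 5   -- hypothetical file# taken from the query above
 group by e.owner, e.segment_name, e.segment_type
 order by mb desc;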

2. What if it's taking too long to complete the backup operation? What should we suspect?

There are a lot of areas of concern, like the tape drive's performance, the disk controller's performance if the backup goes to disk, and OS-level issues, since RMAN is ultimately an OS-level process.

But from the database point of view we can use the above query, and if we see 100% in pct_read for all the files, we can confirm that the database is reading each and every block and wasting a lot of time.

Why does it do so in the first place? Because RMAN has to visit each block, check whether it was modified, and back it up if it was; that's the way the algorithm was devised (essentially a linear scan of every block).

So is there anyway we could make it a bit efficient??

Yes, absolutely yes!!! Oracle has gifted us with a feature called BCT (Block Change Tracking). Please Google the term to find very good articles and information about it.

Please enable BCT and you'll be amazed by the results, as I was!!!

We'll discuss how to enable BCT, and some hidden facts about it, in my next article…

Keep reading!!!

 

Regards,

CSM

ORA-39142 “incompatible version”

Hi,

Many people might have seen the error "ORA-39142 incompatible version" while importing a Data Pump dump file generated from a higher-version database into a lower-version database.

Many people have suggested I upgrade the target to the source version, and some said it's not at all possible to import a higher-version dump into a lower-version database.

Thanks to the Data Pump team at Oracle, they've given us an excellent option for the expdp command called VERSION.

Below is the compatibility matrix for expdp.

    Data Pump dumpfile compatibility.

    Export              Use Export Data Pump parameter VERSION=...
      From                 if dumpfile needs to be imported into a
    Source                Target Database with compatibility level
  Database        (value of init.ora/spfile parameter COMPATIBLE):
      With
COMPATIBLE     9.2.0.x.0    10.1.0.x.0    10.2.0.x.0    11.1.0.x.0    11.2.0.x.0
---------- ------------- ------------- ------------- ------------- -------------
10.1.0.x.0   VERSION=9.2             -             -             -             -
---------- ------------- ------------- ------------- ------------- -------------
10.2.0.x.0   VERSION=9.2  VERSION=10.1             -             -             -
---------- ------------- ------------- ------------- ------------- -------------
11.1.0.x.0   VERSION=9.2  VERSION=10.1  VERSION=10.2             -             -
---------- ------------- ------------- ------------- ------------- -------------
11.2.0.x.0   VERSION=9.2  VERSION=10.1  VERSION=10.2  VERSION=11.1             -
---------- ------------- ------------- ------------- ------------- -------------
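For example, to take an export on an 11.2 source database that can be imported into a 10.2 target, you would run something along these lines (the directory, dump file and schema names are just placeholders):

% expdp system/manager DIRECTORY=my_dir DUMPFILE=expdp_v102.dmp \
  LOGFILE=expdp_v102.log SCHEMAS=SCOTT VERSION=10.2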

So what is not possible then?

Connecting to a database of a lower RDBMS version through a Data Pump client (expdp/impdp binary) of a higher version. See the compatibility matrix for that below.

Data Pump client compatibility.
===============================

   expdp and  Connecting to Database version
impdp client     10gR1      10gR2      11gR1      11gR2
     version  10.1.0.x   10.2.0.x   11.1.0.x   11.2.0.x
----------- ---------- ---------- ---------- ----------
   10.1.0.x  supported  supported  supported  supported
   10.2.0.x         no  supported  supported  supported
   11.1.0.x         no         no  supported  supported
   11.2.0.x         no         no         no  supported

 

Hope this helps.

Regards,

CSM

 

How to send your jobs to background???

Hi,

Hope all of you are doing good.

This time I want to talk about some UNIX stuff rather than our usual database discussion.

A lot of us run into the situation of needing to send an already running job, like impdp/imp etc., into the background for various reasons.

Here’s the way to do that

Press CTRL+Z in the session where your job is running (this suspends it),

then type 'bg' to resume it in the background.

But on AIX-based systems I faced some problems with these commands. After a lot of struggle and searching, I found a way to do it on AIX-based machines.

Just do the following:

From another session, identify the PID/SPID of the process you want to send to the background.

Run 'kill -s STOP PID'; this will pause your process execution.

Now run 'kill -s CONT PID' to continue the process in the background. Alternatively, you can type 'bg' in the session where your process was running to resume it in the background.
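Put together, the AIX workaround looks something like this (the PID 12345 and the grep pattern are placeholders for your own job):

$ ps -ef | grep impdp        # find the PID of the running job
$ kill -s STOP 12345         # pause the process
$ kill -s CONT 12345         # resume the process so it keeps running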

 

For a full list of kill signals, see the table below

Number  Name      Description     Used for
------  --------  --------------  ----------------------------------------
0       SIGNULL   Null            Check access to pid
1       SIGHUP    Hangup          Terminate; can be trapped
2       SIGINT    Interrupt       Terminate; can be trapped
3       SIGQUIT   Quit            Terminate with core dump; can be trapped
9       SIGKILL   Kill            Forced termination; cannot be trapped
15      SIGTERM   Terminate       Terminate; can be trapped
24      SIGSTOP   Stop            Pause the process; cannot be trapped
25      SIGTSTP   Terminal stop   Pause the process; can be trapped
26      SIGCONT   Continue        Run a stopped process

 

Hope this helps.

I appreciate your feedback/comments.

 

Regards,

CSM