Tuesday, 26 April 2016

Oracle Data-Pump : Optimising LOB Export Performance



LOBs as we all know that are used to store large, unstructured data, such as video, audio, photo images etc. With a LOB you can store up to 4 Gigabytes of data. They are similar to a LONG or LONG RAW but differ from them in quite a few ways.

Category that LOB can think of is 
Simple structured data
Complex structured data
Semi-structured data
Unstructured data

LOB are effective but it comes with performance impact in few of the scenario.

Lately there was a task related to taking a backup of LOB table<non partitioned.> of size around 41 GB where LOB itself took around 33 GB. There are many options to do backup of the table and team started with EXPDP approach and it took very long time around 3 hours and this was not acceptable.

Lob won’t use parallel at all so there was no point in using parallel degree in expdp command.

Since the activity was critical and clock was ticking, somehow I managed to deduce the way to do it in concurrent way i.e.  Run the data pump in chunks and execute in concurrent way so that each data pump will work on small chunk rather than full volume. Execution in concurrent way in any case will reduce the time.

Here is the very simple and effective script that I designed and to my surprise it reduced the time line from 3 Hours to 10-15 minutes.

What I did is that I logically divided the table based on rowids block using DBMS function in to 10 chunks and these chunks are internally balanced that is data is divided almost equally among the chunks.

Below snippet executed in 10 threads i.e. 10 data pump threads and each one is getting executed concurrently in background. Resulted in 10 different dumps and took just 10-20 minutes for complete export.


#!/bin/bash
chunk=10
for ((i=0;i<=9;i++));
do
expdp USERNAME/Password@DB_NAME TABLES=LOB_TEST QUERY=LOB_TEST:\"where mod\(dbms_rowid.rowid_block_number\(rowid\)\, ${chunk}\) = ${i}\" directory=DMP dumpfile=lob_test_${i}.dmp logfile= log_test_${i}.log &
   echo $i
done 

Since there are 10 different dumps; import will happen in chunk only and it will reduce the time during import activity as well. 


Off course there are other ways to do it like<in relation to expdp>; 

a) DBMS_PARALLEL_EXECUTE where we can define the task and generate the rowids and use those rowids during the export.
b) If table is partitioned then do take the backup partition wise in concurrent mode.
c) Use dbms_rowid.rowid_create.
d) Use Secure file for storing the lob rather than using basic file.

Tuesday, 19 April 2016

CLUSTERING FACTOR AND TABLE_CACHED_BLOCKS: ORACLE 12C FEATURE


Week off;  somehow I managed to get time to write on 12C behaviour. Core idea was to gather performance artefacts about concurrent insert scenarios but to surprise I went on to discover clustering factor and its internals. I tested this in 12.1.0.1.0

Have created one table; On initial basis I tried just running regular insert from multiple sessions and to see it how it goes.

create table TEMP_DATA (id number, name varchar2(30)) tablespace POOL_DATA;
  
create sequence TEMP_DATA_seq order;

CREATE OR REPLACE PROCEDURE temp_data_proc AS
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO TEMP_DATA VALUES (TEMP_DATA_seq.NEXTVAL, 'VINEET');
COMMIT;
END LOOP;
END;
/

now from 2 different sessions I triggered the procedure to load the table.

Sess1: exec temp_data_proc;

Sess2: exec temp_data_proc;

Then created index

create index IND_IXX on TEMP_DATA(id);

Gathered the stats and was really surprised to see the clustering factor of Index.

SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name='IND_IXX';

TABLE_NAME                     INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
TEMP_DATA                      IND_IXX                               496     200000            116653

Second Approach: Dropped and recreated all the object as mentioned above and ran the insert procedure from single session one by one i.e.

executed 2 times temp_data_proc in same session; the clustering factor here is way less.


SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name='IND_IXX';

TABLE_NAME                     INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
TEMP_DATA                      IND_IXX                               496     200000               441------------à Drastically reduced.

Drilling down more; and noted that running insert from different session can be a problem as each one work on slight different free list so they might pick different block. While in one session one free list is active and once it is over it picks different free list which automatically deals like you have blocks are accessed in right order.

so while creating Index in case of running multiple insert from different session; oracle will do lot of back and forth while generating clustering factor and hence it generated high clustering factor. In case of same session insert with same volume; blocks were accessed in ordered form and hence it generates less clustering factor.

Index stores rowid and data; now how it increments clustering factor?

Basically rowid is representation of blocks and files etc; so while generating clustering factor it check the reference block and see if it same or different from previous one;
if it is different than it increment it by 1 and so this process go on and on.

So this large clustering factor seems an absurd behaviour as far as concurrent session scenario is concerned;  to come up for this 12C introduced  TABLE_CACHED_BLOCKS ; The purpose of the table_cached_blocks parameter is set using set_table_prefs and it helps to control the computed clustering_factor when running the dbms_stats utility and this column is not present in 11.1.0 to 11.2.0.3; not sure about 11.2.0.4. in Version 11g this basically comes up after applying Oracle patches.

TABLE_CACHED_BLOCKS: default is 1 and max is 255; for my test I used 255 so that there will be a very little scope for clustering factor to go wrong.

TEST case : with TABLE_CACHED_BLOCKS

Drop table  TEMP_DATA;

Drop  sequence TEMP_DATA_seq;

create table TEMP_DATA (id number, name varchar2(30)) tablespace POOL_DATA;
 
  
create sequence TEMP_DATA_seq order;

CREATE OR REPLACE PROCEDURE temp_data_proc AS
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO TEMP_DATA VALUES (TEMP_DATA_seq.NEXTVAL, 'VINEET');
COMMIT;
END LOOP;
END;
/

exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'TEMP_DATA',pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

next have 2 different sessions where we run the procedure concurrently.

Sess1: exec temp_data_proc;

Sess2: exec temp_data_proc;

Then created index again

create index IND_IXX on TEMP_DATA(id);

Gathered the stats and I see clustering factor is quite similar to single session and is optimal.

SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name='IND_IXX';


TABLE_NAME                     INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
TEMP_DATA                      IND_IXX                               496     200000               468


I Hope you liked it..

Enjoy Learning....