Vineet's Oracle Blog Spot: 2011

Understanding Joins

Joins are statements that retrieve data from more than one table. A join is characterized by multiple tables in the FROM clause, and the relationship between the tables is defined through the existence of a join condition in the WHERE clause. In a join, one row set is called inner, and the other is called outer.

This section discusses:

· Nested Loop Joins

· Hash Joins

· Sort Merge Joins

Nested Loop Joins

Nested loop joins are useful when small subsets of data are being joined and if the join condition is an efficient way of accessing the second table.

It is very important to ensure that the inner table is driven from (dependent on) the outer table. If the inner table's access path is independent of the outer table, then the same rows are retrieved for every iteration of the outer loop, degrading performance considerably.

Internally Nested loop works this way

In this algorithm, an outer loop is formed which consists of few entries and then for each entry, and inner loop is processed.

Ex:

Select tab1.*, tab2.* from tabl, tab2 where tabl.col1=tab2.col2;

It is processed like:

For i in (select * from tab1) loop
For j in (select * from tab2 where col2=i.col1) loop
Display results;
End loop;
End loop;

The Steps involved in doing nested loop are

a) Identify outer (driving) table

b) Assign inner (driven) table to outer table.

c) For every row of outer table, access the rows of inner table.

NESTED LOOPS
outer_loop
inner_loop

When optimizer uses nested loops?

Optimizer uses nested loop when we are joining tables containing small number of rows with an efficient driving condition. It is important to have an index on column of inner join table as this table is probed every time for a new value from outer table.

The optimizer uses nested loop joins when joining small number of rows, with a good driving condition between the two tables. You drive from the outer loop to the inner loop, so the order of tables in the execution plan is important.

The outer loop is the driving row source. It produces a set of rows for driving the join condition. The row source can be a table accessed using an index scan or a full table scan. Also, the rows can be produced from any other operation. For example, the output from a nested loop join can be used as a row source for another nested loop join.

Optimizer may not use nested loop in case:

No of rows of both the table is quite high
Inner query always results in same set of records
The access path of inner table is independent of data coming from outer table.

Note: You will see more use of nested loop when using FIRST_ROWS optimizer mode as it works on model of showing instantaneous results to user as they are fetched. There is no need for selecting caching any data before it is returned to user.

To check how optimizer decides the nested loop. Follow test plan. All test plan are tested on Oracle version 10.2.

SQL> Create table e as Select * from employees;

Table created.

SQL> Create table d as select * from departments;

Table created.

SQL> Create index e_deptno on e(department_id);

Index created

For better execution plan -gather stats on tables and index.

SQL> exec dbms_stats.gather_table_stats('HR','D')

PL/SQL procedure successfully completed

One may set it artificially as wellJ

Set artificial stats for E:

SQL> exec dbms_stats.set_table_stats(ownname => 'HR', tabname => 'E', numrows => 100,

numblks => 100, avgrlen => 124);

PL/SQL procedure successfully completed.

Set artificial stats for E_DEPTNO index

SQL> exec dbms_stats.set_index_stats(ownname => 'HR', indname => 'E_DEPTNO', numrows =>100, numlblks => 10);

PL/SQL procedure successfully completed.

With less number of rows we will see optimizer probes nested loop

SQL>explain plan for select e.first_name,d.department_name from e, d where e.department_id=d.department_id;

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------------------

Plan hash value: 3204653704

--------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT

-----------------------------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 100 | 4100 | 6 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| E | 4 | 100 | 1 (0)| 00:00:01 |

| 2 | NESTED LOOPS | | 100 | 4100 | 6 (0)| 00:00:01 |

| 3 | TABLE ACCESS FULL | D | 27 | 432 | 3 (0)| 00:00:01 |

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------

|* 4 | INDEX RANGE SCAN | E_DEPTNO | 4 | | 0 (0)| 00:00:01 |

---------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT

---------------------------------------------------------------------------------------------------------------------

4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

Looking at execution plan its clear the query is using nested loop.

Hash Joins

Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.

This method is best used when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.

In simpler terms it works like

Build phase

For each row RW1 in small (left/build) table loop
Calculate hash value on RW1 join key
Insert RW1 in appropriate hash bucket.
End loop;

Probe Phase

For each row RW2 in big (right/probe) table loop
Calculate the hash value on RW2 join key
For each row RW1 in hash table loop
If RW1 joins with RW2
Return RW1, RW2
End loop;
End loop;

When the Optimizer Uses Hash Joins

The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the

following conditions are true:

· A large amount of data needs to be joined.

· A large fraction of a small table needs to be joined.

Note: You may see more hash joins used with ALL_ROWS optimizer mode, because it works on model of showing results after all the rows of at least one of the tables are hashed in hash table.

To check how optimizer decides the hash loop. Go through the test plan.

Set artificial stats for E , D table and index(setting higher number of rows 1000000)

SQL> exec dbms_stats.set_table_stats(ownname => 'HR', tabname => 'E', numrows => 1000000, numblks => 10000, avgrlen => 124);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_index_stats(ownname => 'HR', indname => 'E_DEPTNO', numrows => 1000000, numlblks => 1000);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_table_stats(ownname => 'HR', tabname => 'D', numrows => 1000000,numblks => 10000 , avgrlen => 124);

PL/SQL procedure successfully completed.

Now we have 1000000 number of rows in E and D table both and index on E(DEPTNO) reflects the same.

SQL>explain plan for select e.first_name,d.department_name from e, d where e.department_id=d.department_id;

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------

Plan hash value: 1127375450

--------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 37G| 1414G| | 663K (99)| 02:12:45 |

|* 1 | HASH JOIN | | 37G| 1414G| 26M| 663K (99)| 02:12:45 |

| 2 | TABLE ACCESS FULL| D | 1000K| 15M| | 2232 (2)| 00:00:27 |

| 3 | TABLE ACCESS FULL| E | 1000K| 23M| | 2264 (4)| 00:00:28 |

---------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

-------------------------------------------------------------------------------------------------------------------

1 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

Clearly we can see it is Using Hash Join.

Sort Merge Joins

Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. On the other hand, sort merge joins can perform better than hash joins if both of the following conditions exist:

· The row sources are sorted already.

· A sort operation does not have to be done.

However, if a sort-merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.

Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a non-equality) like <, <=, >, or >=. Sort merge joins perform better than nested loop joins for large data sets.

You cannot use hash joins unless there is an equality condition.

In a merge join, there is no concept of a driving table. The join consists of two steps:

1. Sort join operation: Both the inputs are sorted on the join key.

2. Merge join operation: The sorted lists are merged together.

If the input is already sorted by the join column, then a sort join operation is not performed for that row source.

The full operation is done in two parts:

Sort join operation

get first row RW1 from input1
get first row RW2 from input2.

Merge join operation

while not at the end of either input loop
if RW1 joins with RW2
get next row R2 from input 2
return (RW1, RW2)
else if RW1 < style=""> get next row RW1 from input 1
else
get next row RW2 from input 2
end loop

When the Optimizer Uses Sort Merge Joins

The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:

· The join condition between two tables is not an equi-join.

· Because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.

Note: Sort merge join can be seen with both ALL_ROWS and FIRST_ROWS optimizer hint because it works on a model of first sorting both the data sources and then start returning the results. So if the data set is large and you have FIRST_ROWS as optimizer goal, optimizer may prefer sort merge join over nested loop because of large data. And if you have ALL_ROWS as optimizer goal and if any inequality condition is used the SQL, optimizer may use sort-merge join over hash join

Now to test merge join, set subside number of rows and perform some sorting.

Num of rows say e=10000 and d=1000

SQL> exec dbms_stats.set_table_stats(ownname => 'HR', tabname => 'E', numrows => 10000, numblks => 1000, avgrlen => 124);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_index_stats(ownname => 'HR', indname => 'E_DEPTNO', numrows => 10000, numlblks => 100);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_table_stats(ownname => 'HR', tabname => 'D', numrows => 1000, numblks => 100, avgrlen => 124);

PL/SQL procedure successfully completed.

SQL>explain plan for select e.first_name,d.department_name from e, d where e.department_id=d.department_id order by e.department_id;

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------------------------------

Plan hash value: 915894881

---------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 370K| 14M| 136 (7)| 00:00:02 |

| 1 | MERGE JOIN | | 370K| 14M| 136 (7)| 00:00:02 |

| 2 | TABLE ACCESS BY INDEX ROWID| E | 10000 | 244K| 104 (1)| 00:00:02 |

| 3 | INDEX FULL SCAN | E_DEPTNO | 10000 | | 100 (0)| 00:00:02 |

|* 4 | SORT JOIN | | 1000 | 16000 | 25 (4)| 00:00:01 |

| 5 | TABLE ACCESS FULL | D | 1000 | 16000 | 24 (0)| 00:00:01 |

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

--------------------------------------------------------------------------------------------------------------------------------

4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

filter("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")So above query uses Merger Join.

When I discussed nested loop above, I used a term called driving table. So, what is driving table all about?

The 'driving' table is the table we will join FROM -- that is JOIN TO other tables. For example, lets say you have the query:

SQL>select * from emp, dept where emp.deptno = dept.deptno;

In this case the driving table might be DEPT, we would fetch rows from DEPT and then find the rows in EMP that match. DEPT is the driving table.

The choice of a driving table made using many factors:

Suppose PUBLICATIONS has PUB_ID (the primay key) and PUB_TITLE, and there are 100 publications. Now, suppose DOCUMENTS has DOC_ID (the primary key),DOC_TITLE, and PUB_ID (foreign key to PUBLICATIONS.PUB_ID). Suppose there are 1,000,000 documents.

So, the PUBLICATIONS table is much smaller, it should be the driving table, right?

No, not necessarily. What if the query is something like:

SQL>select pub.pub_title, doc.doc_id, doc.doc_title from documents doc, publications pub
where doc.pub_id = pub.pub_id and doc.doc_id = :doc_id;

So, in this case, do you want to drive from the PUBLICATIONS table? No, there are 100 rows that would be returned, based on the predicates in the where clause. If you drive from DOCUMENTS, how many rows will be returned? Either 1 or 0, due to the 'doc.doc_id = :doc_id' predicate. So, in this case, DOCUMENTS is your driving table, even though it has 1,000,000 rows and PUBLICATIONS only has 100. In the context of this query, DOCUMENTS is much more selective!

In general, you want to look at the tables in the join, look at filtering predicates that will go against each table, consider if there is an index driving that filter, and determine which table will produce the smallest row-source with the least amount of work.

Background:

Whenever a session seems stuck or doesn’t comes out many DBA’s /Developers have a habit to cancel their session (CTRL C) or kill the session. My bad, I did the same when I was testing rebuilding of an index.

Sql_Sesssion1> Select object_name,object_id,object_type From user_objects;

OBJECT_NAME OBJECT_ID OBJECT_TYPE

-----------------------------------------------------------------------------------------------------------------------------------------------------

IDX 61837 INDEX

ALL_OBJ 61836 TABLE

In session 2:

Sql_Sesssion2> Alter index idx rebuild online;

CTRL+C <------

And I got this

ORA-01013 user requested cancel of current operation.

But when I checked my 1st session again with same query I got more number of objects present in my schema (Journal table and IOT index).

Sql_Sesssion1> Select object_name, object_id, object_type From user_objects;

OBJECT_NAME OBJECT_ID OBJECT_TYPE

--------------------------------------------------------------------------------------------------------------------------------------

IDX 61837 INDEX

ALL_OBJ 61836 TABLE

SYS_IOT_TOP_61851 61852 INDEX

SYS_JOURNAL_61837 61851 TABLE

Wondering why these objects get created???

So this post is of mine is intended to describe online index rebuilding internals and a brief of journal table.

The way online index build (OIB) works is by creating an IOT journal table to keep track of changes while the rebuild is in progress and merge all the changes from journal table to complete index build operation.

Journal table is the index organized table and their name will essentially be SYS_JOURNAL_<index object_id>. It gets dropped automatically when the job completes successfully or cleans or cancelled successfully.

Refer above query, IDX object_id is 61837 and Journal table named as SYS_JOURNAL_61837

The structure of Journal table looks like

Sql_Sesssion1> DESC SYS_JOURNAL_61837

Name Null? Type

-----------------------------------------------------------------------------------------------------------

C0 NOT NULL NUMBER

C1 NOT NULL NUMBER

OPCODE CHAR(1)

PARTNO NUMBER

RID NOT NULL ROWID

Where

"OPCODE" column represents the type of operation like "I" for Insert and "D" for Delete. Any update operation of index key columns would be converted to "DELETE" and "INSERT" in the journal table.

"PARTNO" column represents partition number of the underlying table.

Internal Processing:

OIB will get in the DML queue to lock the table exclusively while preventing the new DML's to go through, once all the active transactions (ones which were initiated before the OIB) are completed, OIB will create the journal IOT table and release the exclusive table lock (it'll still keep the share lock on the table to prevent any other DDL's) for DML's to continue.

As part of journal table creation, Oracle would create an internal trigger on the primary table to record all the changes to the journal table. Along with using all the index columns, journal table will add "ROWID" to that list to make it as primary key.

Among all the changes to a given row for any of the index key columns, only the most recent change for that record is visible in the journal table.

While rest of the user sessions populate journal table with the ongoing changes for index key columns, OIB session would be reading the table in consistent mode (as of the time journal table is created) to create the index followed by the final merge operation.

During the merge process, Oracle will read the journal table (IOT) leaf blocks from left to right to merge those changes with the index being built. As the journal table leaf block changes are applied, once a leaf block is fully consumed, its reference will be deleted from the branch block.

This process will continue until all leaf blocks are consumed and when it comes to the last leaf block, Oracle would stop all the DML's again to do the final merge and drop the journal table before releasing the mode 6 exclusive table lock.

As each leaf block is consumed, Oracle would mark each entry as deleted. If more DML's happen while Oracle is doing the merge, it'll do one more pass of walking through the leaf blocks, this process continues until the merge process is all done.

Please Note: In 11g, major changes were introduced to handle online index rebuilding. The above post is meant for previous version to 11g.

If you find this article interesting then please provide your feedback through comments.

Vineet's Oracle Blog Spot

Sunday, 16 October 2011

Understanding Joins

Hash Joins

Sunday, 25 September 2011

Internals Of Index Online Building

About Me