Oracle Exadata & Offloading
New
Year: - New Chapter, New Aspirations. Wishing you all a very Happy New Year.
Lastly I
got a chance to explore Exadata and I went over documentations to understand
how this machine actually works and i must say i was amazed by it’s designed.
So I thought why not to share with you all.
So what
is Exadata all about and what it offers?? Here is a brief about it.
Exadata precisely
“The Database Machine” is high end Oracle system built using Industry standard
Oracle hardware. It primarily consists 3 grids i.e. DB Server/compute grid,
network grid and a storage grid. Basically it’s a great combination of
Intelligent Hardware and smart Software.
Exadata Architecture |
Let’s take each
component one by one in brief:
DB Server/Compute Grid: This grid is actually a Database layer;
this layer is basically made up of multiple SUN servers with running Oracle Db
software. RAC is supported. The
database servers uses ASM to map the storage and ASM is key component of this
software stack. Db Server does have one more component LIBCELL and is linked
with Oracle kernel. This allows Oracle to talk to storage tier via netwrok
based calls instead of reads/writes at OS level. LIBCELL is designed in such a
way that its code know how to interact or requests via iDB.
InfiniBand <Network grid>: It provide a low latency,high throughput
communication link. This layer is responsible for the communication between the
two tiers<Storage and DB> and here it is done via iDB< Intelligent Database protocol>,
which is a network based protocol implemented using InfiniBand. iDB is used to
send requests for data along with metadata about the request (including
predicates) to cellsrv. iDB
is built on Reliable Datagram Sockets (RDS ) protocol and runs over
InfiniBand ZDP (Zero-loss Zero-copy Datagram Protocol). The objective of ZDP is
to eliminate unnecessary copying of blocks.
Storage Grid: Storage consists of multiple servers and
the intelligent part integrated here is the process< Cell
Services(Cellsrv)> that runs as a part of storage server. Cellsrv is a
multi-threaded program that services I/O requests from a database server. Those
requests can be handled by returning processed data or by returning complete
blocks depending in the request. Cellsrv is able to use the metadata to process
the data prior sending result back to client. Such type of scans are called as
Smart Scans.It is not always that Smart scan is used for each and every Db
operation; there are certain situations only for which Smart Scans are
triggered.
Apart from what I
mentioned above there is also a term called Flash Cache;
Storage server also equipped with flash based storage. Oracle point this as an
Exadata Smart Flash Cache. The whole idea here is to gain on sequential read
i.e. single blocks.
There are lot
more per component wise; for more I request you to visit Oracle Docs.
Now lets take
what Exadata offers at sql level:
One of the Key
feature of Exadata is Offloading and this is basically a game changer in my
view; I am really impressed in the way Oracle handles the data volume in
Exadata . Offloading refers
to the key concept of moving the processing from the database servers to the
storage layer. By doing this Oracle gains an advantage on reducing the volume
of data that must be returned to the DB server. Less the volume at DB layer
less will the processing and more we can count on improvement.Smart Scan is
another name of offloading and these terms are more or less interchangeable.
But why
offloading is important: Large volume tends to be a problem for any system and
moving the data from Hardware to DB and processing such large volume is
expected to take time. Let say take this via e.g. With a block size of 8K if
you want say one row then storage end sends whole 8K block and from that block
Oracle software fetches 1 row. Now 8K block can have N number of Odd rows; so
from N number of rows Oracle fetched 1 row.
Now suppose you
have millions of rows then you can imagine how many blocks will get transferred
to satify large no of rows and this certainly becomes a bottleneck and this is where Exadata pinch in; Exadata
offloading is designed to eliminate unnecessary data between the two layers.
With Exadata
Offloading we usually achieve:
(i)
Less
volume of data gets transferred to Db server from Storage End
(ii)
Less
physical IO i.e less disk access.
(iii)
CPU
utilization is less.
Offloading: Can be achieved by
(i)
Predicate
Filtering
(ii)
Storage
Indexes
(iii)
Column
Projection
But prior going
over each; let see how components interacts when SQL initiates an IO in Exadata
machine; here I try to come up with block diagram to show its end to end
communication.
Exadata SQL Processing |
To understand offloading behaviour lets do a practical to see
how this offloading works and whether sql’s are getting benefited or not not.
In Exadata there are few Oracle parameters that deals with
offloading; which when tweaked will give you license to play in terms of
Offloading. Here I am concentrating on only on cell_offload_processing which is
TRUE by default and is session modifiable.
NAME_COL_PLUS_SHOW_PARAM
TYPE
VALUE_COL_PLUS_SHOW_PARAM
--------------------------------------------------------------------------------
----------- ----------------------------------------
cell_offload_compaction
string ADAPTIVE
cell_offload_decryption
boolean TRUE
cell_offload_parameters
string
cell_offload_plan_display
string AUTO
cell_offload_processing boolean TRUE
cell_offloadgroup_name
string
So lets see with < cell_offload_processing =TRUE> how much time below query
takes; table is having 485M rows.
/* Formatted on 2017/01/11
18:56 (Formatter Plus v4.8.8) */
SELECT COUNT (*)
FROM
smart_test
WHERE (crnt_f = 'Y');
Elapsed:
00:00:26.33 ----------------à excellent
elapsed time; in non exadata machine query is expected to take more.
SQL
SQL_PLAN_LINE_ID
BLOCKING_SESSION BLOCKING_SESSION_SERIAL# BLOCKING_SE Id SAMPLE_TIME EVT CURRENT_OBJ#
----------------
---------------- ------------------------ ----------- ---------------
------------------------------ ------------------------- ------------
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.33.240 AM cell smart table scan 114988
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.34.250 AM cell smart table scan 114989
3 UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.35.250 AM cell smart table scan 114991
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.36.250 AM cell smart table scan 114992
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.37.250 AM cell smart table scan 114994
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.38.250 AM cell smart table scan 114995
3 NOT
IN WAIT 47ztspbv68hu0 11-JAN-17
08.23.39.250 AM ON CPU 114996
3 NOT IN WAIT
47ztspbv68hu0 11-JAN-17 08.23.40.250
AM ON CPU 114998
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.41.250 AM cell smart table scan 115000
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.42.250 AM cell smart table scan 115001
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.43.250 AM cell smart table scan 115003
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.44.250 AM cell smart table scan 115005
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.45.260 AM cell smart table scan 115006
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.46.260 AM cell smart table scan 115008
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.47.260 AM cell smart table scan 115010
3 UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.48.260 AM cell smart table scan 115011
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.49.290 AM cell smart table scan 115013
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.50.290 AM cell smart table scan 115015
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.51.290 AM cell smart table scan 115016
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.52.290 AM cell smart table scan 115018
3 UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.53.290 AM cell smart table scan 115020
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.54.300 AM cell smart table scan 115021
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.55.316 AM cell smart table scan 115023
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.56.316 AM cell smart table scan 115025
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.57.316 AM cell smart table scan 115027
3
UNKNOWN 47ztspbv68hu0 11-JAN-17 08.23.58.316 AM cell smart table scan 115028
You might have noticed by now; query is
waited on new event ”cell smart table scan” which shows offloading is being done. Oracle uses this event to account for time
spent waiting for Full Table Scans that are Offloaded. Its occurrence can be
used to verify whether a statement benefited from Offloading or not. Also to notice
here that query visited table in a restricted manner i.e not to much visit.
Lets take a look
on execution plan; you can see ID 3 operation you can see new access method
i.e. ‘TABLE ACCESS STORAGE FULL’ which is very specific to Exadata only. It is worth to note
here that this doesn’t mean smart scan is used; Storage keyword here means that
this execution plan is Exadata storage aware.
--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)|
E-Time | Pstart| Pstop | OMem |
1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT |
| |
| 2796K(100)| | |
| | | |
| 1 |
SORT AGGREGATE | | 1 |
2 | | | |
| | | |
| 2 |
PARTITION RANGE ALL | | 61M|
117M| 2796K (1)| 00:01:50 | 1 |
67 | | | |
|* 3 |
TABLE ACCESS STORAGE FULL| SMART_TEST | 61M|
117M| 2796K (1)| 00:01:50 | 1 |
67 | 1025K| 1025K|
14M (0)|
--------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified
by operation id):
---------------------------------------------------
3 -
storage("CRNT_F"='Y')
filter("CRNT_F"='Y')
But you can see
storage clause in predicate information< storage("CRNT_F"='Y')> which shows smart scan is used and Storage grid handled
all the pain i.e. major filtering is done at the storage level and DB layer had
to do minimal activity<only those blocks which are not handled by cells>. This is basically Predicate Filtering.
Lets see how it
works when offloading process is disabled. Exadata will behave like non exadata
machine.
SQL> alter session set cell_offload_processing=false;
-------------------à disable
offloading process;
/* Formatted on 2017/01/11
18:56 (Formatter Plus v4.8.8) */
SELECT COUNT (*)
FROM
smart_test
WHERE (crnt_f = 'Y');
Elapsed:
00:06:08.59--------àsee after disabling; elapsed time is in minutes from few
seconds.
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)|
E-Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | |
| 2796K(100)| | |
|
|
1 | SORT AGGREGATE | | 1 |
2 | | | |
|
| 2 |
PARTITION RANGE ALL | | 61M|
117M| 2796K (1)| 00:01:50 | 1 |
67 |
|*
3 | TABLE ACCESS STORAGE FULL|
SMART_TEST | 61M| 117M|
2796K (1)| 00:01:50 | 1 |
67 |
-----------------------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
3 -
filter("CRNT_F"='Y')
Look at the predicate storage filter disappeared and that it
means no smart scan and no offloading; here all the blocks are shipped to
Oracle Db layer from storage layer and all the major filtering is done at Db
layer instead of storage layer.
You can also see it from wait events; that query waited on
direct path read and visited the table many times as compared to smart scan
offloading as I mentioned above.
SQL
SQL_PLAN_LINE_ID
BLOCKING_SESSION BLOCKING_SESSION_SERIAL# BLOCKING_SE Id
SAMPLE_TIME
EVT
CURRENT_OBJ#
----------------
---------------- ------------------------ ----------- ---------------
------------------------------ ------------------------- ------------
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.29.53.916 AM direct path read 114992
3 NOT IN WAIT
6fxj1rub84bsq 11-JAN-17 08.29.54.916
AM ON CPU 114992
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.29.55.933 AM ON CPU 114993
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.29.56.933 AM direct path read 114993
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.29.57.933 AM ON CPU 114993
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.29.58.933 AM ON CPU 114993
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.29.59.933 AM direct path read 114993
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.30.00.933 AM ON CPU 114994
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.01.933 AM direct path read 114994
3 NOT IN WAIT
6fxj1rub84bsq 11-JAN-17 08.30.02.943
AM ON CPU 114994
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.03.943 AM direct path read 114994
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.30.04.943 AM ON CPU 114995
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.30.05.943 AM ON CPU 114995
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.06.943 AM direct path read 114995
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.07.943 AM direct path read 114996
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.08.943 AM direct path read 114996
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.09.953 AM direct path read 114996
3 NOT IN WAIT 6fxj1rub84bsq 11-JAN-17 08.30.10.953 AM ON CPU 114996
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.30.11.953 AM ON CPU 114997
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.12.953 AM direct path read 114997
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.13.953 AM direct path read 114997
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.14.953 AM direct path read 114997
3 UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.15.953 AM direct path read 114998
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.16.953 AM direct path read 114998
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.30.17.953 AM ON CPU 114998
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.18.953 AM direct path read 114999
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.19.963 AM direct path read 114999
3 UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.20.963 AM direct path read 114999
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.30.21.963 AM ON CPU 114999
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.22.963 AM direct path read 115000
3 UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.23.963 AM direct path read 115000
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.24.963 AM direct path read 115000
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.25.963 AM direct path read 115001
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.26.963 AM direct path read 115001
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.30.27.963 AM ON CPU 115001
3 NOT IN WAIT 6fxj1rub84bsq 11-JAN-17 08.30.28.973 AM ON CPU 115001
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.29.973 AM direct path read 115002
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.30.973 AM direct path read 115002
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.31.973 AM direct path read 115002
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.30.32.973 AM ON CPU 115002
3 UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.33.973 AM direct path read 115003
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.34.973 AM direct path read 115003
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.35.973 AM direct path read 115003
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.30.36.973 AM ON CPU 115003
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.30.37.983 AM ON CPU 115004
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.38.983 AM direct path read 115004
3
UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.39.983 AM direct path read 115004
3 NOT
IN WAIT 6fxj1rub84bsq 11-JAN-17
08.30.40.983 AM ON CPU 115004
3 UNKNOWN 6fxj1rub84bsq 11-JAN-17 08.30.41.983 AM direct path read 115005
Storage Indexes: Storage Indexes
are not like B Tree indexes; infact there is no relationship between
them.Storage Index in my view is like anti index which is designed to eliminate
the disk IO as they are here to identify the location where requested data is
not present rather than B tree index which are designed to pinpoint the location
of requested data. Storage Index structure are of 1MB and are stored in memory
and never written to disk. Storage index stores minimum and maximum column
values for disk storage units; there is one FLAG sort of column which denotes whether
any of the records in a storage region contain nulls.
Since Smart Scans pass the query predicates to the storage
servers, and Storage Indexes contain a map of values in each 1MB storage
region, any region that can’t possibly contain a matching row can be eliminated
without ever being read.
Let me show you via
diagram what I am trying to say
Figure 1: Denotes
employee table; Figure 2 is Storage
Index created on Salary predicate; you can see 3 columns minimum,maximum and
FLAG column.
Suppose you executed query { SELECT EMPID from EMPLOYEE WHERE
SALARY > 5000 }on exadata machine; smart scan will pass the query predicate
to storage server and storage index here will help the storage to skip the
first 2 part of region as they do not have maximum values that are high enough
to contain any records that will satisfy the query predicate. Therefore, those
storage regions will not be read from disk only last row<region> is
satisfying the requested data so only that will be read.
Storage index Map |
cell physical IO bytes saved by storage index: is the
statistics which shows how much bytes were saved. For initial run there wont be
any saving as storage index map will be created on that execution; subsequent
execution will save and shows how much it saved.
SQL> select avg(id)
from smart_test where id2 is null;
NAME VALUE
---------------------------------------------
---------------
cell physical IO
bytes saved by storage index 0
SQL> select
avg(id) from smart_test where id2 is null;
NAME VALUE
---------------------------------------------
---------------
cell physical IO
bytes saved by storage index 5984949248
Column Projection: Column
projection also helps in terms of achieving good performance. Columns that are
part of select query are not returned only; there are join columns as well
which gets returned.But again this is not something which is very unique to
exadata. In non exadata machines also column projection is famous.
Ø select sum(s. ID2),count((s. ID3)) from
smart_test s, smart_test s2
where s. ID = s2. ID and s. ID4 > 0 ;
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0)
COUNT("S"." ID3")[22], SUM("S"."
ID2")[22]
2 - (#keys=1; rowset=200)
"S"." ID2"[NUMBER,22], "S"."
ID3"[NUMBER,22]
3 -
"S2"." ID"[NUMBER,22]
4 -
"S2"." ID"[NUMBER,22]
5 - (rowset=200) "S"." ID"[NUMBER,22],
"S"." ID2"[NUMBER,22],
"S"." ID3"[NUMBER,22]
6 - (rowset=200)
"S"." ID"[NUMBER,22], "S"."
ID2"[NUMBER,22],
"S"." ID3"[NUMBER,22]
You can see join columns and select
columns are part of projection but not all columns selected i.e. ID4 column is
not part of projection so this is excellent in terms of performance gain.
I hope you like this brief cover on
Exadata. For more I request you to go over documentation as this machine has
lot to offer.
Enjoy Learning!!!!
3 comments:
Nice overview!!
Thanks a lot Somesh.
Thanks for the comments.
Post a Comment