Sunday, 19 July 2020

Huge CPU load due to high %SYS usage: Oracle

I came across an interesting performance issue: system CPU (%sys, better described as kernel CPU) was running very high on an Oracle database server. After a lot of hard work and tracing, it turned out that Oracle's system calls were ending up in the kernel function "acpi_pm_read".

Oracle Database relies on gettimeofday(2) for timing everything from I/O calls to latch sleeps. The preferred clock source is the Time Stamp Counter (TSC), but in my case the kernel was using the ACPI PM timer, which is read through acpi_pm_read.

Each of those timing calls had to switch into the kernel to read the timer. That consumed a lot of CPU, which is why %Kernel CPU went so high.
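To put a number on kernel CPU without extra tools, one option is to sample /proc/stat twice and compare. This is only a sketch: the helper name sys_pct is made up for illustration, and it counts only the "system" field (not irq or softirq), so it may read slightly lower than what sar or top report.

```shell
#!/bin/sh
# sys_pct LINE1 LINE2
# Given two aggregate "cpu" lines from /proc/stat, print the share of CPU
# time spent in kernel mode between them (hypothetical helper).
sys_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Fields: $2 user, $3 nice, $4 system, $5 idle,
            #         $6 iowait, $7 irq, $8 softirq, $9 steal
            sys[NR] = $4
            total[NR] = 0
            for (i = 2; i <= 9 && i <= NF; i++) total[NR] += $i
        }
        END {
            d = total[2] - total[1]
            if (d > 0) printf "%.1f\n", 100 * (sys[2] - sys[1]) / d
            else print "0.0"
        }'
}

# Take two samples one second apart and report kernel CPU for that interval.
if [ -r /proc/stat ]; then
    first=$(head -n 1 /proc/stat)
    sleep 1
    second=$(head -n 1 /proc/stat)
    echo "%sys over 1s: $(sys_pct "$first" "$second")"
fi
```

A value that stays high while user CPU is modest is the pattern described above.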

Some brief background:

Multiprocessor systems such as NUMA or SMP have multiple instances of clock sources. The way clocks interact among themselves and the way they react to system events, such as CPU frequency scaling or entering energy economy modes, determine whether they are suitable clock sources for the real-time kernel. 
During boot time the kernel discovers the available clock sources and selects one to use. The preferred clock source is the Time Stamp Counter (TSC), but if it is not available the High Precision Event Timer (HPET) is the second best option. However, not all systems have HPET clocks and some HPET clocks can be unreliable. 

In the absence of TSC and HPET, other options include the ACPI Power Management Timer (ACPI_PM), the Programmable Interval Timer (PIT) and the Real Time Clock (RTC). 

The last two options are either costly to read or have a low resolution (time granularity), therefore they are sub-optimal for the real-time kernel. 

In my case, as mentioned, the traces showed acpi_pm_read, which means the system was using a non-preferred clock source: the ACPI Power Management Timer (ACPI_PM).

TSC is the best of the lot, and it is what most systems use.

You can always check which clock source your system is using with the command below. tsc is the likely result.

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
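It can also help to see which clock sources the kernel considers usable, since tsc can only be selected if it is listed. A minimal sketch, assuming the standard sysfs layout; the function name show_clocksource and the CS_DIR variable are illustrative only.

```shell
#!/bin/sh
# Directory where Linux exposes clock source settings (standard sysfs path).
CS_DIR="${CS_DIR:-/sys/devices/system/clocksource/clocksource0}"

# Print the active clock source and the ones the kernel offers as alternatives.
show_clocksource() {
    dir="$1"
    if [ -r "$dir/current_clocksource" ]; then
        echo "current:   $(cat "$dir/current_clocksource")"
    else
        echo "current:   unknown (cannot read $dir/current_clocksource)"
    fi
    if [ -r "$dir/available_clocksource" ]; then
        echo "available: $(cat "$dir/available_clocksource")"
    fi
}

show_clocksource "$CS_DIR"
```

If the current entry is acpi_pm or hpet on a busy database host, that matches the situation described in this post.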

Once the clock source was set to tsc, %Kernel CPU went back to normal.
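The exact steps used for the change are not recorded here, so the following is only a sketch of one common way to do it: writing the name into the same sysfs file as root. The function name set_clocksource is made up for illustration, and it refuses any name the kernel does not list as available.

```shell
#!/bin/sh
# Directory where Linux exposes clock source settings (standard sysfs path).
CS_DIR="${CS_DIR:-/sys/devices/system/clocksource/clocksource0}"

# set_clocksource DIR NAME
# Switch the active clock source to NAME, but only if the kernel offers it.
set_clocksource() {
    dir="$1"
    want="$2"
    for name in $(cat "$dir/available_clocksource"); do
        if [ "$name" = "$want" ]; then
            # Needs root on a real system; applies immediately, lost on reboot.
            echo "$want" > "$dir/current_clocksource"
            return $?
        fi
    done
    echo "clock source '$want' is not available" >&2
    return 1
}

# Only act when a name is passed on the command line, e.g.: sh thisfile.sh tsc
if [ -n "${1:-}" ]; then
    set_clocksource "$CS_DIR" "$1"
fi
```

A change made this way does not survive a reboot; the clocksource=tsc kernel boot parameter is the usual way to make it persistent. If tsc is missing from the available list, the kernel may have judged it unreliable on that hardware, so it is worth finding out why before forcing it.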

Note: This was my observation. It may not be the same for everyone, so be careful.