May 29, 2015

Monitor load on database servers - using iostat & vmstat

The iostat command is used to monitor the load on server input/output (I/O) devices by observing the time the devices are active in relation to their average transfer rates. The reports iostat generates can be used to monitor, and then change, the system configuration to better balance the I/O workload between physical disk devices.

The initial report detail lines generated by iostat provide statistics encompassing the time since the system was last booted. Subsequent sets of detail lines cover the time since the previous report interval.
Each set of report lines starts with a CPU header row followed by a line of CPU statistics averaged across all processors. After the CPU information, a device header row is displayed, followed by one detail line of statistics for each device in the system.
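The since-boot versus per-interval arithmetic can be illustrated with a minimal Python sketch. It assumes the Linux /proc/diskstats layout, which is where iostat gets its device counters on 2.6 and later kernels. In that layout the device name is the 3rd field, and reads completed, sectors read, writes completed and sectors written are the 4th, 6th, 8th and 10th fields. The snapshot lines and the 1000-second uptime are made up, and the sketch only approximates iostat's own calculation.

```python
from dataclasses import dataclass


@dataclass
class DiskCounters:
    """Cumulative counters for one device, as found in /proc/diskstats."""
    name: str
    reads: int            # reads completed
    sectors_read: int     # 512-byte sectors read
    writes: int           # writes completed
    sectors_written: int  # 512-byte sectors written


def parse_diskstats_line(line: str) -> DiskCounters:
    """Parse one /proc/diskstats line: major, minor, name, then the stat fields."""
    f = line.split()
    return DiskCounters(
        name=f[2],
        reads=int(f[3]),
        sectors_read=int(f[5]),
        writes=int(f[7]),
        sectors_written=int(f[9]),
    )


def rates(prev: DiskCounters, curr: DiskCounters, seconds: float) -> dict:
    """Per-second rates between two snapshots, in the spirit of iostat's
    tps, Blk_read/s and Blk_wrtn/s columns. Passing all-zero counters as
    `prev` and the uptime as `seconds` gives the since-boot figures."""
    return {
        "tps": ((curr.reads - prev.reads) + (curr.writes - prev.writes)) / seconds,
        "Blk_read/s": (curr.sectors_read - prev.sectors_read) / seconds,
        "Blk_wrtn/s": (curr.sectors_written - prev.sectors_written) / seconds,
    }


# Hypothetical snapshots of one disk taken three seconds apart (made-up numbers).
t0 = parse_diskstats_line("8 0 sda 1000 10 80000 500 2000 20 160000 900 0 1200 1400")
t1 = parse_diskstats_line("8 0 sda 1030 10 82400 520 2060 20 164800 950 0 1260 1480")

zero = DiskCounters(t0.name, 0, 0, 0, 0)
print(rates(zero, t0, seconds=1000.0))  # first report: totals / uptime (1000 s assumed)
print(rates(t0, t1, seconds=3.0))       # later reports: deltas / interval
```

The first call mirrors the initial since-boot report; the second mirrors a report for a three-second interval.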

The following example shows the invocation of iostat specifying a three second interval or delay with a total of two samplings or counts:

$ iostat 3 2

For the average CPU report, %user, %nice, %iowait and %idle have the same meaning as in the mpstat command output. One remaining column is defined as:
* %system: The percentage of CPU utilization that occurred while executing at the system (kernel) level.

For the device utilization report:

* Device: The device name as it appears in the /dev directory. These device names are mapped to mount points in /etc/fstab and are also listed in the output of the df command.
* tps: The number of transfers (I/O requests) per second issued to the device.
* Blk_read/s: The number of blocks per second read from the device (a block is a 512-byte sector).
* Blk_wrtn/s: The number of blocks per second written to the device.
* Blk_read: The total number of blocks read.
* Blk_wrtn: The total number of blocks written.
This information helps identify which devices are more heavily used than others, and therefore how data might be redistributed to balance the workload.
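To spot an imbalance, you can compute each device's share of the total transfers from a device report. The following Python sketch assumes the column layout listed above. Some sysstat versions print kB_read/s and related columns instead, but the sketch only relies on the tps column. The helper names and the sample report are hypothetical.

```python
def parse_device_report(text: str) -> dict:
    """Map device name -> {column: value} from an iostat device section."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()[1:]          # drop the "Device:" column title
    report = {}
    for ln in lines[1:]:
        fields = ln.split()
        report[fields[0]] = dict(zip(header, map(float, fields[1:])))
    return report


def tps_share(report: dict) -> list:
    """Return (device, percent of total tps) pairs, busiest first."""
    total = sum(cols["tps"] for cols in report.values())
    if total == 0:                         # idle system: avoid dividing by zero
        return [(dev, 0.0) for dev in report]
    shares = [(dev, 100.0 * cols["tps"] / total) for dev, cols in report.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Made-up device report in the column layout described above.
sample = """
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              30.00       800.00      1600.00       2400       4800
sdb              10.00       200.00       100.00        600        300
"""

for dev, pct in tps_share(parse_device_report(sample)):
    print(f"{dev}: {pct:.1f}% of all transfers")
```

Here sda handles three quarters of the transfers, which would make it the first candidate when redistributing data.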

More iostat examples:
$ iostat
Display a single report of CPU and device statistics since boot.

$ iostat -d 2 
Display a continuous device report at two second intervals.

$ iostat -d 2 6 
Display six reports at two second intervals for all devices.

$ iostat -x hda hdb 2 6 
Display six reports of extended statistics at two second intervals for devices hda and hdb.

$ iostat -p sda 2 6 
Display six reports at two second intervals for device sda and all its partitions (sda1, etc.)

$ iostat -ct 2
Display a continuous CPU utilization report, with a timestamp, at two-second intervals. On AIX, use iostat -tT 2 instead.

Displaying Virtual Memory Statistics:

The vmstat command displays information about processes, memory, paging, block I/O, and different levels of CPU activity. As with iostat, the first detail line reports averages since the last reboot. Subsequent detail lines report information for the interval specified on the command line.
As with the other commands in this section, the vmstat command is driven by delay and count options that determine the time interval between report lines and the total number of intervals to be reported.

The Linux man page for vmstat defines the fields displayed as follows:
* procs
  * r: The number of processes waiting for run time
  * b: The number of processes in uninterruptible sleep, which means they are waiting on a resource
* memory (in kilobytes by default)
  * swpd: The amount of virtual memory used
  * free: The amount of idle memory
  * buff: The amount of memory used as buffers
  * cache: The amount of memory used as cache
* swap
  * si: Memory swapped in from disk per second
  * so: Memory swapped out to disk per second
* io
  * bi: Blocks per second received from a block device
  * bo: Blocks per second sent to a block device
* system
  * in: The number of interrupts per second, including the clock
  * cs: The number of context switches per second
* cpu: These statistics are percentages of total CPU time:
  * us: Time spent running non-kernel code (user time, including nice time)
  * sy: Time spent running kernel code (system time)
  * id: Time spent idle
  * wa: Time spent waiting for I/O
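Because the first vmstat line covers the time since boot, it is usually left out when averaging a run. Here is a minimal Python sketch, assuming vmstat's usual two header lines followed by numeric sample lines. The function names and the sample output are hypothetical.

```python
def parse_vmstat(text: str) -> list:
    """Turn vmstat output into one dict per sample line, keyed by column name."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    columns = rows[1]                  # rows[0] holds the group names (procs, memory, ...)
    # Keep numeric lines only; vmstat reprints its headers periodically unless run with -n.
    return [dict(zip(columns, map(int, r))) for r in rows[2:] if r[0].isdigit()]


def interval_average(samples: list, column: str) -> float:
    """Average one column over the interval lines, skipping the since-boot line."""
    interval = samples[1:]
    if not interval:
        raise ValueError("need at least two sample lines (run vmstat with a count >= 2)")
    return sum(s[column] for s in interval) / len(interval)


# Made-up output of a three-line vmstat run.
vmstat_sample = """
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 812000  50000 900000    0    0     4    12  110  220  3  1 95  1
 4  1      0 805000  50000 905000    0    0   400   800 1500 3000 40 10 30 20
 6  2      0 800000  50000 910000    0    0   600  1200 1700 3400 50 12 18 20
"""

samples = parse_vmstat(vmstat_sample)
print(interval_average(samples, "r"), interval_average(samples, "us"))  # 5.0 45.0
```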

The vmstat information can be invaluable when studying resource utilization trends.  Here are a few examples of how vmstat output can be interpreted:

If, over time, the run queue value (procs-r) remains consistently higher than the number of processors in the server and CPU idle time is low, the system is CPU bound and can benefit from more and/or faster processors. A consistently high number in the procs-b column also indicates a bottleneck, but one where processes are blocked waiting on other resources, typically disk I/O.

If the virtual memory used (memory-swpd) remains high and the free memory (memory-free) remains low, especially while swap-in and swap-out activity (swap-si and swap-so) is consistently non-zero, then the system is memory constrained and will benefit from additional RAM.

Consistently high I/O rates (io-bi and io-bo) paired with consistently low CPU utilization (cpu-us) and high I/O wait (cpu-wa) indicate an I/O-bound system that could benefit from a highly buffered disk array or solid-state disks.
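The rules of thumb above can be written as a small Python function over averaged vmstat columns. The thresholds are hypothetical placeholders, not recommended values. What counts as "consistently high" or "low" depends on the server, so tune them before relying on the result.

```python
# Hypothetical thresholds for "high" and "low"; tune them for your own servers.
THRESHOLDS = {
    "idle_low": 10.0,           # cpu-id below this (%) counts as low idle time
    "blocked_high": 2.0,        # procs-b above this counts as high
    "swpd_high_kb": 1_000_000,  # memory-swpd above this (KB) counts as high
    "free_low_kb": 100_000,     # memory-free below this (KB) counts as low
    "io_high_blocks": 1_000,    # io-bi + io-bo above this (blocks/s) counts as high
    "us_low": 20.0,             # cpu-us below this (%) counts as low CPU use
}


def diagnose(avg: dict, n_cpus: int, t: dict = THRESHOLDS) -> list:
    """Apply the vmstat rules of thumb to averaged columns; return labels."""
    findings = []
    if avg["r"] > n_cpus and avg["id"] < t["idle_low"]:
        findings.append("cpu-bound")
    if avg["b"] > t["blocked_high"]:
        findings.append("blocked-on-resources")
    if avg["swpd"] > t["swpd_high_kb"] and avg["free"] < t["free_low_kb"]:
        findings.append("memory-constrained")
    if avg["bi"] + avg["bo"] > t["io_high_blocks"] and avg["us"] < t["us_low"]:
        findings.append("io-bound")
    return findings


# Made-up averages for a 4-CPU server.
print(diagnose({"r": 9, "b": 0, "swpd": 0, "free": 500_000,
                "bi": 50, "bo": 80, "us": 85, "id": 3}, n_cpus=4))
# ['cpu-bound']
```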

See my earlier post for more about vmstat, iostat, mpstat and nmon reports.

Understanding %CPU while running the top command

Oracle DBAs and Linux administrators often have the following questions about the top command:

What does %CPU mean when top is running?
If %CPU for my application shows 400 or 500 most of the time, how should I interpret it? What counts as a high number?

To answer these, you first need to know how many CPUs and cores the server has, for example with the lscpu command:

$ lscpu 

which gives the following output:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Stepping:              7
CPU MHz:               2599.928
BogoMIPS:              5199.94
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31


This can be interpreted as follows:

%CPU (CPU usage): The percentage of CPU time used by the process. By default, top expresses this as a percentage of a single CPU, so on multi-core systems it can exceed 100%. For example, if a process keeps three cores 60% busy, top shows a CPU use of 180%. Pressing Shift+i while top is running toggles this behavior (Irix mode), so that %CPU is shown as a percentage of all available CPUs.

According to the lscpu output:

There are 2 physical sockets (Socket(s)), each holding one physical processor.
Each processor has 8 physical cores (Core(s) per socket), so there are 8 × 2 = 16 physical cores.
Each physical core runs 2 hardware threads (Thread(s) per core) through Hyper-Threading, so there are 16 × 2 = 32 logical CPUs, which matches the CPU(s) value of 32.
So top and the operating system see 32 logical CPUs, backed by 16 physical cores.
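The topology arithmetic can be checked mechanically. This Python sketch parses the Socket(s), Core(s) per socket, Thread(s) per core and CPU(s) fields of lscpu output. The function name is hypothetical. For the output shown earlier, it should report 16 physical cores and 32 logical CPUs.

```python
def cpu_topology(lscpu_text: str) -> dict:
    """Derive physical-core and logical-CPU counts from `lscpu` output."""
    info = {}
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    sockets = int(info["Socket(s)"])
    cores_per_socket = int(info["Core(s) per socket"])
    threads_per_core = int(info["Thread(s) per core"])
    physical = sockets * cores_per_socket
    return {
        "physical_cores": physical,                   # sockets * cores per socket
        "logical_cpus": physical * threads_per_core,  # physical cores * threads per core
        "reported_cpus": int(info["CPU(s)"]),         # lscpu's own total, for comparison
    }


# Relevant lines copied from the lscpu output above.
lscpu_sample = """
CPU(s):                32
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
"""

print(cpu_topology(lscpu_sample))
# {'physical_cores': 16, 'logical_cpus': 32, 'reported_cpus': 32}
```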

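Each logical CPU can contribute at most 100%, so the highest %CPU a process can show is number_of_logical_CPUs × 100%, which is 32 × 100% = 3200% on this server. A reading of 400% therefore means the process is keeping roughly four logical CPUs fully busy.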
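The same arithmetic answers the "is 400% high?" question. This small Python sketch (hypothetical helper name) turns top's default per-CPU %CPU figure into two numbers: how many logical CPUs it keeps busy, and what share of the whole machine it uses. The second number is roughly what top shows after pressing Shift+i.

```python
def interpret_cpu(top_cpu_percent: float, logical_cpus: int) -> dict:
    """Put top's default (per-CPU) %CPU figure into context."""
    return {
        "max_possible": 100.0 * logical_cpus,             # every logical CPU 100% busy
        "cpus_busy": top_cpu_percent / 100.0,             # equivalent fully busy CPUs
        "machine_share": top_cpu_percent / logical_cpus,  # % of total capacity
    }


print(interpret_cpu(400, 32))
# {'max_possible': 3200.0, 'cpus_busy': 4.0, 'machine_share': 12.5}
```

On the 32-CPU server above, 400% is only 12.5% of total capacity, and 500% is 15.625%.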

To see per-CPU statistics, use the following command:

$ mpstat -P ALL 1

It shows how busy each logical CPU is and refreshes every second. The output looks something like this
(from a machine with eight logical CPUs):

10:54:41 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
10:54:42 PM  all    8.20    0.12    0.75    0.00    0.00    0.00    0.00    0.00   90.93
10:54:42 PM    0   24.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   74.00
10:54:42 PM    1   22.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   76.00
10:54:42 PM    2    2.02    1.01    0.00    0.00    0.00    0.00    0.00    0.00   96.97
10:54:42 PM    3    2.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.00
10:54:42 PM    4   14.15    0.00    1.89    0.00    0.00    0.00    0.00    0.00   83.96
10:54:42 PM    5    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
10:54:42 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:54:42 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
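To pick out the busiest CPUs from such output, you can take busy time as 100 minus %idle. The Python sketch below uses hypothetical helper names. It locates the CPU and %idle columns from the header row, so it works whether the timestamp is printed with or without AM/PM.

```python
def busy_by_cpu(mpstat_text: str) -> dict:
    """Map CPU number -> busy % (100 - %idle) from `mpstat -P ALL` output."""
    cpu_col = idle_col = None
    busy = {}
    for line in mpstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "%idle" in fields:                   # header row: find the columns
            cpu_col, idle_col = fields.index("CPU"), fields.index("%idle")
        elif cpu_col is not None and fields[cpu_col] != "all":
            busy[int(fields[cpu_col])] = round(100.0 - float(fields[idle_col]), 2)
    return busy


def busiest(busy: dict, n: int = 3) -> list:
    """Return the n busiest CPUs as (cpu, busy %) pairs."""
    return sorted(busy.items(), key=lambda kv: kv[1], reverse=True)[:n]


# The sample output shown above.
mpstat_sample = """
10:54:41 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
10:54:42 PM  all    8.20    0.12    0.75    0.00    0.00    0.00    0.00    0.00   90.93
10:54:42 PM    0   24.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   74.00
10:54:42 PM    1   22.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   76.00
10:54:42 PM    2    2.02    1.01    0.00    0.00    0.00    0.00    0.00    0.00   96.97
10:54:42 PM    3    2.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.00
10:54:42 PM    4   14.15    0.00    1.89    0.00    0.00    0.00    0.00    0.00   83.96
10:54:42 PM    5    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
10:54:42 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:54:42 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
"""

print(busiest(busy_by_cpu(mpstat_sample)))  # [(0, 26.0), (1, 24.0), (4, 16.04)]
```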