Gouranga's Tech Blog: Analyze 'vmstat' , 'iostat' , 'svmon' & 'nmon' report in AIX (IBM)

-- Basic commands on AIX for CPU and memory (tested on AIX 6.1)

(1) Find the number of CPUs/processors, with example

$ lsconf | grep Processor

output:

Processor Type: PowerPC_POWER7
Processor Implementation Mode: POWER 7
Processor Version: PV_7_Compat
Number Of Processors: 5
Processor Clock Speed: 3550 MHz
Model Implementation: Multiple Processor, PCI bus
+ proc0 Processor
+ proc4 Processor
+ proc8 Processor
+ proc12 Processor
+ proc16 Processor

$ lsdev -C | grep Processor

output:

proc0 Available 00-00 Processor
proc4 Available 00-04 Processor
proc8 Available 00-08 Processor
proc12 Available 00-12 Processor
proc16 Available 00-16 Processor

(2) Find total RAM, with example

$ lsconf | grep Memory

output:
Memory Size: 40960 MB
Good Memory Size: 40960 MB

(3) Find CPU utilization at 5-second intervals

$ iostat -tT 5

System configuration: lcpu=20

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait  time
          0.0        608.6               47.6   5.1    43.0      4.3  11:09:52
          0.0        802.4               38.9   6.4    51.2      3.5  11:09:57
          0.0        611.4               42.5   5.8    47.8      3.9  11:10:02

(4) Paging space (swap) size and usage in AIX

$ lsps -a
Page Space      Physical Volume   Volume Group    Size %Used Active  Auto  Type Chksum
paging01        hdisk15           oraclebkpvg  17408MB     1   yes   yes    lv      0
hd6             hdisk0            rootvg       15872MB     1   yes   yes    lv      0

$ lsps -s
Total Paging Space   Percent Used
      33280MB               1%

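To analyze the iostat report in (3):
1) lcpu=20 --> 20 logical CPUs. With the 5 processors listed in (1), that works out to 4 hardware (SMT) threads per processor.
2) % user --> CPU time spent in user mode, i.e. running application code (varies with load).
3) % sys --> CPU time spent in kernel mode, e.g. system calls and kernel services (should stay low, ideally in single digits).
4) % idle --> CPU time spent idle with no outstanding disk I/O (a higher value means more headroom).
5) % iowait --> CPU time spent idle while the system had outstanding disk I/O requests (lower is better, but it varies with the workload).
6) time --> time at which the sample was collected.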
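
As a rough illustration of reading these columns together, the sketch below adds % user and % sys into a single "busy" figure per sample. The function name cpu_busy is made up for this example, and it assumes the data lines have exactly the seven fields shown in the report above (tin, tout, % user, % sys, % idle, % iowait, time).

```shell
#!/bin/sh
# cpu_busy: hypothetical helper, not part of AIX.
# Reads iostat -tT data lines on stdin and prints, for each sample,
# the collection time, busy% (= %user + %sys) and %iowait.
# Assumed field order: tin tout %user %sys %idle %iowait time
cpu_busy() {
    awk 'NF == 7 && $1 ~ /^[0-9.]+$/ {
        printf "%s busy=%.1f iowait=%.1f\n", $7, $3 + $4, $6
    }'
}

# Sample lines copied from the report above.
cpu_busy <<'EOF'
0.0 608.6 47.6 5.1 43.0 4.3 11:09:52
0.0 802.4 38.9 6.4 51.2 3.5 11:09:57
0.0 611.4 42.5 5.8 47.8 3.9 11:10:02
EOF
```

On a live system the same function could be fed from the command itself, e.g. iostat -tT 5 | cpu_busy; header lines are skipped because their first field is not numeric.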

(5) Find memory utilization at 5-second intervals

$ vmstat 5

System configuration: lcpu=20 mem=40960MB

kthr      memory               page                 faults           cpu
----- --------------- ----------------------- ------------------ -----------
 r  b     avm     fre  re  pi  po  fr  sr  cy    in     sy    cs us sy id wa
 5  0 7647013 2116903   0   0   0   0   0   0  5901  99501 20126 44  7 43  6
 2  0 7648579 2115333   0   0   0   0   0   0  6004  82808 20986 36  6 52  6
 9  0 7710614 2047010   0   0   0   0   0   0  6959 101767 21211 69  6 21  4
16  0 7715835 2041782   0   0   0   0   0   0  6026 131635 19902 84  5  8  3

(Press Ctrl+C to stop the report.)

(6) Analyze the 'vmstat' report

From the above report:
a) CPU
Note: The total number of logical CPUs is 20.
-- kthr: kernel thread states.
r -> Average number of runnable kernel threads over the sampling interval.
b -> Average number of kernel threads placed in the VMM wait queue (awaiting resource, awaiting input/output) over the sampling interval.
Note: The 'r' column peaks at 16 runnable threads, which is still below the 20 logical CPUs.

b) Explaining more:

r: The number of threads that are running or waiting in the run queue for CPU time.

b: The number of threads that are blocked waiting for a resource (e.g. file system I/O, an inode lock).

If the runnable threads (r) divided by the number of logical CPUs is greater than one -> possible CPU bottleneck.

(Compare the (r) column with the number of logical CPUs, the lcpu value in the vmstat header, to judge whether there are enough CPUs for the runnable threads.)

A consistently high blocked column (b) indicates threads waiting on I/O, often because of slow disks.

(r) should normally be higher than (b); if (b) is consistently higher, threads are mostly waiting on I/O rather than on CPU, so check disk and paging activity first.
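
The r / CPU comparison above can be sketched as a small filter. The name runq_ratio is invented for this example, the logical CPU count is passed in by hand (20 here, taken from the lcpu value in the header), and the field positions assume the 17-column vmstat layout shown in (5).

```shell
#!/bin/sh
# runq_ratio: hypothetical helper, not part of AIX.
# $1 = number of logical CPUs (lcpu from the vmstat header).
# Reads vmstat data lines on stdin; for each sample prints r, b and
# r divided by lcpu, flagging samples where the ratio exceeds 1.
runq_ratio() {
    awk -v lcpu="$1" 'NF >= 17 && $1 ~ /^[0-9]+$/ {
        ratio = $1 / lcpu
        printf "r=%d b=%d r/lcpu=%.2f %s\n", $1, $2, ratio, (ratio > 1 ? "CHECK" : "ok")
    }'
}

# Sample lines copied from the vmstat report above.
runq_ratio 20 <<'EOF'
5 0 7647013 2116903 0 0 0 0 0 0 5901 99501 20126 44 7 43 6
2 0 7648579 2115333 0 0 0 0 0 0 6004 82808 20986 36 6 52 6
9 0 7710614 2047010 0 0 0 0 0 0 6959 101767 21211 69 6 21 4
16 0 7715835 2041782 0 0 0 0 0 0 6026 131635 19902 84 5 8 3
EOF
```

For the busiest sample the ratio is 16 / 20 = 0.80, so by this rule of thumb the run queue alone does not point to a CPU bottleneck, even though idle time in that sample is down to 8%.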

c) Memory:
Note: Total memory (RAM): 40 GB
-- Memory: information about the usage of virtual and real memory. Virtual pages are considered active if they have been accessed. A page is 4096 bytes.
avm -> Active virtual pages.
fre -> Size of the free list.
Note: A large portion of real memory is utilized as a cache for file system data. It is not unusual for the size of the free list to remain small.
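
As a worked example of the 4096-byte page size, the avm and fre values from the first vmstat sample can be converted to MB. The helper name pages_to_mb is invented for this illustration.

```shell
#!/bin/sh
# pages_to_mb: hypothetical helper, not part of AIX.
# Converts a count of 4096-byte pages ($1) to whole megabytes.
pages_to_mb() {
    awk -v pages="$1" 'BEGIN { printf "%.0f\n", pages * 4096 / 1048576 }'
}

# avm and fre from the first sample of the vmstat report above.
echo "avm: $(pages_to_mb 7647013) MB"   # about 29871 MB of active virtual memory
echo "fre: $(pages_to_mb 2116903) MB"   # about 8269 MB on the free list
```

Against the 40960 MB of RAM reported by lsconf, that is roughly 29 GB of active virtual pages and about 8 GB on the free list.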

-- Page -> information about page faults and paging activity. These are averaged over the interval and given in units per second.
re -> Pager input/output list.
pi -> Pages paged in from paging space.
po -> Pages paged out to paging space.
fr -> Pages freed (page replacement).
sr -> Pages scanned by page-replacement algorithm.
cy -> Clock cycles by page-replacement algorithm.
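
As a rule of thumb, sustained non-zero values in pi and po mean the system is paging to and from paging space. A minimal sketch for spotting such samples follows; paging_check is a made-up name and the field positions (pi in column 6, po in column 7) assume the vmstat layout shown in (5).

```shell
#!/bin/sh
# paging_check: hypothetical helper, not part of AIX.
# Reads vmstat data lines on stdin and reports, per sample, whether any
# pages were paged in from (pi) or out to (po) paging space.
paging_check() {
    awk 'NF >= 17 && $1 ~ /^[0-9]+$/ {
        if ($6 > 0 || $7 > 0)
            printf "pi=%d po=%d paging\n", $6, $7
        else
            print "pi=0 po=0 no paging"
    }'
}

# First and last samples from the vmstat report above.
paging_check <<'EOF'
5 0 7647013 2116903 0 0 0 0 0 0 5901 99501 20126 44 7 43 6
16 0 7715835 2041782 0 0 0 0 0 0 6026 131635 19902 84 5 8 3
EOF
```

Both samples show pi=0 and po=0, which is consistent with the 1% paging space usage reported by lsps -s.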

-- Faults: trap and interrupt rate averages per second over the sampling interval.
in -> Device interrupts.
sy -> System calls.
cs -> Kernel thread context switches.

-- Cpu: breakdown of percentage usage of processor time.
us -> User time.
sy -> System time.
id -> Processor idle time.
wa -> Processor idle time during which the system had outstanding disk/NFS I/O request(s).
pc -> Number of physical processors consumed. Displayed only if the partition is running with shared processor.
ec -> The percentage of entitled capacity consumed. Displayed only if the partition is running with shared processor. Because the time base over which this data is computed can vary, the entitled capacity percentage can sometimes exceed 100%. This excess is noticeable only with small sampling intervals.

To Analyze:

The rule for identifying a server with CPU resource problems is quite simple. Whenever the value of the runqueue r column exceeds the number of CPUs on the server, tasks are forced to wait for execution. There are several solutions to managing CPU overload, and these alternatives are presented in their order of desirability:

1. Add more processors (CPUs) to the server.

2. Load balance the system tasks by rescheduling large batch tasks to execute during off-peak hours.

3. Adjust the dispatching priorities (nice values) of existing tasks.

To understand how dispatching priorities work, remember that incoming tasks are placed in the execution queue according to their nice value. Tasks with a low nice value are scheduled for execution ahead of tasks with a higher nice value.

Note: On OLTP database servers, processors should be dedicated for best performance. If the 'ec' column does not appear in the vmstat output, the partition is configured with dedicated processors.

I am sure this will help.
For details, see the vmstat documentation on the IBM site.

(7) svmon:

svmon is the most comprehensive tool for finding out how much memory each process consumes on AIX. You can also get a concise summary, with memory shown in MB, using:

$ svmon -P -O summary=basic,unit=MB
Unit: MB

-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
15794288 oracle 20171.50 101.00 0 20061.84
15204446 oracle 20171.49 101.00 0 20061.83
15269984 oracle 20171.49 101.00 0 20061.83
11665450 oracle 20150.82 101.00 0 20041.14
15466606 oracle 20146.87 101.00 0 20037.21
10551506 oracle 20145.63 101.00 0 20035.97
10616950 oracle 20143.97 101.00 0 20034.31
44761122 oracle 20140.69 101.00 0 20037.83
11075672 oracle 20140.68 101.00 0 20031.02
6422750 oracle 20140.27 101.00 0 20030.61
11862158 oracle 20140.24 101.00 0 20030.58
14286928 oracle 20136.61 101.00 0 20026.95
6094986 oracle 20134.95 101.00 0 20025.29
12058794 oracle 20134.84 101.00 0 20025.16
......

-- To see the memory details of a specific process:

svmon -P <pid>

e.g., svmon -P 15794288

-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
15794288 oracle 5163903 25840 0 5135830 Y N N

PageSize Inuse Pin Pgsp Virtual
s 4 KB 46591 0 0 18518
m 64 KB 319832 1615 0 319832

    Vsid      Esid Type Description          PSize  Inuse   Pin Pgsp Virtual
  3415b4  70000018 work default shmat/mmap       m   4096     0    0    4096
  5e15de  7000003f work default shmat/mmap       m   4096     0    0    4096
  2e15ae  70000014 work default shmat/mmap       m   4096     0    0    4096
  5d15dd  70000041 work default shmat/mmap       m   4096     0    0    4096
  5c15dc  70000016 work default shmat/mmap       m   4096     0    0    4096
  5b15db  70000015 work default shmat/mmap       m   4096     0    0    4096
  5a15da  7000003e work default shmat/mmap       m   4096     0    0    4096
  5915d9  7000003d work default shmat/mmap       m   4096     0    0    4096
  5815d8  7000003c work default shmat/mmap       m   4096     0    0    4096
  5715d7  7000003b work default shmat/mmap       m   4096     0    0    4096
  5615d6  7000003a work default shmat/mmap       m   4096     0    0    4096
  5515d5  70000039 work default shmat/mmap       m   4096     0    0    4096
  5415d4  70000038 work default shmat/mmap       m   4096     0    0    4096

.....
.....
svmon -P <pid> gives you the full, detailed output. man svmon helps with interpreting it (just remember that, by default, nearly all of the numbers are page counts, and pages are usually 4 KB in size).

-- For a summary of the top 15 processes using memory on the system, use the following command:

$ svmon -Pt15 | perl -e 'while(<>){print if($.==2||$&&&!$s++);$.=0 if(/^-+$/)}'
output:

Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
15794288 oracle 5163903 25872 0 5135830 Y N N
15204446 oracle 5163902 25872 0 5135829 Y N N
15269984 oracle 5163902 25872 0 5135829 Y N N
22740992 oracle 5162225 25872 0 5135893 Y N N
11665450 oracle 5158609 25872 0 5130531 Y N N
15466606 oracle 5157599 25872 0 5129526 Y N N
10551506 oracle 5157282 25872 0 5129209 Y N N
10616950 oracle 5156857 25872 0 5128784 Y N N
11075672 oracle 5156013 25872 0 5127940 Y N N
6422750 oracle 5155908 25872 0 5127835 Y N N
11862158 oracle 5155901 25872 0 5127828 Y N N
14286928 oracle 5154972 25872 0 5126899 Y N N
6094986 oracle 5154546 25872 0 5126473 Y N N
12058794 oracle 5154519 25872 0 5126440 Y N N
10354772 oracle 5154518 25872 0 5126445 Y N N

The Pid 15794288 is the process ID that has the highest memory consumption. The Command indicates the command name, in this case 'oracle'. The Inuse column, which is the total number of pages in real memory from segments that are used by the process, shows 5163903 pages. Each page is 4 KB. The Pin column, which is the total number of pages pinned from segments that are used by the process, shows 25872 pages. The Pgsp column, which is the total number of paging-space pages that are used by the process, shows 0 pages. The Virtual column (total number of pages in the process virtual space) shows 5135830.
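
The page counts can be cross-checked against the MB summary shown earlier by converting them at 4 KB per page. This is only a sketch: svmon_mb is an invented name, and it assumes the Inuse and Virtual columns of the svmon -P header line are expressed in 4 KB units, as described above.

```shell
#!/bin/sh
# svmon_mb: hypothetical helper, not part of AIX.
# Reads "Pid Command Inuse Pin Pgsp Virtual ..." lines on stdin and prints
# Inuse and Virtual converted from 4 KB pages to MB.
svmon_mb() {
    awk 'NF >= 6 && $1 ~ /^[0-9]+$/ {
        printf "%s %s inuse=%.2fMB virtual=%.2fMB\n", $1, $2, $3 * 4 / 1024, $6 * 4 / 1024
    }'
}

# First line of the top-15 report above.
svmon_mb <<'EOF'
15794288 oracle 5163903 25872 0 5135830 Y N N
EOF
```

For pid 15794288 this gives 20171.50 MB in use and 20061.84 MB virtual, matching the values in the unit=MB summary. Note that the per-process figures should not simply be added up: fifteen processes at about 20 GB each would far exceed the 40 GB of RAM, which suggests that most of this memory is shared (the shmat/mmap segments) and is counted once for every process that attaches it.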

The detailed section displays information about each segment for each process shown in the summary section. This includes the virtual (Vsid) and effective (Esid) segment identifiers. The Esid reflects the segment register that is used to access the corresponding pages. The type of the segment is displayed along with a textual description of the segment, which includes the volume name and i-node of the file for persistent segments. The report also gives the size of the pages backing the segment (PSize), where s denotes 4 KB pages, m denotes 64 KB pages and L denotes 16 MB pages, followed by the number of pages in RAM (Inuse), the number of pinned pages in RAM (Pin), the number of pages in paging space (Pgsp), and the number of virtual pages (Virtual).

(8) nmon:

Common question: How do I see what accounts for XX% of used memory?

Run nmon, then press "m" to quickly see a few of the big users of memory.

ipcs -bm shows the shared memory used by applications like DB2 and Oracle - check the SEGSZ column for the size.
The OWNER column usually tells you what it is used for, like the oracle user for the SGA or db2inst1 for the DB2 buffer cache.

In nmon, press "t" for top processes and then "4" to order by process size to see the process memory:
Size KB = the size as found in the program file on disk.
Resident Set Size = how big it is in memory (excluding the pages still in the file system (like code) and some parts on paging disks).
ResText column is the code pages of the Resident Set.
ResData column is the data and stack pages of the Resident Set.

If you want the full details then you need the svmon command, but be warned: it gives you the full picture and it is very complicated, even when listing the details of a single process. There are a dozen combinations of memory attributes, and you will find that your process's memory has all of them, and so does each of its dozen libraries.

(9) Understanding %CPU while running the top command (Linux)

Most Oracle DBAs and Linux administrators have the following common questions about the "top" command:

What does %CPU mean when "top" is running?
If %CPU for my application is 400 or 500 most of the time, how should I interpret it? What counts as a high number?

To answer this, you first need some facts about the machine, such as the total number of CPUs and cores available.

e.g.,

$ lscpu

gives the following output:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 2599.928
BogoMIPS: 5199.94
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31

So, we can understand it as below:

%CPU -- CPU usage: the percentage of CPU time used by the process. By default, top displays this as a percentage of a single CPU, so on multi-core systems you can see percentages greater than 100%. For example, if 3 cores are at 60% use, top will show a CPU use of 180%. You can toggle this behavior by hitting "Shift+i" while top is running, which shows the usage as a percentage of all available CPUs instead.

According to the lscpu output above:

You have 32 logical CPUs (CPU(s)) in total.
You have 2 physical sockets (Socket(s)), each containing 1 physical processor.
Each processor has 8 physical cores (Core(s) per socket), which means you have 8 * 2 = 16 physical cores.
Each physical core runs 2 hardware threads (Thread(s) per core), which means you have physical cores * threads = 16 * 2 = 32 logical CPUs in total.
So the 16 physical cores appear to the operating system as 32 logical CPUs.

One logical CPU at maximum is 100%, so the highest %CPU that top can show is number_of_logical_CPUs × 100%, here 32 × 100% = 3200%.
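
To put a figure such as 400 into perspective, it can be expressed as a share of the whole machine. The sketch below is illustrative only: cpu_share is a made-up name, and the 32 logical CPUs come from the lscpu output above.

```shell
#!/bin/sh
# cpu_share: hypothetical helper, not part of top or procps.
# $1 = %CPU value shown by top for a process (per-CPU scale, can exceed 100)
# $2 = number of logical CPUs on the machine
cpu_share() {
    awk -v pct="$1" -v ncpu="$2" 'BEGIN {
        printf "%.1f CPUs busy, %.1f%% of %d logical CPUs\n", pct / 100, pct / ncpu, ncpu
    }'
}

cpu_share 400 32    # a process showing 400 %CPU on the 32-CPU machine
cpu_share 3200 32   # the theoretical maximum on this machine
```

So a process at 400 %CPU keeps about 4 logical CPUs busy, which is 12.5% of this machine's total capacity. Whether that counts as high depends on how many such processes run at once.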

If you want to see per-CPU statistics, the command below can be used:

$ mpstat -P ALL 1

It shows how busy each logical CPU is and refreshes every second. The output looks something like this
(sample from a different machine with 8 logical CPUs, numbered 0 to 7, e.g. a quad-core processor with 2 threads per core):

10:54:41 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10:54:42 PM all 8.20 0.12 0.75 0.00 0.00 0.00 0.00 0.00 90.93
10:54:42 PM 0 24.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 74.00
10:54:42 PM 1 22.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 76.00
10:54:42 PM 2 2.02 1.01 0.00 0.00 0.00 0.00 0.00 0.00 96.97
10:54:42 PM 3 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 98.00
10:54:42 PM 4 14.15 0.00 1.89 0.00 0.00 0.00 0.00 0.00 83.96
10:54:42 PM 5 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.00
10:54:42 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
10:54:42 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
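
Since %idle is the last column, a per-CPU busy figure is simply 100 minus that value. A minimal sketch follows; mpstat_busy is an invented name, and it assumes the 12-hour timestamp format shown above, where the time and the AM/PM marker occupy the first two fields and the CPU number is the third.

```shell
#!/bin/sh
# mpstat_busy: hypothetical helper, not part of sysstat.
# Reads mpstat -P ALL lines on stdin and prints busy% (100 - %idle)
# for each numbered CPU; the "all" summary and header lines are skipped.
mpstat_busy() {
    awk '$3 ~ /^[0-9]+$/ { printf "cpu%s busy=%.2f\n", $3, 100 - $NF }'
}

# Three lines copied from the sample output above.
mpstat_busy <<'EOF'
10:54:42 PM all 8.20 0.12 0.75 0.00 0.00 0.00 0.00 0.00 90.93
10:54:42 PM 0 24.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 74.00
10:54:42 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
EOF
```

With a 24-hour locale the AM/PM field is absent and the CPU number moves to the second field, so the $3 test would need adjusting.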

Jan 2, 2014
