Jan 2, 2014

Analyze 'vmstat', 'iostat', 'svmon' & 'nmon' reports in AIX (IBM)

-- Basic commands on AIX for CPU and memory (tested on AIX 6.1)

(1) Find the number of CPUs/processors, with example

$ lsconf | grep Processor

output:

Processor Type: PowerPC_POWER7
Processor Implementation Mode: POWER 7
Processor Version: PV_7_Compat
Number Of Processors: 5
Processor Clock Speed: 3550 MHz
  Model Implementation: Multiple Processor, PCI bus
+ proc0                                                                           Processor
+ proc4                                                                           Processor
+ proc8                                                                           Processor
+ proc12                                                                          Processor
+ proc16                                                                          Processor

$ lsdev -C | grep Processor

output:

proc0      Available 00-00       Processor
proc4      Available 00-04       Processor
proc8      Available 00-08       Processor
proc12     Available 00-12       Processor
proc16     Available 00-16       Processor
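
As a quick cross-check, the five proc devices listed above can be compared with the lcpu=20 value that the iostat and vmstat headers report later in this post. The sketch below is a minimal Python illustration. It assumes the lsdev output has been pasted in as text, and it assumes the ratio reflects simultaneous multithreading (SMT), which the post itself does not state. The function name is illustrative only.

```python
# Output of 'lsdev -C | grep Processor', copied from the listing above.
LSDEV_OUTPUT = """\
proc0      Available 00-00       Processor
proc4      Available 00-04       Processor
proc8      Available 00-08       Processor
proc12     Available 00-12       Processor
proc16     Available 00-16       Processor
"""


def count_processors(lsdev_text):
    """Count the lines whose last field is 'Processor'."""
    count = 0
    for line in lsdev_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "Processor":
            count += 1
    return count


processors = count_processors(LSDEV_OUTPUT)
logical_cpus = 20  # lcpu value from the iostat/vmstat headers in this post
# Assumption: the ratio is the number of SMT threads per processor.
threads_per_processor = logical_cpus // processors
print(processors, threads_per_processor)
```

With these numbers, each processor device would be backing four logical CPUs, which is consistent with the proc0, proc4, proc8, ... device numbering.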

(2) Find total RAM, with example

$ lsconf | grep Memory

output:
Memory Size: 40960 MB
Good Memory Size: 40960 MB

(3) Find CPU utilization at 5-second intervals

bash-3.2$ iostat -tT 5

System configuration: lcpu=20

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait  time
          0.0        608.6               47.6   5.1   43.0      4.3  11:09:52
          0.0        802.4               38.9   6.4   51.2      3.5  11:09:57
          0.0        611.4               42.5   5.8   47.8      3.9  11:10:02

(4) Swap (paging) space size and usage in AIX

$ lsps -a
Page Space      Physical Volume   Volume Group Size %Used Active  Auto  Type Chksum
paging01        hdisk15           oraclebkpvg  17408MB     1   yes   yes    lv     0
hd6             hdisk0            rootvg       15872MB     1   yes   yes    lv     0

$ lsps -s
Total Paging Space   Percent Used
      33280MB               1%
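
The two lsps views can be reconciled with a little arithmetic: the per-device sizes from 'lsps -a' should add up to the total from 'lsps -s'. The sketch below is a minimal Python illustration. It assumes the 'lsps -a' output is pasted in as text, and the function name is hypothetical. Because %Used is a rounded percentage, the used figure is only an estimate.

```python
# Output of 'lsps -a', copied from the listing above.
LSPS_A_OUTPUT = """\
Page Space      Physical Volume   Volume Group Size %Used Active  Auto  Type Chksum
paging01        hdisk15           oraclebkpvg  17408MB     1   yes   yes    lv     0
hd6             hdisk0            rootvg       15872MB     1   yes   yes    lv     0
"""


def paging_space_summary(lsps_text):
    """Return (total_mb, approx_used_mb) summed over all paging spaces."""
    total_mb = 0
    used_mb = 0.0
    for line in lsps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5 or not fields[3].endswith("MB"):
            continue
        size_mb = int(fields[3][:-2])   # "17408MB" -> 17408
        pct_used = int(fields[4])       # rounded percentage from lsps
        total_mb += size_mb
        used_mb += size_mb * pct_used / 100
    return total_mb, used_mb


total_mb, used_mb = paging_space_summary(LSPS_A_OUTPUT)
print(f"total={total_mb} MB, approx used={used_mb:.1f} MB")
```

The total matches the 33280MB that 'lsps -s' reports.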

To analyze the iostat output in (3):
1) Total CPUs: 20 (logical)
2) %user --> CPU time consumed by user-level (application) code (varies with load)
3) %sys --> CPU time consumed by the kernel and system services (should stay low, i.e., within single digits)
4) %idle --> Percentage of CPU time that is free (the higher the value, the more headroom)
5) %iowait --> CPU time spent idle while waiting for outstanding disk I/O (lower is better, but it can vary)
6) time --> collection time
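
To make the column meanings concrete, the sketch below splits the three iostat samples from (3) into named fields. It is a minimal Python illustration that assumes the data rows have exactly the seven columns shown in this post; the function name and the "busy" label are illustrative.

```python
# Data rows of 'iostat -tT 5', copied from section (3).
IOSTAT_ROWS = """\
          0.0        608.6               47.6   5.1   43.0      4.3  11:09:52
          0.0        802.4               38.9   6.4   51.2      3.5  11:09:57
          0.0        611.4               42.5   5.8   47.8      3.9  11:10:02
"""


def parse_iostat_cpu(row):
    """Split one data row into a dict of CPU percentages plus the time."""
    tin, tout, user, sys_pct, idle, iowait, stamp = row.split()
    return {
        "time": stamp,
        "user": float(user),
        "sys": float(sys_pct),
        "idle": float(idle),
        "iowait": float(iowait),
    }


samples = [parse_iostat_cpu(row) for row in IOSTAT_ROWS.splitlines() if row.strip()]
for s in samples:
    busy = s["user"] + s["sys"]  # CPU actually doing work
    print(f'{s["time"]}  busy={busy:.1f}%  idle={s["idle"]:.1f}%  iowait={s["iowait"]:.1f}%')
```

In each sample the four percentages add up to 100, so %user + %sys is the share of CPU doing work and the rest is idle time, with or without pending I/O.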

(5) Find memory utilization at 5-second intervals

$ vmstat 5

System configuration: lcpu=20 mem=40960MB

kthr    memory              page              faults        cpu  
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 5  0 7647013 2116903   0   0   0   0    0   0 5901 99501 20126 44  7 43  6
 2  0 7648579 2115333   0   0   0   0    0   0 6004 82808 20986 36  6 52  6
 9  0 7710614 2047010   0   0   0   0    0   0 6959 101767 21211 69  6 21  4
16  0 7715835 2041782   0   0   0   0    0   0 6026 131635 19902 84  5  8  3

(Ctrl+C can be used to cancel.)

(6) Analyze the 'vmstat' report

From the above report:
a) CPU
Note: The total number of logical CPUs is 20.
-- kthr: kernel thread states.
r -> Average number of runnable kernel threads over the sampling interval.
b -> Average number of kernel threads placed in the VMM wait queue (awaiting a resource or awaiting input/output) over the sampling interval.
Note: The 'r' column peaks at 16 runnable threads, against 20 logical CPUs.

b) Explaining more:

r: The number of processes waiting in the run queue or already running.
b: The number of processes in uninterruptible sleep (b = blocked queue, waiting for a resource, e.g. blocked filesystem I/O or an inode lock).

If the number of runnable threads (r) divided by the number of CPUs is greater than one -> possible CPU bottleneck.

(Compare the (r) column with the number of logical CPUs, as you would when reading the load average from uptime, to judge whether there are enough CPUs for the runnable threads.)

High numbers in the blocked processes column (b) indicate slow disks.

(r) should normally be higher than (b); if (b) is consistently higher, most threads are waiting on a resource such as disk I/O rather than on the CPU.
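
The r-per-CPU rule above can be applied directly to the four vmstat samples from section (5). The sketch below is a minimal Python illustration; the threshold of 1.0 comes from the rule stated above, and the function name is illustrative.

```python
LOGICAL_CPUS = 20  # lcpu from the vmstat header in section (5)

# (r, b) pairs copied from the four vmstat samples in section (5).
SAMPLES = [(5, 0), (2, 0), (9, 0), (16, 0)]


def run_queue_ratio(runnable, logical_cpus):
    """Runnable threads per logical CPU; above 1.0 suggests CPU pressure."""
    return runnable / logical_cpus


ratios = [run_queue_ratio(r, LOGICAL_CPUS) for r, _ in SAMPLES]
for (r, b), ratio in zip(SAMPLES, ratios):
    verdict = "possible CPU bottleneck" if ratio > 1.0 else "ok"
    print(f"r={r:2d} b={b} r/lcpu={ratio:.2f} -> {verdict}")
```

Even the busiest sample (r=16) stays below one runnable thread per logical CPU, so by this rule the box is busy but not yet CPU-bound.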

c) Memory:
Note: Total memory (RAM): 40 GB
-- Memory -> information about the usage of virtual and real memory. Virtual pages are considered active if they have been accessed. A page is 4096 bytes.
avm -> Active virtual pages.
fre -> Size of the free list.
Note: A large portion of real memory is utilized as a cache for file system data. It is not unusual for the size of the free list to remain small.
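
Because avm and fre are page counts, they need converting before they can be compared with the 40960 MB of RAM. The sketch below is a minimal Python illustration that uses the 4096-byte page size stated above and the first vmstat sample from section (5); the function name is illustrative.

```python
PAGE_SIZE_BYTES = 4096  # page size stated in the vmstat description above


def pages_to_mb(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a page count into megabytes (1 MB = 1024 * 1024 bytes)."""
    return pages * page_size / (1024 * 1024)


# avm and fre from the first vmstat sample in section (5).
avm_mb = pages_to_mb(7647013)
fre_mb = pages_to_mb(2116903)
print(f"avm = {avm_mb:.2f} MB, fre = {fre_mb:.2f} MB")
```

So roughly 29 GB of virtual memory is active and about 8 GB sits on the free list, out of 40 GB of RAM.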

-- Page -> information about page faults and paging activity. These are averaged over the interval and given in units per second.
re -> Pager input/output list.
pi -> Pages paged in from paging space.
po -> Pages paged out to paging space.
fr -> Pages freed (page replacement).
sr -> Pages scanned by page-replacement algorithm.
cy -> Clock cycles by page-replacement algorithm.

-- Faults: trap and interrupt rate averages per second over the sampling interval.
in -> Device interrupts.
sy -> System calls.
cs -> Kernel thread context switches.

-- Cpu: breakdown of percentage usage of processor time.
us -> User time.
sy -> System time.
id -> Processor idle time.
wa -> Processor idle time during which the system had outstanding disk/NFS I/O request(s).
pc -> Number of physical processors consumed. Displayed only if the partition is running with shared processor.
ec -> The percentage of entitled capacity consumed. Displayed only if the partition is running with shared processor. Because the time base over which this data is computed can vary, the entitled capacity percentage can sometimes exceed 100%. This excess is noticeable only with small sampling intervals.



To Analyze:

The rule for identifying a server with CPU resource problems is quite simple. Whenever the value of the runqueue r column exceeds the number of CPUs on the server, tasks are forced to wait for execution. There are several solutions to managing CPU overload, and these alternatives are presented in their order of desirability:

1.      Add more processors (CPUs) to the server.

2.      Load balance the system tasks by rescheduling large batch tasks to execute  during off-peak hours.

3.      Adjust the dispatching priorities (nice values) of existing tasks.

To understand how dispatching priorities work, remember that incoming tasks are placed in the execution queue according to their nice value (shown in the NI column of ps -l, not in the vmstat output). Tasks with a low nice value are scheduled for execution ahead of tasks with a higher nice value.

Note: On OLTP database servers, processors should be dedicated for best performance. If the 'ec' column does not appear in the vmstat output, your server is configured with dedicated processors.

For details, see the vmstat documentation on the IBM site.

(7) svmon:

svmon is the most comprehensive tool for finding out how much memory each process consumes on AIX. You can also get a very nice summary, with memory shown in MB, using:

$ svmon -P -O summary=basic,unit=MB
Unit: MB

-------------------------------------------------------------------------------
     Pid Command          Inuse      Pin     Pgsp  Virtual
15794288 oracle        20171.50   101.00        0 20061.84
15204446 oracle        20171.49   101.00        0 20061.83
15269984 oracle        20171.49   101.00        0 20061.83
11665450 oracle        20150.82   101.00        0 20041.14
15466606 oracle        20146.87   101.00        0 20037.21
10551506 oracle        20145.63   101.00        0 20035.97
10616950 oracle        20143.97   101.00        0 20034.31
44761122 oracle        20140.69   101.00        0 20037.83
11075672 oracle        20140.68   101.00        0 20031.02
 6422750 oracle        20140.27   101.00        0 20030.61
11862158 oracle        20140.24   101.00        0 20030.58
14286928 oracle        20136.61   101.00        0 20026.95
 6094986 oracle        20134.95   101.00        0 20025.29
12058794 oracle        20134.84   101.00        0 20025.16
......

-- To find memory details for a specific process:

$ svmon -P <pid>

e.g.,  $ svmon -P 15794288

-------------------------------------------------------------------------------
     Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
15794288 oracle         5163903    25840        0  5135830      Y     N     N

     PageSize                Inuse        Pin       Pgsp    Virtual
     s    4 KB               46591          0          0      18518
     m   64 KB              319832       1615          0     319832

    Vsid      Esid Type Description              PSize  Inuse   Pin Pgsp Virtual
  3415b4  70000018 work default shmat/mmap           m   4096     0    0    4096
  5e15de  7000003f work default shmat/mmap           m   4096     0    0    4096
  2e15ae  70000014 work default shmat/mmap           m   4096     0    0    4096
  5d15dd  70000041 work default shmat/mmap           m   4096     0    0    4096
  5c15dc  70000016 work default shmat/mmap           m   4096     0    0    4096
  5b15db  70000015 work default shmat/mmap           m   4096     0    0    4096
  5a15da  7000003e work default shmat/mmap           m   4096     0    0    4096
  5915d9  7000003d work default shmat/mmap           m   4096     0    0    4096
  5815d8  7000003c work default shmat/mmap           m   4096     0    0    4096
  5715d7  7000003b work default shmat/mmap           m   4096     0    0    4096
  5615d6  7000003a work default shmat/mmap           m   4096     0    0    4096
  5515d5  70000039 work default shmat/mmap           m   4096     0    0    4096
  5415d4  70000038 work default shmat/mmap           m   4096     0    0    4096

.....
.....
This gets you the full and glorious output. The svmon man page (man svmon) helps with interpreting it; just remember that, by default, nearly all of the numbers are page counts, and pages are usually 4 KB in size.

-- For a summary of the top 15 processes using memory on the system, use the following command:

$ svmon -Pt15 | perl -e 'while(<>){print if($.==2||$&&&!$s++);$.=0 if(/^-+$/)}'
output:

     Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
15794288 oracle         5163903    25872        0  5135830      Y     N     N
15204446 oracle         5163902    25872        0  5135829      Y     N     N
15269984 oracle         5163902    25872        0  5135829      Y     N     N
22740992 oracle         5162225    25872        0  5135893      Y     N     N
11665450 oracle         5158609    25872        0  5130531      Y     N     N
15466606 oracle         5157599    25872        0  5129526      Y     N     N
10551506 oracle         5157282    25872        0  5129209      Y     N     N
10616950 oracle         5156857    25872        0  5128784      Y     N     N
11075672 oracle         5156013    25872        0  5127940      Y     N     N
 6422750 oracle         5155908    25872        0  5127835      Y     N     N
11862158 oracle         5155901    25872        0  5127828      Y     N     N
14286928 oracle         5154972    25872        0  5126899      Y     N     N
 6094986 oracle         5154546    25872        0  5126473      Y     N     N
12058794 oracle         5154519    25872        0  5126440      Y     N     N
10354772 oracle         5154518    25872        0  5126445      Y     N     N

The Pid 15794288 is the process ID that has the highest memory consumption. The Command indicates the command name, in this case 'oracle'. The Inuse column, which is the total number of pages in real memory from segments that are used by the process, shows 5163903 pages. Each page is 4 KB. The Pin column, which is the total number of pages pinned from segments that are used by the process, shows 25872 pages. The Pgsp column, which is the total number of paging-space pages that are used by the process, shows 0 pages. The Virtual column (total number of pages in the process virtual space) shows 5135830.

The detailed section displays information about each segment for each process shown in the summary section. This includes the virtual (Vsid) and effective (Esid) segment identifiers. The Esid reflects the segment register that is used to access the corresponding pages. The type of the segment is displayed along with a textual description, which includes the volume name and i-node of the file for persistent segments. The report also gives the size of the pages backing the segment (PSize), where s denotes 4 KB pages, m denotes 64 KB pages, and L denotes 16 MB pages, followed by the number of pages in RAM (Inuse), the number of pinned pages in RAM (Pin), the number of pages in paging space (Pgsp), and the number of virtual pages (Virtual).
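
The per-page-size rows and the summary line of 'svmon -P 15794288' can be reconciled by expressing everything in 4 KB units. The sketch below is a minimal Python illustration. The figures are copied from the svmon output above, the helper name is illustrative, and it assumes that the summary line counts 4 KB pages, which these numbers bear out.

```python
BASE_PAGE_KB = 4  # unit used by the svmon summary line

# (page size in KB, Inuse, Pin, Pgsp, Virtual) from the PageSize table above.
PAGE_SIZE_ROWS = [
    (4, 46591, 0, 0, 18518),        # s: 4 KB pages
    (64, 319832, 1615, 0, 319832),  # m: 64 KB pages
]
INUSE, PIN, PGSP, VIRTUAL = 1, 2, 3, 4  # column positions in each row


def to_base_pages(rows, column):
    """Sum one column across page sizes, expressed in 4 KB pages."""
    return sum(row[column] * row[0] // BASE_PAGE_KB for row in rows)


inuse = to_base_pages(PAGE_SIZE_ROWS, INUSE)
pin = to_base_pages(PAGE_SIZE_ROWS, PIN)
virtual = to_base_pages(PAGE_SIZE_ROWS, VIRTUAL)
inuse_mb = inuse * BASE_PAGE_KB / 1024
virtual_mb = virtual * BASE_PAGE_KB / 1024
print(inuse, pin, virtual)
print(f"Inuse = {inuse_mb:.2f} MB, Virtual = {virtual_mb:.2f} MB")
```

The totals reproduce the summary line for that process (Inuse 5163903, Pin 25840, Virtual 5135830), and the MB values agree with the 20171.50 and 20061.84 shown by 'svmon -P -O summary=basic,unit=MB'.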

(8) nmon:

Common question: How do I see where the XX% of used memory goes?

Run nmon and then press "m"; it will quickly show you a few of the big uses of memory.



Shared memory is used by lots of applications, such as DB2 and Oracle; check the SEGSZ column for the size of each segment.
The Owner column usually tells you what it is used for, such as the oracle user for the SGA or db2inst1 for the DB2 buffer cache.

In nmon, press "t" for top processes and then "4" to order by process size to see per-process memory:
Size KB = the size as found in the program file on disk.
Resident Set Size = how big it is in memory (excluding pages still in the file system, such as code, and some parts on paging disks).
ResText column = the code pages of the Resident Set.
ResData column = the data and stack pages of the Resident Set.


If you want the full details, you need the svmon command, but be warned: it gives you the full picture, and that picture is very complicated, even when listing the details of a single process. There are a dozen combinations of memory attributes, and you will find that your process's memory has all of them, repeated for each of a dozen libraries.

(9) Understanding %CPU while running the top command

Most Oracle DBAs and Linux administrators have the following common questions about the "top" command:

What does %CPU mean when "top" is running?
If %CPU for my application is 400 or 500 most of the time, how should I interpret it? What counts as a high number?

To answer this, you first need some statistics, such as the total number of CPUs and cores available.

e.g.,

$ lscpu 

gives the output below:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Stepping:              7
CPU MHz:               2599.928
BogoMIPS:              5199.94
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31

So we can interpret it as follows:

%CPU -- CPU usage: the percentage of CPU time used by the process. By default, top displays this as a percentage of a single CPU, so on multi-core systems you can see percentages greater than 100%. For example, if 3 cores are at 60% use, top will show a CPU use of 180%. You can toggle this behavior by pressing "Shift+i" while top is running, which shows the percentage relative to all available CPUs instead.

According to the lscpu output above:

You have 32 logical CPUs (CPU(s)) in total.
You have 2 physical sockets (Socket(s)), each containing 1 physical processor.
Each processor has 8 physical cores (Core(s) per socket), which means you have 8 * 2 = 16 physical cores.
Each physical core runs 2 hardware threads (Thread(s) per core), which means you have 16 * 2 = 32 logical CPUs in total.
So the 32 logical CPUs that top counts are backed by 16 physical cores.

One logical CPU at maximum is 100%, so the highest %CPU value top can show here is number_of_logical_CPUs × 100% = 3200%. A process at 400% is therefore keeping about 4 of the 32 logical CPUs busy.
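
The same arithmetic can be written down as a small helper for putting a top %CPU figure into perspective. The sketch below is a minimal Python illustration. The socket, core, and thread counts come from the lscpu output above, the function names are illustrative, and it assumes top is in its default mode, where 100% equals one logical CPU.

```python
def logical_cpus(sockets, cores_per_socket, threads_per_core):
    """Number of logical CPUs the operating system schedules on."""
    return sockets * cores_per_socket * threads_per_core


def share_of_machine(top_cpu_percent, n_logical_cpus):
    """Convert top's per-CPU %CPU into a percentage of the whole machine."""
    return top_cpu_percent / n_logical_cpus


# Values from the lscpu output: 2 sockets, 8 cores per socket, 2 threads per core.
n_cpus = logical_cpus(sockets=2, cores_per_socket=8, threads_per_core=2)
max_top_percent = n_cpus * 100

for cpu_percent in (400, 500):
    share = share_of_machine(cpu_percent, n_cpus)
    print(f"%CPU={cpu_percent} -> {share:.1f}% of {n_cpus} logical CPUs "
          f"(maximum possible: {max_top_percent}%)")
```

On this machine, 400 or 500 in the %CPU column means the process is using roughly an eighth of the total capacity, so it is not a high number on its own.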

If you want to see per-CPU statistics, the command below can be used:

$ mpstat -P ALL 1

It shows how busy each CPU is and refreshes every second. The output looks something like this
(here from a system with 8 logical CPUs, e.g. a quad-core processor with 2 threads per core):

10:54:41 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
10:54:42 PM  all    8.20    0.12    0.75    0.00    0.00    0.00    0.00    0.00   90.93
10:54:42 PM    0   24.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   74.00
10:54:42 PM    1   22.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   76.00
10:54:42 PM    2    2.02    1.01    0.00    0.00    0.00    0.00    0.00    0.00   96.97
10:54:42 PM    3    2.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.00
10:54:42 PM    4   14.15    0.00    1.89    0.00    0.00    0.00    0.00    0.00   83.96
10:54:42 PM    5    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
10:54:42 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:54:42 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
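
To pick out the busy CPUs from this kind of report, the %idle column can be turned into a busy percentage. The sketch below is a minimal Python illustration. It assumes the column layout of the sample above (time, AM/PM, CPU, ..., %idle last), and the function name is illustrative.

```python
# 'mpstat -P ALL 1' sample, copied from the output above.
MPSTAT_OUTPUT = """\
10:54:41 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
10:54:42 PM  all    8.20    0.12    0.75    0.00    0.00    0.00    0.00    0.00   90.93
10:54:42 PM    0   24.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   74.00
10:54:42 PM    1   22.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   76.00
10:54:42 PM    2    2.02    1.01    0.00    0.00    0.00    0.00    0.00    0.00   96.97
10:54:42 PM    3    2.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.00
10:54:42 PM    4   14.15    0.00    1.89    0.00    0.00    0.00    0.00    0.00   83.96
10:54:42 PM    5    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
10:54:42 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:54:42 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
"""


def busy_by_cpu(mpstat_text):
    """Map each CPU label ('all', '0', '1', ...) to 100 - %idle."""
    busy = {}
    for line in mpstat_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] == "CPU":  # skip blanks and the header
            continue
        busy[fields[2]] = round(100.0 - float(fields[-1]), 2)
    return busy


busy = busy_by_cpu(MPSTAT_OUTPUT)
per_cpu = {cpu: pct for cpu, pct in busy.items() if cpu != "all"}
busiest = max(per_cpu, key=per_cpu.get)
print(f"overall busy: {busy['all']}%, busiest CPU: {busiest} at {per_cpu[busiest]}%")
```

In this sample the machine as a whole is about 9% busy, while CPU 0 alone is 26% busy, which is the kind of imbalance the "all" row hides.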

