vmstat command

The first tool to use is the vmstat command, which quickly provides compact information about various system resources and their related performance problems.
The vmstat command reports statistics about kernel threads in the run and wait queue, memory, paging, disks, interrupts, system calls, context switches, and CPU activity. The reported CPU activity is a percentage breakdown of user mode, system mode, idle time, and waits for disk I/O.
Note: If the vmstat command is used without any options or only with the interval and optionally, the count parameter, such as vmstat 2 10; then the first line of numbers is an average since system reboot.
As a CPU monitor, the vmstat command is superior to the iostat command in that its one-line-per-report output is easier to scan as it scrolls and there is less overhead involved if there are a lot of disks attached to the system. The following example can help you identify situations in which a program has run away or is too CPU-intensive to run in a multiuser environment.

# vmstat 2
kthr memory page faults cpu
—– ———– ———————— ———— ———–
r b avm fre re pi po fr sr cy in sy cs us sy id wa
1 0 22478 1677 0 0 0 0 0 0 188 1380 157 57 32 0 10
1 0 22506 1609 0 0 0 0 0 0 214 1476 186 48 37 0 16
0 0 22498 1582 0 0 0 0 0 0 248 1470 226 55 36 0 9

2 0 22534 1465 0 0 0 0 0 0 238 903 239 77 23 0 0
2 0 22534 1445 0 0 0 0 0 0 209 1142 205 72 28 0 0
2 0 22534 1426 0 0 0 0 0 0 189 1220 212 74 26 0 0
3 0 22534 1410 0 0 0 0 0 0 255 1704 268 70 30 0 0
2 1 22557 1365 0 0 0 0 0 0 383 977 216 72 28 0 0

2 0 22541 1356 0 0 0 0 0 0 237 1418 209 63 33 0 4
1 0 22524 1350 0 0 0 0 0 0 241 1348 179 52 32 0 16
1 0 22546 1293 0 0 0 0 0 0 217 1473 180 51 35 0 14

This output shows the effect of introducing a program in a tight loop to a busy multiuser system. The first three reports (the summary has been removed) show the system balanced at 50-55 percent user, 30-35 percent system, and 10-15 percent I/O wait. When the looping program begins, all available CPU cycles are consumed. Because the looping program does no I/O, it can absorb all of the cycles previously unused because of I/O wait. Worse, it represents a process that is always ready to take over the CPU when a useful process relinquishes it. Because the looping program has a priority equal to that of all other foreground processes, it will not necessarily have to give up the CPU when another process becomes dispatchable. The program runs for about 10 seconds (five reports), and then the activity reported by the vmstat command returns to a more normal pattern.

Optimum use would have the CPU working 100 percent of the time. This holds true in the case of a single-user system with no need to share the CPU. Generally, if us + sy time is below 90 percent, a single-user system is not considered CPU constrained. However, if us + sy time on a multiuser system exceeds 80 percent, the processes may spend time waiting in the run queue. Response time and throughput might suffer.

To check if the CPU is the bottleneck, consider the four cpu columns and the two kthr (kernel threads) columns in the vmstat report. It may also be worthwhile looking at the faults column:

* cpu

Percentage breakdown of CPU time usage during the interval. The cpu columns are as follows:
o us

The us column shows the percent of CPU time spent in user mode. A UNIX® process can execute in either user mode or system (kernel) mode. When in user mode, a process executes within its application code and does not require kernel resources to perform computations, manage memory, or set variables.
o sy

The sy column details the percentage of time the CPU was executing a process in system mode. This includes CPU resource consumed by kernel processes (kprocs) and others that need access to kernel resources. If a process needs kernel resources, it must execute a system call and is thereby switched to system mode to make that resource available. For example, reading or writing of a file requires kernel resources to open the file, seek a specific location, and read or write data, unless memory mapped files are used.
o id

The id column shows the percentage of time which the CPU is idle, or waiting, without pending local disk I/O. If there are no threads available for execution (the run queue is empty), the system dispatches a thread called wait, which is also known as the idle kproc. On an SMP system, one wait thread per processor can be dispatched. The report generated by the ps command (with the -k or -g 0 option) identifies this as kproc or wait. If the ps report shows a high aggregate time for this thread, it means there were significant periods of time when no other thread was ready to run or waiting to be executed on the CPU. The system was therefore mostly idle and waiting for new tasks.
o wa

The wa column details the percentage of time the CPU was idle with pending local disk I/O and NFS-mounted disks. If there is at least one outstanding I/O to a disk when wait is running, the time is classified as waiting for I/O. Unless asynchronous I/O is being used by the process, an I/O request to disk causes the calling process to block (or sleep) until the request has been completed. Once an I/O request for a process completes, it is placed on the run queue. If the I/Os were completing faster, more CPU time could be used.

A wa value over 25 percent could indicate that the disk subsystem might not be balanced properly, or it might be the result of a disk-intensive workload.

For information on the change made to wa, see Wait I/O time reporting.
* kthr

Number of kernel threads in various queues averaged per second over the sampling interval. The kthr columns are as follows:
o r

Average number of kernel threads that are runnable, which includes threads that are running and threads that are waiting for the CPU. If this number is greater than the number of CPUs, there is at least one thread waiting for a CPU and the more threads there are waiting for CPUs, the greater the likelihood of a performance impact.
o b

Average number of kernel threads in the VMM wait queue per second. This includes threads that are waiting on filesystem I/O or threads that have been suspended due to memory load control.

If processes are suspended due to memory load control, the blocked column (b) in the vmstat report indicates the increase in the number of threads rather than the run queue.
o p

For vmstat -I The number of threads waiting on I/Os to raw devices per second. Threads waiting on I/Os to filesystems would not be included here.
* faults

Information about process control, such as trap and interrupt rate. The faults columns are as follows:
o in

Number of device interrupts per second observed in the interval. Additional information can be found in Assessing disk performance with the vmstat command.
o sy

The number of system calls per second observed in the interval. Resources are available to user processes through well-defined system calls. These calls instruct the kernel to perform operations for the calling process and exchange data between the kernel and the process. Because workloads and applications vary widely, and different calls perform different functions, it is impossible to define how many system calls per-second are too many. But typically, when the sy column raises over 10000 calls per second on a uniprocessor, further investigations is called for (on an SMP system the number is 10000 calls per second per processor). One reason could be “polling” subroutines like the select() subroutine. For this column, it is advisable to have a baseline measurement that gives a count for a normal sy value.
o cs

Number of context switches per second observed in the interval. The physical CPU resource is subdivided into logical time slices of 10 milliseconds each. Assuming a thread is scheduled for execution, it will run until its time slice expires, until it is preempted, or until it voluntarily gives up control of the CPU. When another thread is given control of the CPU, the context or working environment of the previous thread must be saved and the context of the current thread must be loaded. The operating system has a very efficient context switching procedure, so each switch is inexpensive in terms of resources. Any significant increase in context switches, such as when cs is a lot higher than the disk I/O and network packet rate, should be cause for further investigation.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: