Taking stock of hardware
Sources of hardware information:
lscpu
/proc/cpuinfo # one entry for each core seen by the OS
free -m
/proc/meminfo
fdisk -l
/proc/diskstats
Desktop Management Interface (DMI, aka SMBIOS):
dmidecode -t <type> # see "DMI TYPES" in manpage
Network:
ifconfig -a
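For example, a few commonly useful DMI types by keyword (run as root; these keywords are among the standard ones accepted by dmidecode):
# dmidecode -t system # vendor, product name, serial number
# dmidecode -t memory # DIMM slots, sizes, speeds
# dmidecode -t bios # BIOS vendor and version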
CPU
Overall utilization
Is CPU the bottleneck?
$ vmstat 5 5 --unit M
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 230 687 44366 0 0 2923 3037 1 0 4 3 85 7
0 0 0 218 687 44380 0 0 76160 10 2814 4233 3 1 96 0
0 0 0 224 687 44377 0 0 79462 0 3253 5979 3 2 95 0
0 0 0 230 687 44374 0 0 82432 18 3069 5674 3 1 95 0
1 0 0 233 687 44372 0 0 86400 18 3705 5215 3 2 95 0
- the first line reports averages since the system booted (the entire uptime); subsequent lines are averages over the preceding sample period (5 seconds in this example)
r - runnable processes
b - processes blocked for I/O
in - interrupts
cs - context switches (the number of times the kernel changes which process is running)
us - user time (the percentage of time the CPU spends on user tasks)
sy - system (kernel) time
id - idle time
wa - waiting for I/O
On multiprocessor machines, most tools present an average of processor statistics across all processors.
High us numbers generally indicate computation; high sy numbers mean that processes are making a lot of syscalls or doing a lot of I/O. A rule of thumb for a general-purpose server is that the system should spend roughly 50% of its nonidle time in user space and 50% in system space; the overall idle percentage should be nonzero.
Extremely high cs or in values typically indicate a misbehaving or misconfigured hardware device.
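Most of these tools average across all processors; to break the numbers out per CPU, mpstat from the sysstat package (assuming it is installed) reports each processor separately:
$ mpstat -P ALL 5 5 # per-processor us/sys/iowait/idle in 5-second samples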
Load average
How many pieces is the CPU divided into?
Average number of runnable (ready to run) processes:
$ uptime
13:03:23 up 8 days, 13:06, 2 users, load average: 1.13, 1.31, 1.38
- 1-, 5-, and 15-minute averages
- processes waiting for input (e.g. from the keyboard or the network) are not considered ready to run (only processes that are actually doing something contribute to the load average)
- on a single-processor system: 3 usually means busy, > 8 means a problem (you should start looking for ways to spread the load artificially, such as by using nice to set process priorities)
- on a multi-core system: if the number of cores equals the load average, all cores have just enough to do all the time
The system load average is an excellent metric to track as part of a system baseline. If you know your system’s load average on a normal day and it is in that same range on a bad day, this is a hint that you should look elsewhere (such as the network) for performance problems. A load average above the expected norm suggests that you should look at the processes running on the system itself.
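A minimal sketch of the multi-core rule of thumb above, comparing the 1-minute load average from /proc/loadavg against the core count:
$ cores=$(nproc)
$ read load _ < /proc/loadavg # the 1-minute load average is the first field
$ awk -v l="$load" -v c="$cores" 'BEGIN { if (l > c) print "overloaded"; else print "within capacity" }'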
Per-process consumption
Which processes are hogging resources?
Snapshot of current processes:
$ ps aux
m - show threads
Processes and other system information regularly updated:
$ top
z, x - turn on colors and highlight the sort column
Spacebar - update the display immediately
M - sort by current resident memory usage
T - sort by total (cumulative) CPU usage
H - toggle threads/processes display
u - display only one user's processes
f - select statistics to display
On a busy system, at least 70% of the CPU is often consumed by just one or two processes. Deferring the execution of the CPU hogs or reducing their priority makes the CPU more available to other processes.
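For a one-shot view of the hogs without top, ps can sort directly by CPU or memory usage; renice (sketched here, substitute a real PID) lowers a hog's priority:
$ ps aux --sort=-%cpu | head -5 # top CPU consumers
$ ps aux --sort=-%mem | head -5 # top memory consumers
# renice -n 10 -p <PID> # make a CPU hog yield to other processes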
How much CPU time a process uses:
$ time ls # or /usr/bin/time
- user time - the time the CPU spent running the program's own code
- system time - the time the kernel spent doing the process's work (e.g. reading files or directories)
- real/elapsed time - the total time it took to run the process, including the time the CPU spent running other tasks
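A rough way to read the output: if real is close to user + sys, the process is CPU-bound; if real is much larger, the process spent most of its time waiting on I/O, the network, or other processes. An illustrative run (the numbers are made up for the example):
$ time grep -r pattern /usr/share/doc > /dev/null
real 0m4.810s # far above user + sys ...
user 0m0.320s
sys 0m0.440s # ... so most of the time went to waiting on disk reads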
Threads
Some processes can be divided into pieces called threads:
- very similar to processes: have TID, are scheduled and run by the kernel
- processes don't share system resources
- all threads inside a single process share system resources (I/O connections, memory)
Many processes have only one thread; these are single-threaded processes (usually called just processes).
All processes start out single-threaded. This starting thread is called the main thread. The main thread then starts new threads in a similar fashion to how a process calls fork() to start a new process.
Threads are useful when a process has a lot to do, because threads can run simultaneously on multiple processors. Threads also start faster than processes, and they intercommunicate more efficiently (via shared memory) than processes do (via a network connection or a pipe).
It's usually not a good idea to interact with individual threads as you would with processes.
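Viewing them is another matter. Standard Linux ps and top options for inspecting a process's threads (<PID> is a placeholder):
$ ps -o nlwp= -p <PID> # number of threads in the process
$ ps -Lf -p <PID> # one line per thread; the LWP column shows TIDs
$ top -H -p <PID> # top with one row per thread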
Memory
See also posts/linux-ate-my-memory.
Amount of paging (swap) space that's currently used:
# swapon -s
Filename Type Size Used Priority
/dev/sdb2 partition 7815616 0 -1
- in kilobytes
vmstat (see above) fields:
si - swapped in (from the disk)
so - swapped out (to the disk) => if your system has a constant stream of page-outs, buy more memory
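A small sketch that watches for sustained paging (si and so are fields 7 and 8 in vmstat's default column layout):
$ vmstat 5 | awk 'NR > 2 && ($7 > 0 || $8 > 0) { print "paging:", $0 }'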
Storage I/O
$ iostat 5 5
Linux 3.2.0-4-amd64 (backup2) 06/14/2015 _x86_64_ (16 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
3.80 0.34 3.17 7.49 0.00 85.20
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdb 49.61 1852.45 349.64 1369392967 258461851
sdc 301.74 21510.91 24545.93 15901546498 18145130448
sdd 75.02 6184.17 6195.25 4571531985 4579724644
sda 307.37 16906.94 17127.65 12498149921 12661307662
dm-0 131.14 8082.58 9533.25 5974897325 7047285056
dm-1 172.96 13428.25 15012.67 9926593437 11097845392
dm-2 107.96 1612.16 347.05 1191762057 256547336
- the first report provides statistics since the system was booted, subsequent reports cover the time since the previous report
tps - total I/O transfers per second
kB_read/s - average number of kilobytes read per second
kB_read - total kilobytes read
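For per-device latency and saturation rather than raw throughput, iostat's extended mode (same sysstat package) adds columns such as await and %util:
$ iostat -dxk 5 # -d devices only, -x extended statistics, -k kilobytes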
Processes using a file or directory on the /usr filesystem (mount point):
$ fuser -cv /usr
USER PID ACCESS COMMAND
/usr: root kernel mount /
root 1 .rce. init
root 2 .rc.. kthreadd
.. ACCESS:
* f, o - the process has a file open for reading or writing
* c - the process's current directory is on the filesystem
* e, t - the process is currently executing a file
* r - the process's root directory (set with chroot) is on the filesystem
* m, s - the process has mapped a file or shared library
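A common use of fuser is finding (and, if truly necessary, killing) whatever keeps a filesystem busy before unmounting it; the mount point below is a hypothetical example:
$ fuser -cv /mnt/usb # list the processes keeping the filesystem busy
# fuser -ck /mnt/usb # send SIGKILL to all of them (destructive!)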
List open files:
$ lsof # pipe output to pager or use options
Network I/O
To see info on network connections:
# netstat -tulanp
-t - print TCP port info
-u - print UDP port info
-l - print listening ports
-a - print all active ports
-n - don't reverse-resolve IP addresses
-p - print the name and PID of the program owning the socket
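On newer distributions netstat may not be installed by default; ss from the iproute2 package accepts mostly the same flags:
# ss -tulanp # same idea: TCP/UDP, listening, all, numeric, owning processes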
To list all programs using or listening to ports (when run as regular user, only shows user's processes):
# lsof -ni -P
-n - don't reverse-resolve IP addresses
-P - disable /etc/services port name lookups
To list the Unix domain sockets (similar to, but not to be confused with, network sockets) currently in use on your system:
# lsof -U # unnamed sockets have "socket" in NAME column
Filtering lsof network connections
by protocol, host, and port:
$ lsof -i[<protocol>][@<host>][:<port>]
.. ex.
$ lsof -i:22
$ lsof -iTCP:80
by connection status:
$ lsof -iTCP -sTCP:LISTEN
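The filters combine; for example, to watch established connections to a local web server (port 443 is just an example):
$ lsof -nP -iTCP:443 -sTCP:ESTABLISHED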
Resources
- ULSAH, 4th, Ch. 29
- How Linux Works, 2nd, Ch. 8
- Corresponding man pages