Monday, December 7, 2015

Linux : Performance Monitoring Tools

Collect Statistics

* sysstat package : mpstat, pidstat, iostat, sar
* vmstat (procps), dstat, nicstat
* atop
* Percona Cacti template

Watch online
* top
* iotop
* iftop

Tracing
* perf_events
* ftrace
* perf-tools
* FlameGraph
http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
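A minimal CPU flame graph workflow (a sketch; assumes perf is installed and the FlameGraph repo is cloned):
  git clone https://github.com/brendangregg/FlameGraph
  cd FlameGraph
  perf record -F 99 -a -g -- sleep 60      # sample all CPUs at 99 Hz, with stacks
  perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg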


Observability Tools : Basic

* uptime
If load average > # of CPUs, may mean CPU saturation
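ex. uptime
The three numbers are 1-, 5-, and 15-minute exponentially damped averages. A hypothetical reading: on an 8-CPU machine, a load average near 8 means the CPUs are roughly fully utilized; well above 8 suggests saturation (note Linux also counts tasks blocked in uninterruptible I/O).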

* top or htop
Can miss short-lived processes (atop won't)
Can itself consume noticeable CPU reading /proc

* ps
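Illustrative invocations:
ex. ps -ef f                      (process hierarchy as ASCII art)
ex. ps aux --sort=-%cpu | head    (snapshot of top CPU consumers)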

* vmstat
CPU section : us = user/application time, sy = system (kernel) time
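ex. vmstat -Sm 1    (1-second interval, memory columns in MB; r under procs is the run-queue length, compare it to the CPU count)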

* iostat
ex. iostat -xmdz 1 or iostat -x 1
(-x extended stats, -m MB units, -d device report only, -z skip idle devices)
Very useful set of stats: per-device IOPS, throughput, average wait (await), and %util

* mpstat
ex. mpstat -P ALL 1 (look for unbalanced workloads, hot CPUs)

* free
main memory usage
ex. free -m
buffers: block device I/O cache
cached: virtual page cache

* sar
ex. sar -n DEV 1 (network device)

Observability Tools : Intermediate
* strace (system call tracer)
ex. strace -tttT -p (#PID)
-ttt : time (us) since epoch
-T: syscall time(s)
Translates syscall arguments; very helpful for diagnosing system call usage issues
Currently has massive overhead (ptrace based)
Can slow the target by > 100x , so use with caution
ex. strace -Tttvfs 1024 -o /tmp/strace.out <command>

* tcpdump
Sniff network packets for post analysis
Study packet sequences with timestamps (us)
CPU overhead optimized (socket ring buffers), but can be significant. Use caution.
ex: tcpdump -s 0 -n -i ethX -w /tmp/tcpdump.pcap &
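Read the capture back later for analysis, e.g.:
ex. tcpdump -nr /tmp/tcpdump.pcap | head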

* netstat
-s : various network protocol statistics
-i : interface stats
-r : route table
-p : shows process details
-c : per second interval
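Illustrative examples:
ex. netstat -s | grep -i retrans    (TCP retransmit counters; exact field names vary by kernel)
ex. netstat -i                      (per-interface packet and error counts)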

* pidstat
Very useful per-process stats, by thread (-t) and by disk I/O (-d)
ex. pidstat -t 1
ex. pidstat -d 1

* swapon
show swap device usage: swapon -s

* lsof
More of a debug tool; lsof shows file descriptor usage
ex. lsof -iTCP -sTCP:ESTABLISHED

* sar (and collectl, dstat, etc. )
Archive or live mode (interval [count])
logical groups: TCP, ETCP, DEV, EDEV
ex. sar -n TCP,ETCP,DEV 1

Report network statistics (sar -n)

This reports various network statistics: for example, the number of packets received and transmitted through each network card, packet failure counts, and so on. An interval and count such as "1 3" report every 1 second, 3 times in total (see the example after the keyword list).
sar -n KEYWORD
KEYWORD can be one of the following:
  • DEV – Displays vital statistics for network devices (eth0, eth1, etc.)
  • EDEV – Display network device failure statistics
  • NFS – Displays NFS client activities
  • NFSD – Displays NFS server activities
  • SOCK – Displays sockets in use for IPv4
  • IP – Displays IPv4 network traffic
  • EIP – Displays IPv4 network errors
  • ICMP – Displays ICMPv4 network traffic
  • EICMP – Displays ICMPv4 network errors
  • TCP – Displays TCPv4 network traffic
  • ETCP – Displays TCPv4 network errors
  • UDP – Displays UDPv4 network traffic
  • SOCK6, IP6, EIP6, ICMP6, UDP6 are for IPv6
  • ALL – This displays all of the above information. The output will be very long.
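ex. sar -n DEV 1 3    (per-interface rx/tx packets and kB/s, every 1 second, 3 reports)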
* strace
Trace the system calls of a running process
ex. strace -tp `pgrep [process name, e.g. java]` 2>&1 | head -100

Advanced Observability Tools
Misc: ltrace, ss, iptraf, ethtool, snmpget, lldptool, iotop, blktrace, slabtop, /proc, pcstat
CPU Performance Counters: perf_events, tiptop, rdmsr
Advanced Tracers: perf_events, ftrace, eBPF, SystemTap, ktap, LTTng, dtrace4linux, sysdig

* ss
socket statistics
ex. ss -mop    (memory, timer, and process info per socket)
ex. ss -i      (internal TCP info: cwnd, rtt, retransmits)

* iptraf
Provides histograms of different packet sizes

* iotop
Block device (disk) I/O by process
Needs kernel support enabled: CONFIG_TASK_IO_ACCOUNTING

* slabtop
Kernel slab allocator memory usage

* pcstat
Show page cache residency by file
Uses the mincore syscall. Useful for database performance analysis.
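ex. pcstat /var/lib/mysql/*.ibd    (hypothetical path; prints, per file, how many pages are resident in the page cache)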

* perf_events
Provides the perf command
In Linux source code: tools/perf
Usually installed via the linux-tools-common package
Multi-tool : CPU profiling, PMC profiling, Static & dynamic tracing
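A typical CPU profiling sketch:
ex. perf record -F 99 -a -g -- sleep 30    (sample all CPUs at 99 Hz, with stack traces, for 30 seconds)
ex. perf report -n --stdio                 (summarize the recorded samples)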

* tiptop
IPC by process
Generally not usable in cloud guests yet; needs PMCs enabled

* rdmsr
Model Specific Registers (MSRs), unlike PMCs, can be read by default in Xen guests

Benchmarking Tools
Multi: UnixBench, lmbench, sysbench, perf bench
FS/disk: dd, hdparm, fio
App/lib: ab, wrk, jmeter, openssl
Networking: ping, hping3, iperf, ttcp, traceroute, mtr, pchar

While benchmarking, analyze with the observability tools above:
Disk benchmark: run iostat, etc.
CPU benchmark: run pidstat, perf, a flame graph, etc.
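e.g., a minimal "active benchmarking" pair (dd here is only a rough illustration, not a realistic workload):
  iostat -xz 1 &        # watch device await/%util while the test runs
  dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 oflag=direct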

Tuning Tools
Generic interfaces: sysctl, /sys
CPU/scheduler : nice, renice, taskset, ulimit, chcpu
Storage I/O: tune2fs, ionice, hdparm, blockdev
Network: ethtool, tc, ip, route
Dynamic patching: stap, kpatch
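sysctl sketch (the value below is illustrative, not a recommendation):
ex. sysctl net.ipv4.tcp_congestion_control    (read one tunable)
ex. sysctl -w net.core.somaxconn=1024         (set one at runtime; persist it in /etc/sysctl.conf)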

Tuning Methods :
Scientific Method:
Question
Hypothesis
Prediction
Test
Analysis
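A hypothetical walk-through: Question: why did application latency rise? Hypothesis: CPU saturation. Prediction: the run queue exceeds the CPU count. Test: vmstat 1, watch the r column. Analysis: if r stays below the CPU count, reject the hypothesis and form a new one.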

* ip route get [IP address]
Shows which route and interface the kernel would use to reach that address

ETC
system messages: dmesg
Network interface config: ifconfig -a , ip link
File system capacity: df -h
Volume config : mdadm --misc -D /dev/md0
Storage device info : smartctl
NUMA config: numactl -s , numactl -H
PCI info : lspci
Installed kernel modules: lsmod
Root crontab config : crontab -l
Services: service --status-all


Reference :
http://www.thegeekstuff.com/2011/03/sar-examples/
http://www.thegeekstuff.com/2011/03/linux-performance-monitoring-intro/
http://www.brendangregg.com/index.html
