- Last login: Fri Aug 31 09:37:55 on ttys015
- -bash-3.2$ man fts
- Warning: cannot open configuration file /private/etc/man.conf
- No manual entry for fts
- -bash-3.2$ man n fts
- Warning: cannot open configuration file /private/etc/man.conf
- No entry for fts in section n of the manual
- -bash-3.2$ man a fts
- Warning: cannot open configuration file /private/etc/man.conf
- No manual entry for a
- No manual entry for fts
- -bash-3.2$ man
- Warning: cannot open configuration file /private/etc/man.conf
- What manual page do you want?
- -bash-3.2$ bash
- bash-3.2$ ls
- Applications Documents Library Music bin projects sudo
- Desktop Downloads Movies Pictures dev script-dev workspace1
- bash-3.2$ man
- Warning: cannot open configuration file /private/etc/man.conf
- What manual page do you want?
- bash-3.2$ man a fts
- Warning: cannot open configuration file /private/etc/man.conf
- No manual entry for a
- No manual entry for fts
- bash-3.2$ man fts
- Warning: cannot open configuration file /private/etc/man.conf
- No manual entry for fts
- bash-3.2$ man 8 fts
- Warning: cannot open configuration file /private/etc/man.conf
- No entry for fts in section 8 of the manual
- bash-3.2$ manpages n fts
- bash: manpages: command not found
- bash-3.2$ man1 fts
- bash: man1: command not found
- bash-3.2$ cd /usr/share
- bash-3.2$ ls
- CSI emacs icu mecabra snmp vim
- CoreDuetDaemonConfig.bundle examples info misc tabset zoneinfo
- calendar file java php terminfo zoneinfo.default
- com.apple.languageassetd firmware kdrl.bundle pmenergy texinfo zsh
- cracklib germantok kpep ri thermald.bundle
- cups groff langid sandbox tokenizer
- dict hiutil locale screen ucupdate
- doc httpd man skel uucp
- bash-3.2$ cd examples
- bash-3.2$ ls
- DTTk
- bash-3.2$ cd D*
- bash-3.2$ ls
- bitesize_example.txt errinfo_example.txt iopattern_example.txt newproc_example.txt rwbypid_example.txt syscallbypid_example.txt
- cpuwalk_example.txt execsnoop_example.txt iopending_example.txt opensnoop_example.txt rwbytype_example.txt syscallbyproc_example.txt
- creatbyproc_example.txt fddist_example.txt iosnoop_example.txt pathopens_example.txt rwsnoop_example.txt syscallbysysc_example.txt
- dappprof_example.txt filebyproc_example.txt iotop_example.txt pidpersec_example.txt sampleproc_example.txt topsyscall_example.txt
- dapptrace_example.txt hotspot_example.txt kill_example.txt priclass_example.txt seeksize_example.txt topsysproc_example.txt
- dispqlen_example.txt iofile_example.txt lastwords_example.txt pridist_example.txt setuids_example.txt
- dtruss_example.txt iofileb_example.txt loads_example.txt procsystime_example.txt sigdist_example.txt
- bash-3.2$ cat *
- In this example, bitesize.d was run for several seconds then Ctrl-C was hit.
- As bitesize.d runs it records how processes on the system are accessing the
- disks - in particular the size of the I/O operation. It is usually desirable
- for processes to be requesting large I/O operations rather than taking many
- small "bites".
- The final report highlights how processes performed. The find command mostly
- read 1K blocks while the tar command was reading large blocks - both as
- expected.
- # bitesize.d
- Tracing... Hit Ctrl-C to end.
- ^C
- PID CMD
- 7110 -bash\0
- value ------------- Distribution ------------- count
- 512 | 0
- 1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@ 2
- 2048 | 0
- 4096 |@@@@@@@@@@@@@ 1
- 8192 | 0
- 7110 sync\0
- value ------------- Distribution ------------- count
- 512 | 0
- 1024 |@@@@@ 1
- 2048 |@@@@@@@@@@ 2
- 4096 | 0
- 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@ 5
- 16384 | 0
- 0 sched\0
- value ------------- Distribution ------------- count
- 1024 | 0
- 2048 |@@@ 1
- 4096 | 0
- 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10
- 16384 | 0
- 7109 find /\0
- value ------------- Distribution ------------- count
- 512 | 0
- 1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1452
- 2048 |@@ 91
- 4096 | 33
- 8192 |@@ 97
- 16384 | 0
- 3 fsflush\0
- value ------------- Distribution ------------- count
- 4096 | 0
- 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 449
- 16384 | 0
- 7108 tar cf /dev/null /\0
- value ------------- Distribution ------------- count
- 256 | 0
- 512 | 70
- 1024 |@@@@@@@@@@ 1306
- 2048 |@@@@ 569
- 4096 |@@@@@@@@@ 1286
- 8192 |@@@@@@@@@@ 1403
- 16384 |@ 190
- 32768 |@@@ 396
- 65536 | 0
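The value column in these histograms follows DTrace's quantize() aggregation: each measured I/O size falls into the power-of-two bucket at or below it, and each row's @-bar is scaled against the largest bucket count. A rough Python sketch of that bucketing (the helper names here are illustrative, not part of DTrace):

```python
from collections import Counter

def quantize(values):
    """Bucket positive values into power-of-two bins, as DTrace's
    quantize() does: a value of 1500 lands in the 1024 row."""
    buckets = Counter()
    for v in values:
        b = 1
        while b * 2 <= v:
            b *= 2
        buckets[b] += 1
    return buckets

def render(buckets, width=40):
    """Print rows of @-bars scaled to the largest bucket count."""
    peak = max(buckets.values())
    for b in sorted(buckets):
        bar = "@" * (width * buckets[b] // peak)
        print(f"{b:>10} |{bar:<{width}} {buckets[b]}")

render(quantize([1024, 1024, 1500, 4096]))
```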
- The following is a demonstration of the cpuwalk.d script.
- cpuwalk.d is not that useful on a single CPU server,
- # cpuwalk.d
- Sampling... Hit Ctrl-C to end.
- ^C
- PID: 18843 CMD: bash
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 30
- 1 | 0
- PID: 8079 CMD: mozilla-bin
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10
- 1 | 0
- The output above shows that PID 18843, "bash", was sampled on CPU 0 a total
- of 30 times (we sample at 1000 hz).
- The following is a demonstration of running cpuwalk.d with a 5 second
- duration. This is on a 4 CPU server running a multithreaded CPU bound
- application called "cputhread",
- # cpuwalk.d 5
- Sampling...
- PID: 3 CMD: fsflush
- value ------------- Distribution ------------- count
- 1 | 0
- 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 30
- 3 | 0
- PID: 12186 CMD: cputhread
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@ 4900
- 1 |@@@@@@@@@@ 4900
- 2 |@@@@@@@@@@ 4860
- 3 |@@@@@@@@@@ 4890
- 4 | 0
- As we are sampling at 1000 hz, the application cputhread is indeed running
- concurrently across all available CPUs. We measured the application on
- CPU 0 a total of 4900 times, on CPU 1 a total of 4900 times, etc. As there
- are around 5000 samples per CPU available in this 5 second 1000 hz sample,
- the application is using almost all of the available CPU capacity on this server.
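The arithmetic behind that conclusion can be sketched in Python (a hypothetical helper, not part of cpuwalk.d): at 1000 hz over 5 seconds each CPU yields about 5000 samples, so dividing a process's per-CPU sample counts by that figure approximates its per-CPU utilization.

```python
def cpu_utilization(samples_per_cpu, hz, seconds):
    """Approximate per-CPU utilization from profile sample counts."""
    available = hz * seconds            # samples possible per CPU
    return {cpu: count / available for cpu, count in samples_per_cpu.items()}

# the cputhread counts from the output above
util = cpu_utilization({0: 4900, 1: 4900, 2: 4860, 3: 4890}, hz=1000, seconds=5)
```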
- The following is a similar demonstration, this time running a multithreaded
- CPU bound application called "cpuserial" that has a poor use of locking
- such that the threads "serialise",
- # cpuwalk.d 5
- Sampling...
- PID: 12194 CMD: cpuserial
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@ 470
- 1 |@@@@@@ 920
- 2 |@@@@@@@@@@@@@@@@@@@@@@@@@ 3840
- 3 |@@@@@@ 850
- 4 | 0
- In the above, we can see that this CPU bound application is not making
- efficient use of the CPU resources available, only reaching 3840 samples
- on CPU 2 out of a potential 5000. This problem was caused by a poor use
- of locks.
- The following is an example of the creatbyproc.d script,
- Here we run creatbyproc.d for several seconds,
- # ./creatbyproc.d
- dtrace: script './creatbyproc.d' matched 2 probes
- CPU ID FUNCTION:NAME
- 0 5438 creat64:entry touch /tmp/newfile
- 0 5438 creat64:entry sh /tmp/mpLaaOik
- 0 5438 creat64:entry sh /dev/null
- ^C
- In another window, the following commands were run,
- touch /tmp/newfile
- man ls
- The file creation activity caused by these commands can be seen in the
- output by creatbyproc.d
- The following is a demonstration of the dappprof command,
- This is the usage for version 0.60,
- # dappprof -h
- USAGE: dappprof [-cehoTU] [-u lib] { -p PID | command }
- -p PID # examine this PID
- -a # print all details
- -c # print syscall counts
- -e # print elapsed times (us)
- -o # print on cpu times
- -T # print totals
- -u lib # trace this library instead
- -U # trace all libraries + user funcs
- -b bufsize # dynamic variable buf size
- eg,
- dappprof df -h # run and examine "df -h"
- dappprof -p 1871 # examine PID 1871
- dappprof -ap 1871 # print all data
- The following shows running dappprof with the "banner hello" command.
- Elapsed and on-cpu times are printed (-eo), as well as counts (-c) and
- totals (-T),
- # dappprof -eocT banner hello
- # # ###### # # ####
- # # # # # # #
- ###### ##### # # # #
- # # # # # # #
- # # # # # # #
- # # ###### ###### ###### ####
- CALL COUNT
- __fsr 1
- main 1
- banprt 1
- banner 1
- banset 1
- convert 5
- banfil 5
- TOTAL: 15
- CALL ELAPSED
- banset 37363
- banfil 147407
- convert 149606
- banprt 423507
- banner 891088
- __fsr 1694349
- TOTAL: 3343320
- CALL CPU
- banset 7532
- convert 8805
- banfil 11092
- __fsr 15708
- banner 48696
- banprt 388853
- TOTAL: 480686
- The above output analyses user functions (the default). It makes it
- easy to identify which function is called the most (COUNT), which
- takes the most time (ELAPSED), and which consumes the most CPU (CPU).
- These times are totals across all calls to each function.
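dappprof builds these tables by recording a timestamp at each function entry and pairing it with the matching return. A rough Python sketch of the COUNT and ELAPSED aggregation (the event format is hypothetical; elapsed times are inclusive of called functions, and the separate on-CPU column would additionally need off-CPU accounting):

```python
def profile(events):
    """Aggregate (timestamp_us, 'entry'|'return', func) events into
    per-function call counts and total (inclusive) elapsed times."""
    stacks, counts, elapsed = {}, {}, {}
    for ts, kind, fn in events:
        if kind == "entry":
            stacks.setdefault(fn, []).append(ts)
        else:
            t0 = stacks[fn].pop()          # match return to latest entry
            counts[fn] = counts.get(fn, 0) + 1
            elapsed[fn] = elapsed.get(fn, 0) + (ts - t0)
    return counts, elapsed

events = [(0, "entry", "banner"), (5, "entry", "banfil"),
          (30, "return", "banfil"), (80, "return", "banner")]
counts, elapsed = profile(events)
```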
- The following is a demonstration of the dapptrace command,
- This is the usage for version 0.60,
- # dapptrace -h
- USAGE: dapptrace [-acdeholFLU] [-u lib] { -p PID | command }
- -p PID # examine this PID
- -a # print all details
- -c # print syscall counts
- -d # print relative times (us)
- -e # print elapsed times (us)
- -F # print flow indentation
- -l # print pid/lwpid
- -o # print CPU on cpu times
- -u lib # trace this library instead
- -U # trace all libraries + user funcs
- -b bufsize # dynamic variable buf size
- eg,
- dapptrace df -h # run and examine "df -h"
- dapptrace -p 1871 # examine PID 1871
- dapptrace -Fp 1871 # print using flow indents
- dapptrace -eop 1871 # print elapsed and CPU times
- The following is an example of the default output. We run dapptrace with
- the "banner hello" command,
- # dapptrace banner hi
- # # #
- # # #
- ###### #
- # # #
- # # #
- # # #
- CALL(args) = return
- -> __fsr(0x2, 0x8047D7C, 0x8047D88)
- <- __fsr = 122
- -> main(0x2, 0x8047D7C, 0x8047D88)
- -> banner(0x8047E3B, 0x80614C2, 0x8047D38)
- -> banset(0x20, 0x80614C2, 0x8047DCC)
- <- banset = 36
- -> convert(0x68, 0x8047DCC, 0x2)
- <- convert = 319
- -> banfil(0x8061412, 0x80614C2, 0x8047DCC)
- <- banfil = 57
- -> convert(0x69, 0x8047DCC, 0x2)
- <- convert = 319
- -> banfil(0x8061419, 0x80614CA, 0x8047DCC)
- <- banfil = 57
- <- banner = 118
- -> banprt(0x80614C2, 0x8047D38, 0xD27FB824)
- <- banprt = 74
- The default output shows user function calls. An entry is prefixed
- with a "->", and the return has a "<-".
- Here we run dapptrace with the -F for flow indent option,
- # dapptrace -F banner hi
- # # #
- # # #
- ###### #
- # # #
- # # #
- # # #
- CALL(args) = return
- -> __fsr(0x2, 0x8047D7C, 0x8047D88)
- <- __fsr = 122
- -> main(0x2, 0x8047D7C, 0x8047D88)
- -> banner(0x8047E3B, 0x80614C2, 0x8047D38)
- -> banset(0x20, 0x80614C2, 0x8047DCC)
- <- banset = 36
- -> convert(0x68, 0x8047DCC, 0x2)
- <- convert = 319
- -> banfil(0x8061412, 0x80614C2, 0x8047DCC)
- <- banfil = 57
- -> convert(0x69, 0x8047DCC, 0x2)
- <- convert = 319
- -> banfil(0x8061419, 0x80614CA, 0x8047DCC)
- <- banfil = 57
- <- banner = 118
- -> banprt(0x80614C2, 0x8047D38, 0xD27FB824)
- <- banprt = 74
- The above output illustrates the flow of the program, which functions
- call which other functions.
- Now the same command is run with -d to display relative timestamps,
- # dapptrace -dF banner hi
- # # #
- # # #
- ###### #
- # # #
- # # #
- # # #
- RELATIVE CALL(args) = return
- 2512 -> __fsr(0x2, 0x8047D7C, 0x8047D88)
- 2516 <- __fsr = 122
- 2518 -> main(0x2, 0x8047D7C, 0x8047D88)
- 2863 -> banner(0x8047E3B, 0x80614C2, 0x8047D38)
- 2865 -> banset(0x20, 0x80614C2, 0x8047DCC)
- 2872 <- banset = 36
- 2874 -> convert(0x68, 0x8047DCC, 0x2)
- 2877 <- convert = 319
- 2879 -> banfil(0x8061412, 0x80614C2, 0x8047DCC)
- 2882 <- banfil = 57
- 2883 -> convert(0x69, 0x8047DCC, 0x2)
- 2885 <- convert = 319
- 2886 -> banfil(0x8061419, 0x80614CA, 0x8047DCC)
- 2888 <- banfil = 57
- 2890 <- banner = 118
- 2892 -> banprt(0x80614C2, 0x8047D38, 0xD27FB824)
- 3214 <- banprt = 74
- The relative times are in microseconds since the program's invocation. Great!
- Even better is if we use the -eo options, to print elapsed times and on-cpu
- times,
- # dapptrace -eoF banner hi
- # # #
- # # #
- ###### #
- # # #
- # # #
- # # #
- ELAPSD CPU CALL(args) = return
- . . -> __fsr(0x2, 0x8047D7C, 0x8047D88)
- 41 4 <- __fsr = 122
- . . -> main(0x2, 0x8047D7C, 0x8047D88)
- . . -> banner(0x8047E3B, 0x80614C2, 0x8047D38)
- . . -> banset(0x20, 0x80614C2, 0x8047DCC)
- 29 6 <- banset = 36
- . . -> convert(0x68, 0x8047DCC, 0x2)
- 26 3 <- convert = 319
- . . -> banfil(0x8061412, 0x80614C2, 0x8047DCC)
- 25 2 <- banfil = 57
- . . -> convert(0x69, 0x8047DCC, 0x2)
- 23 1 <- convert = 319
- . . -> banfil(0x8061419, 0x80614CA, 0x8047DCC)
- 23 1 <- banfil = 57
- 309 28 <- banner = 118
- . . -> banprt(0x80614C2, 0x8047D38, 0xD27FB824)
- 349 322 <- banprt = 74
- Now it is easy to see which functions take the longest (elapsed), and
- which consume the most CPU cycles.
- The following demonstrates the -U option, to trace all libraries,
- # dapptrace -U banner hi
- # # #
- # # #
- ###### #
- # # #
- # # #
- # # #
- CALL(args) = return
- -> ld.so.1:_rt_boot(0x8047E34, 0x8047E3B, 0x0)
- -> ld.so.1:_setup(0x8047D38, 0x20AE4, 0x3)
- -> ld.so.1:setup(0x8047D88, 0x8047DCC, 0x0)
- -> ld.so.1:fmap_setup(0x0, 0xD27FB2E4, 0xD27FB824)
- <- ld.so.1:fmap_setup = 125
- -> ld.so.1:addfree(0xD27FD3C0, 0xC40, 0x0)
- <- ld.so.1:addfree = 65
- -> ld.so.1:security(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF)
- <- ld.so.1:security = 142
- -> ld.so.1:readenv_user(0x8047D88, 0xD27FB204, 0xD27FB220)
- -> ld.so.1:ld_str_env(0x8047E3E, 0xD27FB204, 0xD27FB220)
- <- ld.so.1:ld_str_env = 389
- -> ld.so.1:ld_str_env(0x8047E45, 0xD27FB204, 0xD27FB220)
- <- ld.so.1:ld_str_env = 389
- -> ld.so.1:ld_str_env(0x8047E49, 0xD27FB204, 0xD27FB220)
- <- ld.so.1:ld_str_env = 389
- -> ld.so.1:ld_str_env(0x8047E50, 0xD27FB204, 0xD27FB220)
- -> ld.so.1:strncmp(0x8047E53, 0xD27F7BEB, 0x4)
- <- ld.so.1:strncmp = 113
- -> ld.so.1:rd_event(0xD27FB1F8, 0x3, 0x0)
- [...4486 lines deleted...]
- -> ld.so.1:_lwp_mutex_unlock(0xD27FD380, 0xD27FB824, 0x8047C04)
- <- ld.so.1:_lwp_mutex_unlock = 47
- <- ld.so.1:rt_mutex_unlock = 34
- -> ld.so.1:rt_bind_clear(0x1, 0xD279ECC0, 0xD27FDB2C)
- <- ld.so.1:rt_bind_clear = 34
- <- ld.so.1:leave = 210
- <- ld.so.1:elf_bndr = 803
- <- ld.so.1:elf_rtbndr = 35
- The output was huge, around 4500 lines long. Function names are prefixed
- with their library name, eg "ld.so.1".
- This full output should be used with caution, as it enables so many probes
- it could well be a burden on the system.
- This is a demonstration of the dispqlen.d script,
- Here we run it on a single CPU desktop,
- # dispqlen.d
- Sampling... Hit Ctrl-C to end.
- ^C
- CPU 0
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1790
- 1 |@@@ 160
- 2 | 10
- 3 | 0
- The output shows the length of the dispatcher queue is mostly 0. This is
- evidence that the CPU is not very saturated. It does not indicate that the
- CPU is idle - as we are measuring the length of the queue, not what is
- on the CPU.
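dispqlen.d samples each CPU's dispatcher queue length at a fixed rate, so a simple saturation metric from the same samples is the fraction that found the queue non-empty. A hypothetical Python helper, fed the single-CPU counts above:

```python
def saturation(queue_samples):
    """Fraction of samples that found runnable threads waiting (queue > 0)."""
    return sum(1 for q in queue_samples if q > 0) / len(queue_samples)

# the single-CPU output above: 1790 samples at 0, 160 at 1, 10 at 2
samples = [0] * 1790 + [1] * 160 + [2] * 10
# saturation(samples) is roughly 0.09: the queue was non-empty ~9% of the time
```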
- Here it is run on a multi CPU server,
- # dispqlen.d
- Sampling... Hit Ctrl-C to end.
- ^C
- CPU 1
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1573
- 1 |@@@@@@@@@ 436
- 2 | 4
- 3 | 0
- CPU 4
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@ 1100
- 1 |@@@@@@@@@@@@@@@@@@ 912
- 2 | 1
- 3 | 0
- CPU 0
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@@@ 846
- 1 |@@@@@@@@@@@@@@@@@@@@@@@ 1167
- 2 | 0
- CPU 5
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@ 397
- 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1537
- 2 |@@ 79
- 3 | 0
- The above output shows that threads are queueing up on CPU 5 much more than
- CPU 0.
- The following demonstrates the dtruss command - a DTrace version of truss.
- This version is designed to be less intrusive and safer than running truss.
- dtruss has many options. Here is the help for version 0.70,
- USAGE: dtruss [-acdefholL] [-t syscall] { -p PID | -n name | command }
- -p PID # examine this PID
- -n name # examine this process name
- -t syscall # examine this syscall only
- -a # print all details
- -c # print syscall counts
- -d # print relative times (us)
- -e # print elapsed times (us)
- -f # follow children
- -l # force printing pid/lwpid
- -o # print on cpu times
- -L # don't print pid/lwpid
- -b bufsize # dynamic variable buf size
- eg,
- dtruss df -h # run and examine "df -h"
- dtruss -p 1871 # examine PID 1871
- dtruss -n tar # examine all processes called "tar"
- dtruss -f test.sh # run test.sh and follow children
- For example, here we dtruss any process with the name "ksh" - the Korn shell,
- # dtruss -n ksh
- PID/LWP SYSCALL(args) = return
- 27547/1: llseek(0x3F, 0xE4E, 0x0) = 3662 0
- 27547/1: read(0x3F, "\0", 0x400) = 0 0
- 27547/1: llseek(0x3F, 0x0, 0x0) = 3662 0
- 27547/1: write(0x3F, "ls -l\n\0", 0x8) = 8 0
- 27547/1: fdsync(0x3F, 0x10, 0xFEC1D444) = 0 0
- 27547/1: lwp_sigmask(0x3, 0x20000, 0x0) = 0xFFBFFEFF 0
- 27547/1: stat64("/usr/bin/ls\0", 0x8047A00, 0xFEC1D444) = 0 0
- 27547/1: lwp_sigmask(0x3, 0x0, 0x0) = 0xFFBFFEFF 0
- [...]
- dtruss does not yet translate the output for each system call as fully as truss does.
- In the following example, syscall elapsed and overhead times are measured.
- Elapsed times represent the time from syscall start to finish; overhead
- times measure the time spent on the CPU,
- # dtruss -eon bash
- PID/LWP ELAPSD CPU SYSCALL(args) = return
- 3911/1: 41 26 write(0x2, "l\0", 0x1) = 1 0
- 3911/1: 1001579 43 read(0x0, "s\0", 0x1) = 1 0
- 3911/1: 38 26 write(0x2, "s\0", 0x1) = 1 0
- 3911/1: 1019129 43 read(0x0, " \001\0", 0x1) = 1 0
- 3911/1: 38 26 write(0x2, " \0", 0x1) = 1 0
- 3911/1: 998533 43 read(0x0, "-\0", 0x1) = 1 0
- 3911/1: 38 26 write(0x2, "-\001\0", 0x1) = 1 0
- 3911/1: 1094323 42 read(0x0, "l\0", 0x1) = 1 0
- 3911/1: 39 27 write(0x2, "l\001\0", 0x1) = 1 0
- 3911/1: 1210496 44 read(0x0, "\r\0", 0x1) = 1 0
- 3911/1: 40 28 write(0x2, "\n\001\0", 0x1) = 1 0
- 3911/1: 9 1 lwp_sigmask(0x3, 0x2, 0x0) = 0xFFBFFEFF 0
- 3911/1: 70 63 ioctl(0x0, 0x540F, 0x80F6D00) = 0 0
- A bash command was in another window, where the "ls -l" command was being
- typed. The keystrokes can be seen above, along with the long elapsed times
- (keystroke delays), and short overhead times (as the bash process blocks
- on the read and leaves the CPU).
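The relationship between the ELAPSD and CPU columns can be expressed as a blocked-time fraction (a hypothetical helper, not a dtruss feature): the difference between the two is time the process spent off-CPU, here waiting for keystrokes.

```python
def blocked_fraction(elapsed_us, cpu_us):
    """Portion of a syscall's elapsed time spent blocked off-CPU."""
    return (elapsed_us - cpu_us) / elapsed_us

# the first read() above: 1001579 us elapsed, only 43 us on-CPU
wait = blocked_fraction(1001579, 43)   # nearly all of it is keystroke delay
```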
- Now dtruss is put to the test. Here we truss a test program that runs several
- hundred smaller programs, which in turn generate thousands of system calls.
- First, as a "control" we run the program without a truss or dtruss running,
- # time ./test
- real 0m38.508s
- user 0m5.299s
- sys 0m25.668s
- Now we try truss,
- # time truss ./test 2> /dev/null
- real 0m41.281s
- user 0m0.558s
- sys 0m1.351s
- Now we try dtruss,
- # time dtruss ./test 2> /dev/null
- real 0m46.226s
- user 0m6.771s
- sys 0m31.703s
- In the above test, truss slowed the program from 38 seconds to 41, while
- dtruss slowed it from 38 seconds to 46 - slightly slower than truss...
- Now we try follow mode "-f". The test program does run several hundred
- smaller programs, so now there are plenty more system calls to track,
- # time truss -f ./test 2> /dev/null
- real 2m28.317s
- user 0m0.893s
- sys 0m3.527s
- Now we try dtruss,
- # time dtruss -f ./test 2> /dev/null
- real 0m56.179s
- user 0m10.040s
- sys 0m38.185s
- Wow, the difference is huge! truss slows the program from 38 to 148 seconds;
- but dtruss has only slowed the program from 38 to 56 seconds.
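Those overheads can be stated as percentage slowdowns, using the times measured above (the helper is hypothetical, just making the comparison explicit):

```python
def slowdown_pct(base_s, traced_s):
    """Percentage slowdown of a traced run relative to the untraced run."""
    return (traced_s - base_s) / base_s * 100

truss_follow  = slowdown_pct(38.508, 148.317)   # truss -f:  ~285% slower
dtruss_follow = slowdown_pct(38.508, 56.179)    # dtruss -f:  ~46% slower
```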
- This is an example of the errinfo program, which prints details on syscall
- failures.
- By default it "snoops" syscall failures and prints their details,
- # ./errinfo
- EXEC SYSCALL ERR DESC
- wnck-applet read 11 Resource temporarily unavailable
- Xorg read 11 Resource temporarily unavailable
- nautilus read 11 Resource temporarily unavailable
- Xorg read 11 Resource temporarily unavailable
- dsdm read 11 Resource temporarily unavailable
- Xorg read 11 Resource temporarily unavailable
- Xorg pollsys 4 interrupted system call
- mozilla-bin lwp_park 62 timer expired
- gnome-netstatus- ioctl 12 Not enough core
- mozilla-bin lwp_park 62 timer expired
- Xorg read 11 Resource temporarily unavailable
- mozilla-bin lwp_park 62 timer expired
- [...]
- This is useful for watching these events live, but the output can scroll
- off the screen rather rapidly.
- The "-c" option will instead count the number of errors. Hit Ctrl-C to stop
- the sample. For example,
- # ./errinfo -c
- Tracing... Hit Ctrl-C to end.
- ^C
- EXEC SYSCALL ERR COUNT DESC
- nscd fcntl 22 1 Invalid argument
- xscreensaver read 11 1 Resource temporarily unavailable
- inetd lwp_park 62 1 timer expired
- svc.startd lwp_park 62 1 timer expired
- svc.configd lwp_park 62 1 timer expired
- ttymon ioctl 25 1 Inappropriate ioctl for device
- gnome-netstatus- ioctl 12 2 Not enough core
- mozilla-bin lwp_kill 3 2 No such process
- mozilla-bin connect 150 5 operation now in progress
- svc.startd portfs 62 8 timer expired
- java_vm lwp_cond_wait 62 8 timer expired
- soffice.bin read 11 9 Resource temporarily unavailable
- gnome-terminal read 11 23 Resource temporarily unavailable
- mozilla-bin recv 11 26 Resource temporarily unavailable
- nautilus read 11 26 Resource temporarily unavailable
- gnome-settings-d read 11 26 Resource temporarily unavailable
- gnome-smproxy read 11 34 Resource temporarily unavailable
- gnome-panel read 11 42 Resource temporarily unavailable
- dsdm read 11 112 Resource temporarily unavailable
- metacity read 11 128 Resource temporarily unavailable
- mozilla-bin lwp_park 62 133 timer expired
- Xorg pollsys 4 147 interrupted system call
- wnck-applet read 11 179 Resource temporarily unavailable
- mozilla-bin read 11 258 Resource temporarily unavailable
- Xorg read 11 1707 Resource temporarily unavailable
- Ok, so Xorg has received 1707 errors of the same type for the read() syscall.
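The ERR and DESC columns are simply the failed syscall's errno value and its strerror() text. The same mapping can be reproduced in Python (EAGAIN is 11 on Solaris and Linux; the number differs on some other platforms):

```python
import errno
import os

def describe(err):
    """Return the human-readable description for an errno, as in DESC."""
    return os.strerror(err)

print(errno.EAGAIN, describe(errno.EAGAIN))  # the read() failures above
print(errno.EINTR, describe(errno.EINTR))    # the interrupted pollsys calls
```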
- The "-n" option lets us match on one type of process only. In the following
- we match processes that have the name "mozilla-bin",
- # ./errinfo -c -n mozilla-bin
- Tracing... Hit Ctrl-C to end.
- ^C
- EXEC SYSCALL ERR COUNT DESC
- mozilla-bin getpeername 134 1 Socket is not connected
- mozilla-bin recv 11 2 Resource temporarily unavailable
- mozilla-bin lwp_kill 3 2 No such process
- mozilla-bin connect 150 5 operation now in progress
- mozilla-bin lwp_park 62 207 timer expired
- mozilla-bin read 11 396 Resource temporarily unavailable
- The "-p" option lets us examine one PID only. The following example examines
- PID 1119,
- # ./errinfo -c -p 1119
- Tracing... Hit Ctrl-C to end.
- ^C
- EXEC SYSCALL ERR COUNT DESC
- Xorg pollsys 4 47 interrupted system call
- Xorg read 11 669 Resource temporarily unavailable
- The following is an example of execsnoop. As processes are executed their
- details are printed out. Another user was logged in running a few commands
- which can be viewed below,
- # ./execsnoop
- UID PID PPID ARGS
- 100 3008 2656 ls
- 100 3009 2656 ls -l
- 100 3010 2656 cat /etc/passwd
- 100 3011 2656 vi /etc/hosts
- 100 3012 2656 date
- 100 3013 2656 ls -l
- 100 3014 2656 ls
- 100 3015 2656 finger
- [...]
- In this example the command "man gzip" was executed. The output lets us
- see what the man command is actually doing,
- # ./execsnoop
- UID PID PPID ARGS
- 100 3064 2656 man gzip
- 100 3065 3064 sh -c cd /usr/share/man; tbl /usr/share/man/man1/gzip.1 |nroff -u0 -Tlp -man -
- 100 3067 3066 tbl /usr/share/man/man1/gzip.1
- 100 3068 3066 nroff -u0 -Tlp -man -
- 100 3066 3065 col -x
- 100 3069 3064 sh -c trap '' 1 15; /usr/bin/mv -f /tmp/mpoMaa_f /usr/share/man/cat1/gzip.1 2>
- 100 3070 3069 /usr/bin/mv -f /tmp/mpoMaa_f /usr/share/man/cat1/gzip.1
- 100 3071 3064 sh -c more -s /tmp/mpoMaa_f
- 100 3072 3071 more -s /tmp/mpoMaa_f
- ^C
- Execsnoop has other options,
- # ./execsnoop -h
- USAGE: execsnoop [-a|-A|-sv] [-c command]
- execsnoop # default output
- -a # print all data
- -A # dump all data, space delimited
- -s # include start time, us
- -v # include start time, string
- -c command # command name to snoop
- In particular the verbose option for human readable timestamps is
- very useful,
- # ./execsnoop -v
- STRTIME UID PID PPID ARGS
- 2005 Jan 22 00:07:22 0 23053 20933 date
- 2005 Jan 22 00:07:24 0 23054 20933 uname -a
- 2005 Jan 22 00:07:25 0 23055 20933 ls -latr
- 2005 Jan 22 00:07:27 0 23056 20933 df -k
- 2005 Jan 22 00:07:29 0 23057 20933 ps -ef
- 2005 Jan 22 00:07:29 0 23057 20933 ps -ef
- 2005 Jan 22 00:07:34 0 23058 20933 uptime
- 2005 Jan 22 00:07:34 0 23058 20933 uptime
- [...]
- It is also possible to match particular commands. Here we watch
- anyone using the vi command only,
- # ./execsnoop -vc vi
- STRTIME UID PID PPID ARGS
- 2005 Jan 22 00:10:33 0 23063 20933 vi /etc/passwd
- 2005 Jan 22 00:10:40 0 23064 20933 vi /etc/shadow
- 2005 Jan 22 00:10:51 0 23065 20933 vi /etc/group
- 2005 Jan 22 00:10:57 0 23066 20933 vi /.rhosts
- [...]
- The following is a demonstration of the fddist command,
- Here fddist is run for a few seconds on an idle workstation,
- Tracing reads and writes... Hit Ctrl-C to end.
- ^C
- EXEC: dtrace PID: 3288
- value ------------- Distribution ------------- count
- 0 | 0
- 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2
- 2 | 0
- EXEC: mozilla-bin PID: 1659
- value ------------- Distribution ------------- count
- 3 | 0
- 4 |@@@@@@@@@@ 28
- 5 | 0
- 6 |@@@@@@@@@@@@@@@ 40
- 7 |@@@@@@@@@@@@@@@ 40
- 8 | 0
- EXEC: Xorg PID: 1532
- value ------------- Distribution ------------- count
- 22 | 0
- 23 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 57
- 24 | 0
- The above displays the usage pattern for process file descriptors.
- We can see the Xorg process (PID 1532) has made 57 reads or writes to
- its file descriptor 23.
- The pfiles(1) command can be used to help determine what file
- descriptor 23 actually is.
- The following is an example of the filebyproc.d script,
- # filebyproc.d
- dtrace: description 'syscall::open*:entry ' matched 2 probes
- CPU ID FUNCTION:NAME
- 0 14 open:entry gnome-netstatus- /dev/kstat
- 0 14 open:entry man /var/ld/ld.config
- 0 14 open:entry man /lib/libc.so.1
- 0 14 open:entry man /usr/share/man/man.cf
- 0 14 open:entry man /usr/share/man/windex
- 0 14 open:entry man /usr/share/man/man1/ls.1
- 0 14 open:entry man /usr/share/man/man1/ls.1
- 0 14 open:entry man /tmp/mpqea4RF
- 0 14 open:entry sh /var/ld/ld.config
- 0 14 open:entry sh /lib/libc.so.1
- 0 14 open:entry neqn /var/ld/ld.config
- 0 14 open:entry neqn /lib/libc.so.1
- 0 14 open:entry neqn /usr/share/lib/pub/eqnchar
- 0 14 open:entry tbl /var/ld/ld.config
- 0 14 open:entry tbl /lib/libc.so.1
- 0 14 open:entry tbl /usr/share/man/man1/ls.1
- 0 14 open:entry nroff /var/ld/ld.config
- [...]
- In the above example, the command "man ls" was run. Each file that was
- opened (or attempted) can be seen, along with the name of the program responsible.
- The following is a demonstration of the hotspot.d script.
- Here the script is run while a large file is copied from one filesystem
- (cmdk0 102,0) to another (cmdk0 102,3). We can see the file mostly resided
- around the 9000 to 10999 Mb range on the source disk (102,0), and was
- copied to the 0 to 999 Mb range on the target disk (102,3).
- # ./hotspot.d
- Tracing... Hit Ctrl-C to end.
- ^C
- Disk: cmdk0 Major,Minor: 102,3
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 418
- 1000 | 0
- Disk: cmdk0 Major,Minor: 102,0
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 | 1
- 1000 | 5
- 2000 | 0
- 3000 | 0
- 4000 | 0
- 5000 | 0
- 6000 | 0
- 7000 | 0
- 8000 | 0
- 9000 |@@@@@ 171
- 10000 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1157
- 11000 | 0
- The following is a demonstration of the iofile.d script,
- Here we run it while a tar command is backing up /var/adm,
- # iofile.d
- Tracing... Hit Ctrl-C to end.
- ^C
- PID CMD TIME FILE
- 5206 tar 109 /var/adm/acct/nite
- 5206 tar 110 /var/adm/acct/sum
- 5206 tar 114 /var/adm/acct/fiscal
- 5206 tar 117 /var/adm/messages.3
- 5206 tar 172 /var/adm/sa
- 5206 tar 3605 /var/adm/messages.2
- 5206 tar 4548 /var/adm/spellhist
- 5206 tar 5769 /var/adm/exacct/brendan1task
- 5206 tar 6416 /var/adm/acct
- 5206 tar 7587 /var/adm/messages.1
- 5206 tar 8246 /var/adm/exacct/task
- 5206 tar 8320 /var/adm/pool
- 5206 tar 8973 /var/adm/pool/history
- 5206 tar 9183 /var/adm/exacct
- 3 fsflush 10882 <none>
- 5206 tar 11861 /var/adm/exacct/flow
- 5206 tar 12042 /var/adm/messages.0
- 5206 tar 12408 /var/adm/sm.bin
- 5206 tar 13021 /var/adm/sulog
- 5206 tar 19007 /var/adm/streams
- 5206 tar 21811 <none>
- 5206 tar 24918 /var/adm/exacct/proc
- In the above output, we can see that the tar command spent 24918 us (25 ms)
- waiting for disk I/O on the /var/adm/exacct/proc file.
- The following is a demonstration of the iofileb.d script,
- Here we run it while a tar command is backing up /var/adm,
- # ./iofileb.d
- Tracing... Hit Ctrl-C to end.
- ^C
- PID CMD KB FILE
- 29529 tar 56 /var/adm/sa/sa31
- 29529 tar 56 /var/adm/sa/sa03
- 29529 tar 56 /var/adm/sa/sa02
- 29529 tar 56 /var/adm/sa/sa01
- 29529 tar 56 /var/adm/sa/sa04
- 29529 tar 56 /var/adm/sa/sa27
- 29529 tar 56 /var/adm/sa/sa28
- 29529 tar 324 /var/adm/exacct/task
- 29529 tar 736 /var/adm/wtmpx
- In the above output, we can see that the tar command has caused 736 Kbytes
- of the /var/adm/wtmpx file to be read from disk. All of the Kbyte values
- measured are for disk activity.
- The following is a demonstration of the iopattern program,
- Here we run iopattern for a few seconds then hit Ctrl-C. There is a "dd"
- command running on this system to intentionally create heavy sequential
- disk activity,
- # iopattern
- %RAN %SEQ COUNT MIN MAX AVG KR KW
- 1 99 465 4096 57344 52992 23916 148
- 0 100 556 57344 57344 57344 31136 0
- 0 100 634 57344 57344 57344 35504 0
- 6 94 554 512 57344 54034 29184 49
- 0 100 489 57344 57344 57344 27384 0
- 21 79 568 4096 57344 46188 25576 44
- 4 96 431 4096 57344 56118 23620 0
- ^C
- In the above output we can see that the disk activity is mostly sequential.
- The disks are also pulling around 30 Mb during each sample, with a large
- average event size.
- The following demonstrates iopattern while running a "find" command to
- cause random disk activity,
- # iopattern
- %RAN %SEQ COUNT MIN MAX AVG KR KW
- 86 14 400 1024 8192 1543 603 0
- 81 19 455 1024 8192 1606 714 0
- 89 11 469 512 8192 1854 550 299
- 83 17 463 1024 8192 1782 806 0
- 87 13 394 1024 8192 1551 597 0
- 85 15 348 512 57344 2835 808 155
- 91 9 513 512 47616 2812 570 839
- 76 24 317 512 35840 3755 562 600
- ^C
- In the above output, we can see from the percentages that the disk events
- were mostly random. We can also see that the average event size is small -
- which makes sense if we are reading through many directory files.
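A common heuristic for this random/sequential classification, in the spirit of what iopattern reports, is to call a disk event sequential when it starts at the block where the previous event ended. A rough Python sketch with a hypothetical event format:

```python
def random_vs_sequential(events):
    """Classify (start_block, size_blocks) disk events: an event is
    sequential if it begins where the previous one ended.
    Returns (%random, %sequential) rounded to whole percentages."""
    ran = seq = 0
    prev_end = None
    for start, size in events:
        if prev_end is not None:
            if start == prev_end:
                seq += 1
            else:
                ran += 1
        prev_end = start + size
    total = ran + seq
    return round(100 * ran / total), round(100 * seq / total)

# three back-to-back events, a long seek, then one more sequential event
pcts = random_vs_sequential([(0, 8), (8, 8), (16, 8), (1000, 8), (1008, 8)])
```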
- iopattern has options. Here we print timestamps "-v" and measure every 10
- seconds,
- # iopattern -v 10
- TIME %RAN %SEQ COUNT MIN MAX AVG KR KW
- 2005 Jul 25 20:40:55 97 3 33 512 8192 1163 8 29
- 2005 Jul 25 20:41:05 0 0 0 0 0 0 0 0
- 2005 Jul 25 20:41:15 84 16 6 512 11776 5973 22 13
- 2005 Jul 25 20:41:25 100 0 26 512 8192 1496 8 30
- 2005 Jul 25 20:41:35 0 0 0 0 0 0 0 0
- ^C
- The following is a demonstration of the iopending tool,
- Here we run it with a sample interval of 1 second,
- # iopending 1
- Tracing... Please wait.
- 2006 Jan 6 20:21:59, load: 0.02, disk_r: 0 KB, disk_w: 0 KB
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1010
- 1 | 0
- 2006 Jan 6 20:22:00, load: 0.03, disk_r: 0 KB, disk_w: 0 KB
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1000
- 1 | 0
- 2006 Jan 6 20:22:01, load: 0.03, disk_r: 0 KB, disk_w: 0 KB
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1000
- 1 | 0
- ^C
- The iopending tool samples at 1000 Hz, and prints a distribution of how many
- disk events were "pending" completion. In the above example the disks are
- quiet - for all the samples there are zero disk events pending.
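The pending count itself is just a counter incremented when an I/O is issued and decremented when it completes; iopending samples that counter at 1000 Hz to build the distribution. A minimal Python sketch (the event-stream format is hypothetical):

```python
def pending_after_each(events):
    """Outstanding-I/O counter: +1 on an I/O start event, -1 on done."""
    pending, series = 0, []
    for e in events:
        pending += 1 if e == "start" else -1
        series.append(pending)
    return series

trace = ["start", "start", "done", "start", "done", "done"]
# sampling this counter at a fixed rate yields the distributions shown above
```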
- Now iopending is run with no arguments. It will default to an interval of 5
- seconds,
- # iopending
- Tracing... Please wait.
- 2006 Jan 6 19:15:41, load: 0.03, disk_r: 3599 KB, disk_w: 0 KB
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4450
- 1 |@@@ 390
- 2 |@ 80
- 3 | 40
- 4 | 20
- 5 | 30
- 6 | 0
- ^C
- In the above output there was a little disk activity. For 390 samples there
- was 1 I/O event pending; for 80 samples there were 2, and so on.
- In the following example iopending is run during heavy disk activity. We
- print output every 10 seconds,
- # iopending 10
- Tracing... Please wait.
- 2006 Jan 6 20:58:07, load: 0.03, disk_r: 25172 KB, disk_w: 33321 KB
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@ 2160
- 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6720
- 2 |@@@@ 1000
- 3 | 50
- 4 | 30
- 5 | 20
- 6 | 10
- 7 | 10
- 8 | 10
- 9 | 0
- 2006 Jan 6 20:58:17, load: 0.05, disk_r: 8409 KB, disk_w: 12449 KB
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 7260
- 1 |@@@@@@@ 1700
- 2 |@ 300
- 3 | 0
- 4 | 10
- 5 | 10
- 6 | 10
- 7 | 20
- 8 | 0
- 9 | 0
- 10 | 0
- 11 | 0
- 12 | 0
- 13 | 0
- 14 | 0
- 15 | 0
- 16 | 0
- 17 | 10
- 18 | 20
- 19 | 0
- 20 | 0
- 21 | 0
- 22 | 0
- 23 | 0
- 24 | 0
- 25 | 0
- 26 | 0
- 27 | 0
- 28 | 0
- 29 | 0
- 30 | 0
- 31 | 10
- >= 32 |@@@ 650
- ^C
- In the first output, most of the time (67%) there was 1 event pending,
- and for a short time there were 8 events pending. In the second output we
- see many samples were off the scale - 650 samples at 32 or more pending
- events. For this sample I had typed "sync" in another window, which
- immediately queued many disk events that were eventually completed.
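- As a sanity check, the share of sampled time with at least one event
- pending can be totalled from the value/count pairs in the first output
- above:

```shell
# "pending count" pairs taken from the first 10-second iopending interval.
printf '%s\n' "0 2160" "1 6720" "2 1000" "3 50" "4 30" "5 20" "6 10" "7 10" "8 10" |
awk '{ total += $2; if ($1 > 0) busy += $2 }
     END { printf "I/O pending in %d%% of samples\n", busy * 100 / total }'
```

- So for that interval the disk had outstanding I/O for roughly 78% of the
- sampled time.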
- The following demonstrates iosnoop. It was run on a system that was
- fairly quiet until a tar command was run,
- # ./iosnoop
- UID PID D BLOCK SIZE COMM PATHNAME
- 0 0 W 1067 512 sched <none>
- 0 0 W 6496304 1024 sched <none>
- 0 3 W 6498797 512 fsflush <none>
- 0 0 W 1067 512 sched <none>
- 0 0 W 6496304 1024 sched <none>
- 100 443 R 892288 4096 Xsun /usr/openwin/bin/Xsun
- 100 443 R 891456 4096 Xsun /usr/openwin/bin/Xsun
- 100 15795 R 3808 8192 tar /usr/bin/eject
- 100 15795 R 35904 6144 tar /usr/bin/eject
- 100 15795 R 39828 6144 tar /usr/bin/env
- 100 15795 R 3872 8192 tar /usr/bin/expr
- 100 15795 R 21120 7168 tar /usr/bin/expr
- 100 15795 R 43680 6144 tar /usr/bin/false
- 100 15795 R 44176 6144 tar /usr/bin/fdetach
- 100 15795 R 3920 8192 tar /usr/bin/fdformat
- 100 15795 R 3936 8192 tar /usr/bin/fdformat
- 100 15795 R 4080 8192 tar /usr/bin/fdformat
- 100 15795 R 9680 3072 tar /usr/bin/fdformat
- 100 15795 R 4096 8192 tar /usr/bin/fgrep
- 100 15795 R 46896 6144 tar /usr/bin/fgrep
- 100 15795 R 4112 8192 tar /usr/bin/file
- 100 15795 R 4128 8192 tar /usr/bin/file
- 100 15795 R 4144 8192 tar /usr/bin/file
- 100 15795 R 21552 7168 tar /usr/bin/file
- 100 15795 R 4192 8192 tar /usr/bin/fmli
- 100 15795 R 4208 8192 tar /usr/bin/fmli
- 100 15795 R 4224 57344 tar /usr/bin/fmli
- 100 15795 R 4336 24576 tar /usr/bin/fmli
- 100 15795 R 695792 8192 tar <none>
- 100 15795 R 696432 57344 tar /usr/bin/fmli
- [...]
- The following are demonstrations of the iotop program,
- Here we run iotop with the -C option to not clear the screen, but instead
- provide a scrolling output,
- # iotop -C
- Tracing... Please wait.
- 2005 Jul 16 00:34:40, load: 1.21, disk_r: 12891 KB, disk_w: 1087 KB
- UID PID PPID CMD DEVICE MAJ MIN D BYTES
- 0 3 0 fsflush cmdk0 102 4 W 512
- 0 3 0 fsflush cmdk0 102 0 W 11776
- 0 27751 20320 tar cmdk0 102 16 W 23040
- 0 3 0 fsflush cmdk0 102 0 R 73728
- 0 0 0 sched cmdk0 102 0 R 548864
- 0 0 0 sched cmdk0 102 0 W 1078272
- 0 27751 20320 tar cmdk0 102 16 R 1514496
- 0 27751 20320 tar cmdk0 102 3 R 11767808
- 2005 Jul 16 00:34:45, load: 1.23, disk_r: 83849 KB, disk_w: 488 KB
- UID PID PPID CMD DEVICE MAJ MIN D BYTES
- 0 0 0 sched cmdk0 102 4 W 1536
- 0 0 0 sched cmdk0 102 0 R 131072
- 0 27752 20320 find cmdk0 102 0 R 262144
- 0 0 0 sched cmdk0 102 0 W 498176
- 0 27751 20320 tar cmdk0 102 3 R 11780096
- 0 27751 20320 tar cmdk0 102 5 R 29745152
- 0 27751 20320 tar cmdk0 102 4 R 47203328
- 2005 Jul 16 00:34:50, load: 1.25, disk_r: 22394 KB, disk_w: 2 KB
- UID PID PPID CMD DEVICE MAJ MIN D BYTES
- 0 27752 20320 find cmdk0 102 0 W 2048
- 0 0 0 sched cmdk0 102 0 R 16384
- 0 321 1 automountd cmdk0 102 0 R 22528
- 0 27752 20320 find cmdk0 102 0 R 1462272
- 0 27751 20320 tar cmdk0 102 5 R 17465344
- In the above output, we can see a tar command reading from the cmdk0
- disk, from several different slices (different minor numbers), with the last
- report focusing on 102,5 (an "ls -lL" in /dev/dsk shows the number-to-slice
- mappings).
- The disk_r and disk_w values give a summary of the overall activity in
- bytes.
- Bytes can be used as a yardstick to determine which process is keeping the
- disks busy; however, either of the delta times available from iotop is
- more accurate, as they take into account whether the activity is random
- or sequential.
- # iotop -Co
- Tracing... Please wait.
- 2005 Jul 16 00:39:03, load: 1.10, disk_r: 5302 KB, disk_w: 20 KB
- UID PID PPID CMD DEVICE MAJ MIN D DISKTIME
- 0 0 0 sched cmdk0 102 0 W 532
- 0 0 0 sched cmdk0 102 0 R 245398
- 0 27758 20320 find cmdk0 102 0 R 3094794
- 2005 Jul 16 00:39:08, load: 1.14, disk_r: 5268 KB, disk_w: 273 KB
- UID PID PPID CMD DEVICE MAJ MIN D DISKTIME
- 0 3 0 fsflush cmdk0 102 0 W 2834
- 0 0 0 sched cmdk0 102 0 W 263527
- 0 0 0 sched cmdk0 102 0 R 285015
- 0 3 0 fsflush cmdk0 102 0 R 519187
- 0 27758 20320 find cmdk0 102 0 R 2429232
- 2005 Jul 16 00:39:13, load: 1.16, disk_r: 602 KB, disk_w: 1238 KB
- UID PID PPID CMD DEVICE MAJ MIN D DISKTIME
- 0 3 0 fsflush cmdk0 102 4 W 200
- 0 3 0 fsflush cmdk0 102 6 W 260
- 0 3 0 fsflush cmdk0 102 0 W 883
- 0 27758 20320 find cmdk0 102 0 R 55686
- 0 3 0 fsflush cmdk0 102 0 R 317508
- 0 0 0 sched cmdk0 102 0 R 320195
- 0 0 0 sched cmdk0 102 0 W 571084
- [...]
- The disk time is in microseconds. In the first sample, we can see the find
- command caused a total of 3.094 seconds of disk time - the duration of the
- samples here is 5 seconds (the default), so it would be fair to say that
- the find command is keeping the disk 60% busy.
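- That utilisation figure is just the disk time divided by the interval; as a
- quick awk sketch (DISKTIME is in microseconds, the interval is 5 seconds):

```shell
# Disk utilisation from iotop -o output: DISKTIME over the sample interval.
awk 'BEGIN {
    disktime = 3094794        # find, from the first sample above (microseconds)
    interval = 5 * 1000000    # 5 second sample interval, in microseconds
    printf "disk busy: %.0f%%\n", disktime * 100 / interval
}'
```

- This prints "disk busy: 62%", in line with the rough figure above.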
- A new option for iotop is to print percentages ("-P"), which are based on
- disk I/O times and hence are a fair measurement of what is keeping the
- disks busy.
- # iotop -PC 1
- Tracing... Please wait.
- 2005 Nov 18 15:26:14, load: 0.24, disk_r: 13176 KB, disk_w: 0 KB
- UID PID PPID CMD DEVICE MAJ MIN D %I/O
- 0 2215 1663 bart cmdk0 102 0 R 85
- 2005 Nov 18 15:26:15, load: 0.25, disk_r: 5263 KB, disk_w: 0 KB
- UID PID PPID CMD DEVICE MAJ MIN D %I/O
- 0 2214 1663 find cmdk0 102 0 R 15
- 0 2215 1663 bart cmdk0 102 0 R 67
- 2005 Nov 18 15:26:16, load: 0.25, disk_r: 8724 KB, disk_w: 0 KB
- UID PID PPID CMD DEVICE MAJ MIN D %I/O
- 0 2214 1663 find cmdk0 102 0 R 10
- 0 2215 1663 bart cmdk0 102 0 R 71
- 2005 Nov 18 15:26:17, load: 0.25, disk_r: 7528 KB, disk_w: 0 KB
- UID PID PPID CMD DEVICE MAJ MIN D %I/O
- 0 2214 1663 find cmdk0 102 0 R 0
- 0 2215 1663 bart cmdk0 102 0 R 85
- 2005 Nov 18 15:26:18, load: 0.26, disk_r: 11389 KB, disk_w: 0 KB
- UID PID PPID CMD DEVICE MAJ MIN D %I/O
- 0 2214 1663 find cmdk0 102 0 R 2
- 0 2215 1663 bart cmdk0 102 0 R 80
- 2005 Nov 18 15:26:19, load: 0.26, disk_r: 22109 KB, disk_w: 0 KB
- UID PID PPID CMD DEVICE MAJ MIN D %I/O
- 0 2215 1663 bart cmdk0 102 0 R 76
- ^C
- In the above output, bart and find jostle for disk access as they create
- a database of file checksums. The command was,
- find / | bart create -I > /dev/null
- Note that the %I/O is in terms of 1 disk. A %I/O of, say, 200 is allowed -
- it would mean that effectively 2 disks were at 100%, or 4 disks at 50%, etc.
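- Since %I/O is per disk, a value over 100 simply divides across the spindles
- doing the work:

```shell
# A total %I/O of 200 across N disks averages 200/N percent per disk.
awk 'BEGIN { for (n = 2; n <= 4; n *= 2) printf "%d disks at %d%% each\n", n, 200 / n }'
```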
- This is an example of the kill.d DTrace script,
- # kill.d
- FROM COMMAND SIG TO RESULT
- 2344 bash 2 3117 0
- 2344 bash 9 12345 -1
- ^C
- In the above output, a kill -2 (Ctrl-C) was sent from the bash command
- to PID 3117. Then a kill -9 (SIGKILL) was sent to PID 12345, which
- returned "-1" for failure.
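- The RESULT column mirrors the return value of kill(2): 0 on success, -1 on
- failure. The same convention can be seen from a shell, using signal 0 to
- test deliverability without actually sending a signal (99999999 is assumed
- not to be a valid PID here):

```shell
kill -0 $$ && echo "result 0"                      # our own shell: succeeds
kill -0 99999999 2>/dev/null || echo "result -1"   # no such PID: fails
```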
- The following is a demonstration of the lastwords command,
- Here we run lastwords to catch syscalls from processes named "bash" as they
- exit,
- # ./lastwords bash
- Tracing... Waiting for bash to exit...
- 1091567219163679 1861 bash sigaction 0 0
- 1091567219177487 1861 bash sigaction 0 0
- 1091567219189692 1861 bash sigaction 0 0
- 1091567219202085 1861 bash sigaction 0 0
- 1091567219214553 1861 bash sigaction 0 0
- 1091567219226690 1861 bash sigaction 0 0
- 1091567219238786 1861 bash sigaction 0 0
- 1091567219251697 1861 bash sigaction 0 0
- 1091567219265770 1861 bash sigaction 0 0
- 1091567219294110 1861 bash gtime 42a7c194 0
- 1091567219428305 1861 bash write 5 0
- 1091567219451138 1861 bash setcontext 0 0
- 1091567219473911 1861 bash sigaction 0 0
- 1091567219516487 1861 bash stat64 0 0
- 1091567219547973 1861 bash open64 4 0
- 1091567219638345 1861 bash write 5 0
- 1091567219658886 1861 bash close 0 0
- 1091567219689094 1861 bash open64 4 0
- 1091567219704301 1861 bash fstat64 0 0
- 1091567219731796 1861 bash read 2fe 0
- 1091567219745541 1861 bash close 0 0
- 1091567219768536 1861 bash lwp_sigmask ffbffeff 0
- 1091567219787494 1861 bash ioctl 0 0
- 1091567219801338 1861 bash setpgrp 6a3 0
- 1091567219814067 1861 bash ioctl 0 0
- 1091567219825791 1861 bash lwp_sigmask ffbffeff 0
- 1091567219847778 1861 bash setpgrp 0 0
- TIME PID EXEC SYSCALL RETURN ERR
- In another window, a bash shell was executed and then exited normally. The
- last few system calls that the bash shell made can be seen above.
- In the following example we monitor the exit of bash shells again, but this
- time the bash shell sends itself a "kill -8",
- # ./lastwords bash
- Tracing... Waiting for bash to exit...
- 1091650185555391 1865 bash sigaction 0 0
- 1091650185567963 1865 bash sigaction 0 0
- 1091650185580316 1865 bash sigaction 0 0
- 1091650185592381 1865 bash sigaction 0 0
- 1091650185605046 1865 bash sigaction 0 0
- 1091650185618451 1865 bash sigaction 0 0
- 1091650185647663 1865 bash gtime 42a7c1e7 0
- 1091650185794626 1865 bash kill 0 0
- 1091650185836941 1865 bash lwp_sigmask ffbffeff 0
- 1091650185884145 1865 bash stat64 0 0
- 1091650185916135 1865 bash open64 4 0
- 1091650186005673 1865 bash write b 0
- 1091650186025782 1865 bash close 0 0
- 1091650186052002 1865 bash open64 4 0
- 1091650186067538 1865 bash fstat64 0 0
- 1091650186094289 1865 bash read 309 0
- 1091650186108086 1865 bash close 0 0
- 1091650186129965 1865 bash lwp_sigmask ffbffeff 0
- 1091650186149092 1865 bash ioctl 0 0
- 1091650186162614 1865 bash setpgrp 6a3 0
- 1091650186175457 1865 bash ioctl 0 0
- 1091650186187206 1865 bash lwp_sigmask ffbffeff 0
- 1091650186209514 1865 bash setpgrp 0 0
- 1091650186225307 1865 bash sigaction 0 0
- 1091650186238832 1865 bash getpid 749 0
- 1091650186260149 1865 bash kill 0 0
- 1091650186277925 1865 bash setcontext 0 0
- TIME PID EXEC SYSCALL RETURN ERR
- The last few system calls are different; we can see the kill system call
- before bash exits.
- The following is a demonstration of the loads.d script.
- Here we run both loads.d and the uptime command for comparison,
- # uptime
- 1:30am up 14 day(s), 2:27, 3 users, load average: 3.52, 3.45, 3.05
- # ./loads.d
- 2005 Jun 11 01:30:49, load average: 3.52, 3.45, 3.05
- Both have returned the same load average, confirming that loads.d is
- behaving as expected.
- The point of loads.d is to demonstrate fetching the same data as uptime
- does, in the DTrace language. It is not intended as a replacement for
- the uptime(1) command.
- The following is an example of the newproc.d script,
- # ./newproc.d
- dtrace: description 'proc:::exec-success ' matched 1 probe
- CPU ID FUNCTION:NAME
- 0 3297 exec_common:exec-success man ls
- 0 3297 exec_common:exec-success sh -c cd /usr/share/man; tbl /usr/share/man/man1/ls.1 |neqn /usr/share/lib/pub/
- 0 3297 exec_common:exec-success tbl /usr/share/man/man1/ls.1
- 0 3297 exec_common:exec-success neqn /usr/share/lib/pub/eqnchar -
- 0 3297 exec_common:exec-success nroff -u0 -Tlp -man -
- 0 3297 exec_common:exec-success col -x
- 0 3297 exec_common:exec-success sh -c trap '' 1 15; /usr/bin/mv -f/tmp/mpzIaOZF /usr/share/man/cat1/ls.1 2> /d
- 0 3297 exec_common:exec-success /usr/bin/mv -f /tmp/mpzIaOZF /usr/share/man/cat1/ls.1
- 0 3297 exec_common:exec-success sh -c more -s /tmp/mpzIaOZF
- 0 3297 exec_common:exec-success more -s /tmp/mpzIaOZF
- The above output was caught when running "man ls". This identifies all the
- commands responsible for processing the man page.
- The following are examples of opensnoop. File open events are traced
- along with some process details.
- This first example is of the default output. The commands "cat", "cal",
- "ls" and "uname" were run. The returned file descriptors (or -1 for error)
- are shown, along with the filenames.
- # ./opensnoop
- UID PID COMM FD PATH
- 100 3504 cat -1 /var/ld/ld.config
- 100 3504 cat 3 /usr/lib/libc.so.1
- 100 3504 cat 3 /etc/passwd
- 100 3505 cal -1 /var/ld/ld.config
- 100 3505 cal 3 /usr/lib/libc.so.1
- 100 3505 cal 3 /usr/share/lib/zoneinfo/Australia/NSW
- 100 3506 ls -1 /var/ld/ld.config
- 100 3506 ls 3 /usr/lib/libc.so.1
- 100 3507 uname -1 /var/ld/ld.config
- 100 3507 uname 3 /usr/lib/libc.so.1
- [...]
- Full command arguments can be fetched using -g,
- # ./opensnoop -g
- UID PID PATH FD ARGS
- 100 3528 /var/ld/ld.config -1 cat /etc/passwd
- 100 3528 /usr/lib/libc.so.1 3 cat /etc/passwd
- 100 3528 /etc/passwd 3 cat /etc/passwd
- 100 3529 /var/ld/ld.config -1 cal
- 100 3529 /usr/lib/libc.so.1 3 cal
- 100 3529 /usr/share/lib/zoneinfo/Australia/NSW 3 cal
- 100 3530 /var/ld/ld.config -1 ls -l
- 100 3530 /usr/lib/libc.so.1 3 ls -l
- 100 3530 /var/run/name_service_door 3 ls -l
- 100 3530 /usr/share/lib/zoneinfo/Australia/NSW 4 ls -l
- 100 3531 /var/ld/ld.config -1 uname -a
- 100 3531 /usr/lib/libc.so.1 3 uname -a
- [...]
- The verbose option prints human readable timestamps,
- # ./opensnoop -v
- STRTIME UID PID COMM FD PATH
- 2005 Jan 22 01:22:50 0 23212 df -1 /var/ld/ld.config
- 2005 Jan 22 01:22:50 0 23212 df 3 /lib/libcmd.so.1
- 2005 Jan 22 01:22:50 0 23212 df 3 /lib/libc.so.1
- 2005 Jan 22 01:22:50 0 23212 df 3 /platform/SUNW,Sun-Fire-V210/lib/libc_psr.so.1
- 2005 Jan 22 01:22:50 0 23212 df 3 /etc/mnttab
- 2005 Jan 22 01:22:50 0 23211 dtrace 4 /usr/share/lib/zoneinfo/Australia/NSW
- 2005 Jan 22 01:22:51 0 23213 uname -1 /var/ld/ld.config
- 2005 Jan 22 01:22:51 0 23213 uname 3 /lib/libc.so.1
- 2005 Jan 22 01:22:51 0 23213 uname 3 /platform/SUNW,Sun-Fire-V210/lib/libc_psr.so.1
- [...]
- Particular files can be monitored using -f. For example,
- # ./opensnoop -vgf /etc/passwd
- STRTIME UID PID PATH FD ARGS
- 2005 Jan 22 01:28:50 0 23242 /etc/passwd 3 cat /etc/passwd
- 2005 Jan 22 01:28:54 0 23243 /etc/passwd 4 vi /etc/passwd
- 2005 Jan 22 01:29:06 0 23244 /etc/passwd 3 passwd brendan
- [...]
- This example is of opensnoop running on a quiet system. We can see
- various daemons opening files,
- # ./opensnoop
- UID PID COMM FD PATH
- 0 253 nscd 5 /etc/user_attr
- 0 253 nscd 5 /etc/hosts
- 0 419 mibiisa 2 /dev/kstat
- 0 419 mibiisa 2 /dev/rtls
- 0 419 mibiisa 2 /dev/kstat
- 0 419 mibiisa 2 /dev/kstat
- 0 419 mibiisa 2 /dev/rtls
- 0 419 mibiisa 2 /dev/kstat
- 0 253 nscd 5 /etc/user_attr
- 0 419 mibiisa 2 /dev/kstat
- 0 419 mibiisa 2 /dev/rtls
- 0 419 mibiisa 2 /dev/kstat
- 0 174 in.routed 8 /dev/kstat
- 0 174 in.routed 8 /dev/kstat
- 0 174 in.routed 6 /dev/ip
- 0 419 mibiisa 2 /dev/kstat
- 0 419 mibiisa 2 /dev/rtls
- 0 419 mibiisa 2 /dev/kstat
- 0 293 utmpd 4 /var/adm/utmpx
- 0 293 utmpd 5 /var/adm/utmpx
- 0 293 utmpd 6 /proc/442/psinfo
- 0 293 utmpd 6 /proc/567/psinfo
- 0 293 utmpd 6 /proc/567/psinfo
- 0 293 utmpd 6 /proc/567/psinfo
- 0 293 utmpd 6 /proc/567/psinfo
- 0 293 utmpd 6 /proc/567/psinfo
- 0 293 utmpd 6 /proc/567/psinfo
- 0 293 utmpd 6 /proc/567/psinfo
- 0 293 utmpd 6 /proc/567/psinfo
- 0 293 utmpd 6 /proc/3013/psinfo
- 0 419 mibiisa 2 /dev/kstat
- 0 419 mibiisa 2 /dev/rtls
- 0 419 mibiisa 2 /dev/kstat
- [...]
- The following is a demonstration of the pathopens.d script,
- Here we run it for a few seconds then hit Ctrl-C,
- # pathopens.d
- Tracing... Hit Ctrl-C to end.
- ^C
- COUNT PATHNAME
- 1 /lib/libcmd.so.1
- 1 /export/home/root/DTrace/Dexplorer/dexplorer
- 1 /lib/libmd5.so.1
- 1 /lib/libaio.so.1
- 1 /lib/librt.so.1
- 1 /etc/security/prof_attr
- 1 /etc/mnttab
- 2 /devices/pseudo/devinfo@0:devinfo
- 2 /dev/kstat
- 2 /lib/libnvpair.so.1
- 2 /lib/libkstat.so.1
- 2 /lib/libdevinfo.so.1
- 2 /lib/libnsl.so.1
- 4 /lib/libc.so.1
- 4 /var/ld/ld.config
- 8 /export/home/brendan/Utils_solx86/setiathome-3.08.i386-pc-solaris2.6/outfile.sah
- In the above output, many of the files would have been opened using
- absolute pathnames. However the "dexplorer" file was opened using a relative
- pathname - and the pathopens.d script has correctly printed the full path.
- The above shows that the outfile.sah file was opened successfully 8 times.
- The following is a demonstration of the pidpersec.d script.
- Here the program is run on an idle system,
- # ./pidpersec.d
- TIME LASTPID PID/s
- 2005 Jun 9 22:15:09 3010 0
- 2005 Jun 9 22:15:10 3010 0
- 2005 Jun 9 22:15:11 3010 0
- 2005 Jun 9 22:15:12 3010 0
- 2005 Jun 9 22:15:13 3010 0
- ^C
- This shows that no new processes are being created.
- Now the script is run on a busy system, that is creating many processes
- (which happen to be short-lived),
- # ./pidpersec.d
- TIME LASTPID PID/s
- 2005 Jun 9 22:16:30 3051 13
- 2005 Jun 9 22:16:31 3063 12
- 2005 Jun 9 22:16:32 3073 10
- 2005 Jun 9 22:16:33 3084 11
- 2005 Jun 9 22:16:34 3096 12
- ^C
- Now we can see that there are over 10 new processes created each second.
- The value for lastpid confirms the rates printed.
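- The cross-check is simple: the differences between consecutive LASTPID
- values should roughly match the PID/s column,

```shell
# Deltas between consecutive LASTPID values from the output above.
printf '%s\n' 3051 3063 3073 3084 3096 |
awk 'NR > 1 { print $1 - prev } { prev = $1 }'
```

- which prints 12, 10, 11 and 12 - matching the rates reported for those
- seconds.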
- The following is a demonstration of the priclass.d script.
- The script was run for several seconds then Ctrl-C was hit. During
- this time, other processes in different scheduling classes were
- running.
- # ./priclass.d
- Sampling... Hit Ctrl-C to end.
- ^C
- IA
- value ------------- Distribution ------------- count
- 40 | 0
- 50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 30
- 60 | 0
- SYS
- value ------------- Distribution ------------- count
- < 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4959
- 0 | 0
- 10 | 0
- 20 | 0
- 30 | 0
- 40 | 0
- 50 | 0
- 60 | 30
- 70 | 0
- 80 | 0
- 90 | 0
- 100 | 0
- 110 | 0
- 120 | 0
- 130 | 0
- 140 | 0
- 150 | 0
- 160 | 50
- >= 170 | 0
- RT
- value ------------- Distribution ------------- count
- 90 | 0
- 100 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110
- 110 | 0
- TS
- value ------------- Distribution ------------- count
- < 0 | 0
- 0 |@@@@@@@@@@@@@@@ 2880
- 10 |@@@@@@@ 1280
- 20 |@@@@@ 990
- 30 |@@@@@ 920
- 40 |@@@@ 670
- 50 |@@@@ 730
- 60 | 0
- The output is quite interesting, and illustrates neatly the behaviour
- of different scheduling classes.
- The IA interactive class had 30 samples of a 50 to 59 priority, a fairly
- high priority. This class is used for interactive processes, such as
- the windowing system. I had clicked on a few windows to create this
- activity.
- The SYS system class had 4959 samples at a < 0 priority - the lowest,
- which was for the idle thread. There are a few samples at higher
- priorities, including some in the 160 to 169 range (the highest), which
- are for interrupt threads. The system class is used by the kernel.
- The RT real time class had 110 samples in the 100 to 109 priority range.
- This class is designed for real-time applications, those that must have
- a consistent response time regardless of other process activity. For that
- reason, the RT class trumps both TS and IA. I created these events by
- running "prstat -R" as root, which runs prstat in the real time class.
- The TS time sharing class is the default scheduling class for processes
- on a Solaris system. I ran an infinite shell loop to create heavy activity,
- "while :; do :; done", which shows a profile that leans towards lower
- priorities. This is deliberate behaviour from the time sharing class, which
- reduces the priority of CPU bound processes so that they interfere less
- with I/O bound processes. The result is more samples in the lower priority
- ranges.
- The following are demonstrations of the pridist.d script.
- Here we run pridist.d for a few seconds then hit Ctrl-C,
- # pridist.d
- Sampling... Hit Ctrl-C to end.
- ^C
- CMD: setiathome PID: 2190
- value ------------- Distribution ------------- count
- -5 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6629
- 5 | 0
- CMD: sshd PID: 9172
- value ------------- Distribution ------------- count
- 50 | 0
- 55 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10
- 60 | 0
- CMD: mozilla-bin PID: 3164
- value ------------- Distribution ------------- count
- 40 | 0
- 45 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 20
- 50 | 0
- CMD: perl PID: 11544
- value ------------- Distribution ------------- count
- 10 | 0
- 15 |@@@@@@@@ 60
- 20 | 0
- 25 |@@@@@@@@@@@@@@@ 120
- 30 | 0
- 35 |@@@@@@@@@@ 80
- 40 | 0
- 45 |@@@@@ 40
- 50 | 0
- 55 |@@@ 20
- 60 | 0
- During this sample there was a CPU bound process called "setiathome"
- running, and a new CPU bound "perl" process was executed.
- perl, executing an infinite loop, begins with a high priority of 55 to 59,
- where it is sampled 20 times. pridist.d samples 1000 times per second,
- so this equates to roughly 20 ms. The perl process was also sampled for
- 40 ms at priority 45 to 49, for 80 ms at priority 35 to 39, down to 60 ms
- at priority 15 to 19 - at which point I hit Ctrl-C to end sampling.
- The output is spectacular as it matches the behaviour of the dispatcher
- table for the time sharing class perfectly!
- setiathome is running with the lowest priority, in the 0 to 4 range.
- ... ok, so when I say 20 samples equates to 20 ms, we know that's only an
- estimate. It really means that for 20 samples that process was the one on
- the CPU. In between the samples anything may have occurred (I/O bound
- processes will context switch off the CPU). DTrace can certainly be used
- to measure this based on scheduler events rather than samples (eg, cpudist),
- however DTrace can then sometimes consume a noticeable portion of the CPUs
- (for example, 2%).
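- The sample-to-time estimate is just division by the sampling rate:

```shell
# pridist.d samples at 1000 Hz, so each sample approximates 1 ms on-CPU.
awk 'BEGIN {
    hz = 1000
    samples = 80              # e.g. the 35 to 39 priority bucket for perl
    printf "~%d ms on-CPU\n", samples * 1000 / hz
}'
```

- This prints "~80 ms on-CPU" - an estimate, for the reasons given above.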
- The following is a longer sample. Again, I start a new CPU bound perl
- process,
- # pridist.d
- Sampling... Hit Ctrl-C to end.
- ^C
- CMD: setiathome PID: 2190
- value ------------- Distribution ------------- count
- -5 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1820
- 5 | 0
- CMD: mozilla-bin PID: 3164
- value ------------- Distribution ------------- count
- 40 | 0
- 45 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10
- 50 | 0
- CMD: bash PID: 9185
- value ------------- Distribution ------------- count
- 50 | 0
- 55 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10
- 60 | 0
- CMD: perl PID: 11547
- value ------------- Distribution ------------- count
- -5 | 0
- 0 |@@@@@@@@@@@@@@@ 2020
- 5 |@@ 200
- 10 |@@@@@@@ 960
- 15 |@ 160
- 20 |@@@@@ 720
- 25 |@ 120
- 30 |@@@@ 480
- 35 |@ 80
- 40 |@@ 240
- 45 | 40
- 50 |@@ 240
- 55 | 10
- 60 | 0
- Now other behaviour can be observed as the perl process runs. The effect
- here is due to ts_maxwait triggering a priority boost to avoid CPU
- starvation; the priority is boosted to the 50 to 54 range, then decreases
- by 10 until it reaches 0 and another ts_maxwait boost is triggered. The
- process spends more time at lower priorities, as that is exactly how the
- TS dispatch table has been configured.
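- The boost-and-decay cycle can be sketched as a toy model. This assumes a
- flat decay of 10 per quantum and a boost straight back to 50, which only
- approximates the real TS dispatch table:

```shell
# Toy model of TS priority: decay each quantum, boost when starved.
awk 'BEGIN {
    pri = 50
    for (i = 0; i < 8; i++) {
        out = i ? out " " pri : pri
        pri -= 10             # priority drop after consuming a time quantum
        if (pri < 0) pri = 50 # ts_maxwait-style starvation boost
    }
    print out
}'
```

- This prints "50 40 30 20 10 0 50 40", the sawtooth pattern visible in the
- distributions above.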
- Now we run pridist.d for a considerable time,
- # pridist.d
- Sampling... Hit Ctrl-C to end.
- ^C
- CMD: setiathome PID: 2190
- value ------------- Distribution ------------- count
- -5 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3060
- 5 | 0
- CMD: mozilla-bin PID: 3164
- value ------------- Distribution ------------- count
- 40 | 0
- 45 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 20
- 50 | 0
- CMD: perl PID: 11549
- value ------------- Distribution ------------- count
- -5 | 0
- 0 |@@@@@@@@@@@@@@@@@@@ 7680
- 5 | 0
- 10 |@@@@@@@ 3040
- 15 | 70
- 20 |@@@@@@ 2280
- 25 | 120
- 30 |@@@@ 1580
- 35 | 80
- 40 |@@ 800
- 45 | 40
- 50 |@@ 800
- 55 | 20
- 60 | 0
- The process has settled into a pattern: 0 priority, ts_maxwait boost to 50,
- drop back to 0.
- Run "dispadmin -c TS -g" for a printout of the time sharing dispatcher table.
- The following shows running pridist.d on a completely idle system,
- # pridist.d
- Sampling... Hit Ctrl-C to end.
- ^C
- CMD: sched PID: 0
- value ------------- Distribution ------------- count
- -10 | 0
- -5 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1190
- 0 | 0
- Only the kernel "sched" was sampled. It would have been running the idle
- thread.
- The following is an unusual output that is worth mentioning,
- # pridist.d
- Sampling... Hit Ctrl-C to end.
- ^C
- CMD: sched PID: 0
- value ------------- Distribution ------------- count
- -10 | 0
- -5 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 940
- 0 | 0
- 5 | 0
- 10 | 0
- 15 | 0
- 20 | 0
- 25 | 0
- 30 | 0
- 35 | 0
- 40 | 0
- 45 | 0
- 50 | 0
- 55 | 0
- 60 | 0
- 65 | 0
- 70 | 0
- 75 | 0
- 80 | 0
- 85 | 0
- 90 | 0
- 95 | 0
- 100 | 0
- 105 | 0
- 110 | 0
- 115 | 0
- 120 | 0
- 125 | 0
- 130 | 0
- 135 | 0
- 140 | 0
- 145 | 0
- 150 | 0
- 155 | 0
- 160 | 0
- 165 | 10
- >= 170 | 0
- Here we have sampled the kernel running at a priority of 165 to 169. This
- is the interrupt priority range, and would be an interrupt servicing thread,
- e.g. for a network interrupt.
- This is a demonstration of the procsystime tool, which can give details
- on how processes make use of system calls.
- Here we run procsystime on processes which have the name "bash",
- # procsystime -n bash
- Hit Ctrl-C to stop sampling...
- ^C
- Elapsed Times for process bash,
- SYSCALL TIME (ns)
- setpgrp 27768
- gtime 28692
- lwp_sigmask 148074
- write 235814
- sigaction 553556
- ioctl 776691
- read 857401243
- By default procsystime prints elapsed times, the time from when the syscall
- was issued to its completion. In the above output, we can see the read()
- syscall took the most time for this process - 0.86 seconds for all the
- reads combined. This is because the read syscall is waiting for keystrokes.
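- Since the times are in nanoseconds, converting them to something readable
- is a one-liner:

```shell
# procsystime reports nanoseconds; divide by 1e9 for seconds.
awk 'BEGIN { printf "read: %.2f s elapsed\n", 857401243 / 1e9 }'
```

- which prints "read: 0.86 s elapsed" for the read total above.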
- Here we try the "-o" option to print CPU overhead times on "bash",
- # procsystime -o -n bash
- Hit Ctrl-C to stop sampling...
- ^C
- CPU Times for process bash,
- SYSCALL TIME (ns)
- setpgrp 6994
- gtime 8054
- lwp_sigmask 33865
- read 154895
- sigaction 259899
- write 343825
- ioctl 932280
- This identifies which syscall type from bash is consuming the most CPU time.
- This is ioctl, at 932 microseconds. Compare this output to the default in
- the first example - both are useful for different reasons; this CPU overhead
- output helps us see why processes are consuming a lot of sys time.
- This demonstrates using the "-a" for all details, this time with "ssh",
- # procsystime -a -n ssh
- Hit Ctrl-C to stop sampling...
- ^C
- Elapsed Times for processes ssh,
- SYSCALL TIME (ns)
- read 115833
- write 302419
- pollsys 114616076
- TOTAL: 115034328
- CPU Times for processes ssh,
- SYSCALL TIME (ns)
- read 82381
- pollsys 201818
- write 280390
- TOTAL: 564589
- Syscall Counts for processes ssh,
- SYSCALL COUNT
- read 4
- write 4
- pollsys 8
- TOTAL: 16
- Now we can see elapsed times, overhead times, and syscall counts in one
- report. Very handy. We can also see totals printed as "TOTAL:".
- procsystime also lets us just examine one PID. For example,
- # procsystime -p 1304
- Hit Ctrl-C to stop sampling...
- ^C
- Elapsed Times for PID 1304,
- SYSCALL TIME (ns)
- fcntl 7323
- fstat64 21349
- ioctl 190683
- read 238197
- write 1276169
- pollsys 1005360640
- Here is a longer example of running procsystime on mozilla,
- # procsystime -a -n mozilla-bin
- Hit Ctrl-C to stop sampling...
- ^C
- Elapsed Times for processes mozilla-bin,
- SYSCALL TIME (ns)
- readv 677958
- writev 1159088
- yield 1298742
- read 18019194
- write 35679619
- ioctl 108845685
- lwp_park 38090969432
- pollsys 65955258781
- TOTAL: 104211908499
- CPU Times for processes mozilla-bin,
- SYSCALL TIME (ns)
- yield 120345
- readv 398046
- writev 1117178
- lwp_park 8591428
- read 9752315
- write 29043460
- ioctl 37089349
- pollsys 189933470
- TOTAL: 276045591
- Syscall Counts for processes mozilla-bin,
- SYSCALL COUNT
- writev 3
- yield 9
- readv 58
- lwp_park 280
- write 1317
- read 1744
- pollsys 8268
- ioctl 16434
- TOTAL: 28113
- The following is a demonstration of the rwbypid.d script,
- Here we run it for a few seconds then hit Ctrl-C,
- # rwbypid.d
- Tracing... Hit Ctrl-C to end.
- ^C
- PID CMD DIR COUNT
- 11131 dtrace W 2
- 20334 sshd W 17
- 20334 sshd R 24
- 1532 Xorg W 69
- 1659 mozilla-bin R 852
- 1659 mozilla-bin W 1128
- 1532 Xorg R 1702
- In the above output, we can see that Xorg with PID 1532 has made 1702 reads.
- The following is an example of the rwbytype.d script.
- We run rwbytype.d for a few seconds then hit Ctrl-C,
- # rwbytype.d
- Tracing... Hit Ctrl-C to end.
- ^C
- PID CMD VTYPE DIR BYTES
- 1545 sshd chr W 1
- 10357 more chr R 30
- 2357 sshd chr W 31
- 10354 dtrace chr W 32
- 1545 sshd chr R 34
- 6778 bash chr W 44
- 1545 sshd sock R 52
- 405 poold reg W 68
- 1545 sshd sock W 136
- 10357 bash reg R 481
- 10356 find reg R 481
- 10355 bash reg R 481
- 10357 more reg R 1652
- 2357 sshd sock R 1664
- 10357 more chr W 96925
- 10357 more fifo R 97280
- 2357 sshd chr R 98686
- 10356 grep fifo W 117760
- 2357 sshd sock W 118972
- 10356 grep reg R 147645
- Here we can see that the grep process with PID 10356 read 147645 bytes
- from "regular" files. These are I/O bytes at the application level, so
- many of these reads may have been satisfied by the filesystem page cache
- rather than by disk.
- vnode file types are listed in /usr/include/sys/vnode.h, and give an idea of
- what the file descriptor refers to.
- The following is a demonstration of the rwsnoop program,
- Here we run it for about a second,
- # rwsnoop
- UID PID CMD D BYTES FILE
- 100 20334 sshd R 52 <unknown>
- 100 20334 sshd W 1 /devices/pseudo/clone@0:ptm
- 0 20320 bash W 1 /devices/pseudo/pts@0:12
- 100 20334 sshd R 2 /devices/pseudo/clone@0:ptm
- 100 20334 sshd W 52 <unknown>
- 0 2848 ls W 58 /devices/pseudo/pts@0:12
- 0 2848 ls W 68 /devices/pseudo/pts@0:12
- 0 2848 ls W 57 /devices/pseudo/pts@0:12
- 0 2848 ls W 67 /devices/pseudo/pts@0:12
- 0 2848 ls W 48 /devices/pseudo/pts@0:12
- 0 2848 ls W 49 /devices/pseudo/pts@0:12
- 0 2848 ls W 33 /devices/pseudo/pts@0:12
- 0 2848 ls W 41 /devices/pseudo/pts@0:12
- 100 20334 sshd R 429 /devices/pseudo/clone@0:ptm
- 100 20334 sshd W 468 <unknown>
- ^C
- The output scrolls rather fast. Above, we can see an ls command was run,
- and we can see each line as ls writes it. The "<unknown>" read/writes are
- socket activity, which has no corresponding filename.
- For a summary style output, use the rwtop program.
- If a particular program is of interest, the "-n" option can be used
- to match on process name. Here we match on "bash" during a login where
- the user uses the bash shell as their default,
- # rwsnoop -n bash
- UID PID CMD D BYTES FILE
- 100 2854 bash R 757 /etc/nsswitch.conf
- 100 2854 bash R 0 /etc/nsswitch.conf
- 100 2854 bash R 668 /etc/passwd
- 100 2854 bash R 980 /etc/profile
- 100 2854 bash W 15 /devices/pseudo/pts@0:14
- 100 2854 bash R 10 /export/home/brendan/.bash_profile
- 100 2854 bash R 867 /export/home/brendan/.bashrc
- 100 2854 bash R 980 /etc/profile
- 100 2854 bash W 15 /devices/pseudo/pts@0:14
- 100 2854 bash R 8951 /export/home/brendan/.bash_history
- 100 2854 bash R 8951 /export/home/brendan/.bash_history
- 100 2854 bash R 1652 /usr/share/lib/terminfo/d/dtterm
- 100 2854 bash W 41 /devices/pseudo/pts@0:14
- 100 2854 bash R 1 /devices/pseudo/pts@0:14
- 100 2854 bash W 1 /devices/pseudo/pts@0:14
- 100 2854 bash W 41 /devices/pseudo/pts@0:14
- 100 2854 bash R 1 /devices/pseudo/pts@0:14
- 100 2854 bash W 7 /devices/pseudo/pts@0:14
- In the above, various bash related files such as ".bash_profile" and
- ".bash_history" can be seen. The ".bashrc" is also read, as it was sourced
- from the .bash_profile.
- Extra options with rwsnoop allow us to print zone ID, project ID, timestamps,
- etc. Here we use "-v" to see the time printed, and match on "ps" processes,
- # rwsnoop -vn ps
- TIMESTR UID PID CMD D BYTES FILE
- 2005 Jul 24 04:23:45 0 2804 ps R 168 /proc/2804/auxv
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/2804/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 1495 /etc/ttysrch
- 2005 Jul 24 04:23:45 0 2804 ps W 28 /devices/pseudo/pts.
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/0/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/1/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/2/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/3/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/218/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/7/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/9/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/360/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/91/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/112/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/307/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/226/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/242/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/228/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/243/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/234/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/119/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/143/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/361/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/20314/psinfo
- 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/116/psinfo
- [...]
- The following is an example of the sampleproc program.
- Here we run sampleproc for a few seconds on a workstation,
- # ./sampleproc
- Sampling at 100 hertz... Hit Ctrl-C to end.
- ^C
- PID CMD COUNT
- 1659 mozilla-bin 3
- 109 nscd 4
- 2197 prstat 23
- 2190 setiathome 421
- PID CMD PERCENT
- 1659 mozilla-bin 0
- 109 nscd 0
- 2197 prstat 5
- 2190 setiathome 93
- The first table shows a count of how many times each process was sampled
- on the CPU. The second table gives this as a percentage.
- setiathome was on the CPU 421 times, which is 93% of the samples.
- The following is sampleproc running on a server with 4 CPUs. A bash shell
- is running in an infinite loop,
- # ./sampleproc
- Sampling at 100 hertz... Hit Ctrl-C to end.
- ^C
- PID CMD COUNT
- 10140 dtrace 1
- 28286 java 1
- 29345 esd 2
- 29731 esd 3
- 2 pageout 4
- 29733 esd 6
- 10098 bash 1015
- 0 sched 3028
- PID CMD PERCENT
- 10140 dtrace 0
- 28286 java 0
- 29345 esd 0
- 29731 esd 0
- 2 pageout 0
- 29733 esd 0
- 10098 bash 24
- 0 sched 74
- The bash shell was on the CPUs for 24% of the time, which is consistent
- with a CPU bound single threaded application on a 4 CPU server.
- The above sample was around 10 seconds long. During this time, there were
- around 4000 samples (checking the COUNT column); this is because
- 4000 = CPUs (4) * Hertz (100) * Seconds (10).
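The arithmetic above, and the COUNT-to-PERCENT conversion that sampleproc performs, can be sketched in a few lines. This is a hypothetical illustration, not part of the toolkit; the function names are made up.

```python
# Hypothetical sketch: reproduce the sample-count arithmetic above, and
# convert per-process on-CPU sample counts into percentages the way
# sampleproc's second table does.

def expected_samples(cpus, hertz, seconds):
    # total samples = number of CPUs * sampling rate * trace duration
    return cpus * hertz * seconds

def to_percent(counts):
    # counts: dict of CMD -> on-CPU sample count
    total = sum(counts.values())
    return {cmd: 100 * n // total for cmd, n in counts.items()}

print(expected_samples(4, 100, 10))   # 4000
# Counts from the first workstation example above:
print(to_percent({"setiathome": 421, "prstat": 23, "nscd": 4, "mozilla-bin": 3}))
# setiathome comes out at 93, prstat at 5, matching the PERCENT table.
```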
- The following are examples of seeksize.d.
- seeksize.d records the disk head seek size for each operation by process.
- This allows us to identify processes that are causing "random" disk
- access and those causing "sequential" disk access.
- It is desirable for processes to be accessing the disks in large
- sequential operations. By using seeksize.d and bitesize.d we can
- identify this behaviour.
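The histograms below come from DTrace's quantize() aggregation, which buckets each seek distance into power-of-two ranges. A hypothetical sketch of that bucketing (the function names here are invented for illustration):

```python
# Hypothetical sketch: bucket seek distances into power-of-two buckets,
# the same shape as the quantize() aggregation seeksize.d uses.
# A seek of 0 (sequential I/O) lands in the "0" bucket; a value v lands
# in bucket b where b <= v < 2*b.

from collections import Counter

def quantize_bucket(value):
    if value == 0:
        return 0
    bucket = 1
    while bucket * 2 <= value:
        bucket *= 2
    return bucket

def quantize(values):
    dist = Counter(quantize_bucket(v) for v in values)
    return dict(sorted(dist.items()))

# Mostly-sequential access: seeks of 0 dominate.
print(quantize([0, 0, 0, 0, 9, 17, 300]))   # {0: 4, 8: 1, 16: 1, 256: 1}
```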
- In this example we read through a large file by copying it to a
- remote server. Most of the seek sizes are zero, indicating sequential
- access - and we would expect good performance from the disks
- under these conditions,
- # ./seeksize.d
- Tracing... Hit Ctrl-C to end.
- ^C
- 22349 scp /dl/sol-10-b63-x86-v1.iso mars:\0
- value ------------- Distribution ------------- count
- -1 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 726
- 1 | 0
- 2 | 0
- 4 | 0
- 8 |@ 13
- 16 | 4
- 32 | 0
- 64 | 0
- 128 | 2
- 256 | 3
- 512 | 4
- 1024 | 4
- 2048 | 3
- 4096 | 0
- 8192 | 3
- 16384 | 0
- 32768 | 1
- 65536 | 0
- In this example we run find. The disk operations are fairly scattered,
- as illustrated below by the volume of non-sequential reads,
- # ./seeksize.d
- Tracing... Hit Ctrl-C to end.
- ^C
- 22399 find /var/sadm/pkg/\0
- value ------------- Distribution ------------- count
- -1 | 0
- 0 |@@@@@@@@@@@@@ 1475
- 1 | 0
- 2 | 44
- 4 |@ 77
- 8 |@@@ 286
- 16 |@@ 191
- 32 |@ 154
- 64 |@@ 173
- 128 |@@ 179
- 256 |@@ 201
- 512 |@@ 186
- 1024 |@@ 236
- 2048 |@@ 201
- 4096 |@@ 274
- 8192 |@@ 243
- 16384 |@ 154
- 32768 |@ 113
- 65536 |@@ 182
- 131072 |@ 81
- 262144 | 0
- I found the following interesting. This time I gzipped the large file.
- While zipping, the process is reading from one location and writing
- to another. One might expect that, as the program toggles between
- reading from one location and writing to another, the seek distance
- would often be the same (depending on where UFS puts the new file),
- # ./seeksize.d
- Tracing... Hit Ctrl-C to end.
- ^C
- 22368 gzip sol-10-b63-x86-v1.iso\0
- value ------------- Distribution ------------- count
- -1 | 0
- 0 |@@@@@@@@@@@@ 353
- 1 | 0
- 2 | 0
- 4 | 0
- 8 | 7
- 16 | 4
- 32 | 2
- 64 | 4
- 128 | 14
- 256 | 3
- 512 | 3
- 1024 | 5
- 2048 | 1
- 4096 | 0
- 8192 | 3
- 16384 | 1
- 32768 | 1
- 65536 | 1
- 131072 | 1
- 262144 |@@@@@@@@ 249
- 524288 | 1
- 1048576 | 2
- 2097152 | 1
- 4194304 | 2
- 8388608 |@@@@@@@@@@@@@@@@@@ 536
- 16777216 | 0
- The following example compares the operation of "find" with "tar".
- Both read from the same location, and we would expect that
- both programs would generally need to do the same number of seeks
- to navigate the directory tree (depending on caching), with tar
- causing extra operations as it reads the file contents as well,
- # ./seeksize.d
- Tracing... Hit Ctrl-C to end.
- ^C
- PID CMD
- 22278 find /etc\0
- value ------------- Distribution ------------- count
- -1 | 0
- 0 |@@@@@@@@@@@@@@@@@@@@ 251
- 1 | 0
- 2 |@ 8
- 4 | 5
- 8 |@ 10
- 16 |@ 10
- 32 |@ 10
- 64 |@ 9
- 128 |@ 11
- 256 |@ 14
- 512 |@@ 20
- 1024 |@ 10
- 2048 | 6
- 4096 |@ 7
- 8192 |@ 10
- 16384 |@ 16
- 32768 |@@ 21
- 65536 |@@ 28
- 131072 |@ 7
- 262144 |@ 14
- 524288 | 6
- 1048576 |@ 15
- 2097152 |@ 7
- 4194304 | 0
- 22282 tar cf /dev/null /etc\0
- value ------------- Distribution ------------- count
- -1 | 0
- 0 |@@@@@@@@@@ 397
- 1 | 0
- 2 | 8
- 4 | 14
- 8 | 16
- 16 |@ 24
- 32 |@ 29
- 64 |@@ 99
- 128 |@@ 73
- 256 |@@ 78
- 512 |@@@ 109
- 1024 |@@ 62
- 2048 |@@ 69
- 4096 |@@ 73
- 8192 |@@@ 113
- 16384 |@@ 81
- 32768 |@@@ 111
- 65536 |@@@ 108
- 131072 |@ 49
- 262144 |@ 33
- 524288 | 20
- 1048576 | 13
- 2097152 | 7
- 4194304 | 5
- 8388608 |@ 30
- 16777216 | 0
- The following is an example of setuids.d. Login events in particular can
- be seen, along with use of the "su" command.
- # ./setuids.d
- UID SUID PPID PID PCMD CMD
- 0 100 3037 3040 in.telnetd login -p -h mars -d /dev/pts/12
- 100 0 3040 3045 bash su -
- 0 102 3045 3051 sh su - fred
- 0 100 3055 3059 sshd /usr/lib/ssh/sshd
- 0 100 3065 3067 in.rlogind login -d /dev/pts/12 -r mars
- 0 100 3071 3073 in.rlogind login -d /dev/pts/12 -r mars
- 0 102 3078 3081 in.telnetd login -p -h mars -d /dev/pts/12
- ^C
- The first line is a telnet login to the user brendan, UID 100. The parent
- command is "in.telnetd", the telnet daemon spawned by inetd, and the
- command that in.telnetd runs is "login".
- The second line shows UID 100 using the "su" command to become root.
- The third line has the root user using "su" to become fred, UID 102.
- The fourth line is an example of an ssh login.
- The fifth and sixth lines are examples of rsh and rlogin.
- The last line is another example of a telnet login for fred, UID 102.
- The following is a demonstration of the sigdist.d script.
- Here we run sigdist.d, and in another window we kill -9 a sleep process,
- # ./sigdist.d
- Tracing... Hit Ctrl-C to end.
- ^C
- SENDER RECIPIENT SIG COUNT
- sched dtrace 2 1
- sched bash 18 1
- bash sleep 9 1
- sched Xorg 14 55
- We can see the signal sent from bash to sleep. We can also see that Xorg
- has received 55 of signal 14. A "man -s3head signal" may help explain what
- signal 14 is (alarm clock).
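Rather than reading signal(3HEAD), the mapping from number to name can also be looked up programmatically. A hypothetical sketch (the signame helper is invented here; numbering is platform-dependent, but 14 is SIGALRM, the alarm-clock signal, on both Solaris and Linux):

```python
# Hypothetical sketch: map a signal number to its symbolic name.
# Signal numbering varies by platform, but 9 (SIGKILL) and 14 (SIGALRM)
# are consistent across Solaris and Linux.

import signal

def signame(num):
    try:
        return signal.Signals(num).name
    except ValueError:
        return "SIG#%d" % num   # no symbolic name on this platform

print(signame(9))    # SIGKILL - the signal bash sent to sleep above
print(signame(14))   # SIGALRM - the signal Xorg received 55 times
```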
- The following is a demonstration of the syscallbypid.d script,
- Here we run syscallbypid.d for a few seconds then hit Ctrl-C,
- # syscallbypid.d
- Tracing... Hit Ctrl-C to end.
- ^C
- PID CMD SYSCALL COUNT
- 11039 dtrace setcontext 1
- 11039 dtrace lwp_sigmask 1
- 7 svc.startd portfs 1
- 357 poold lwp_cond_wait 1
- 27328 java_vm lwp_cond_wait 1
- 1532 Xorg writev 1
- 11039 dtrace lwp_park 1
- 11039 dtrace schedctl 1
- 11039 dtrace mmap 1
- 361 sendmail pollsys 1
- 11039 dtrace fstat64 1
- 11039 dtrace sigaction 2
- 11039 dtrace write 2
- 361 sendmail lwp_sigmask 2
- 1659 mozilla-bin yield 2
- 11039 dtrace sysconfig 3
- 361 sendmail pset 3
- 20317 sshd read 4
- 361 sendmail gtime 4
- 20317 sshd write 4
- 27328 java_vm ioctl 6
- 11039 dtrace brk 8
- 1532 Xorg setcontext 8
- 1532 Xorg lwp_sigmask 8
- 20317 sshd pollsys 8
- 357 poold pollsys 13
- 1659 mozilla-bin read 16
- 20317 sshd lwp_sigmask 16
- 1532 Xorg setitimer 17
- 27328 java_vm pollsys 18
- 1532 Xorg pollsys 19
- 11039 dtrace p_online 21
- 1532 Xorg read 22
- 1659 mozilla-bin write 25
- 1659 mozilla-bin lwp_park 26
- 11039 dtrace ioctl 36
- 1659 mozilla-bin pollsys 155
- 1659 mozilla-bin ioctl 306
- In the above output, we can see that "mozilla-bin" with PID 1659 made the
- most system calls - 306 ioctl()s.
- The following is an example of the syscallbyproc.d script,
- # syscallbyproc.d
- dtrace: description 'syscall:::entry ' matched 228 probes
- ^C
- snmpd 1
- utmpd 2
- inetd 2
- nscd 7
- svc.startd 11
- sendmail 31
- poold 133
- dtrace 1720
- The above output shows that dtrace made the most system calls in this sample,
- 1720 syscalls.
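The aggregation behind this style of script is essentially counting events keyed by process name, as a D clause like "syscall:::entry { @num[execname] = count(); }" does, then printing the totals sorted ascending the way DTrace prints aggregations. A hypothetical Python sketch of the same idea (the event stream here is made-up stand-in data, not real probe output):

```python
# Hypothetical sketch: count events per process name and print them
# sorted ascending, mimicking how DTrace prints a count() aggregation
# keyed on execname.

from collections import Counter

# Assumed event stream of (execname, syscall) pairs - stand-in data.
events = [("dtrace", "ioctl")] * 5 + [("sendmail", "pollsys")] * 2 + [("nscd", "door")]

by_proc = Counter(name for name, _ in events)
for name, n in sorted(by_proc.items(), key=lambda kv: kv[1]):
    print("%-12s %6d" % (name, n))
```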
- The following is a demonstration of the syscallbysysc.d script,
- # syscallbysysc.d
- dtrace: description 'syscall:::entry ' matched 228 probes
- ^C
- fstat 1
- setcontext 1
- lwp_park 1
- schedctl 1
- mmap 1
- sigaction 2
- pset 2
- lwp_sigmask 2
- gtime 3
- sysconfig 3
- write 4
- brk 6
- pollsys 7
- p_online 558
- ioctl 579
- In the above output, the ioctl system call was the most common, occurring
- 579 times.
- The following is a demonstration of the topsyscall command,
- Here topsyscall is run with no arguments,
- # topsyscall
- 2005 Jun 13 22:13:21, load average: 1.24, 1.24, 1.22 syscalls: 1287
- SYSCALL COUNT
- getgid 4
- getuid 5
- waitsys 5
- xstat 7
- munmap 7
- sysconfig 8
- brk 8
- setcontext 8
- open 8
- getpid 9
- close 9
- resolvepath 10
- lwp_sigmask 22
- mmap 26
- lwp_park 43
- read 59
- write 72
- sigaction 113
- pollsys 294
- ioctl 520
- The screen updates every second, and continues until Ctrl-C is hit to
- end the program.
- In the above output we can see that the ioctl() system call occurred 520 times,
- pollsys() 294 times, and sigaction() 113 times.
- Here the command is run with a 10 second interval,
- # topsyscall 10
- 2005 Jun 13 22:15:35, load average: 1.21, 1.22, 1.22 syscalls: 10189
- SYSCALL COUNT
- writev 6
- close 7
- lseek 7
- open 7
- brk 8
- nanosleep 9
- portfs 10
- llseek 14
- lwp_cond_wait 21
- p_online 21
- gtime 27
- rusagesys 71
- setcontext 92
- lwp_sigmask 98
- setitimer 183
- lwp_park 375
- write 438
- read 551
- pollsys 3071
- ioctl 5144
- The following is a demonstration of the topsysproc program,
- Here we run topsysproc with no arguments,
- # topsysproc
- 2005 Jun 13 22:25:16, load average: 1.24, 1.23, 1.21 syscalls: 1347
- PROCESS COUNT
- svc.startd 1
- nscd 1
- setiathome 7
- poold 18
- sshd 21
- java_vm 35
- tput 49
- dtrace 56
- Xorg 108
- sh 110
- clear 122
- mozilla-bin 819
- The screen refreshes every second, which can be changed by specifying
- a different interval on the command line.
- In the above output we can see that processes with the name "mozilla-bin"
- made 819 system calls, while processes with the name "clear" made 122.
- Now topsysproc is run with a 15 second interval,
- # topsysproc 15
- 2005 Jun 13 22:29:43, load average: 1.19, 1.20, 1.20 syscalls: 15909
- PROCESS COUNT
- fmd 1
- inetd 2
- svc.configd 2
- gconfd-2 3
- miniserv.pl 3
- sac 6
- snmpd 6
- sshd 8
- automountd 8
- ttymon 9
- svc.startd 17
- nscd 21
- in.routed 37
- sendmail 41
- setiathome 205
- poold 293
- dtrace 413
- java_vm 529
- Xorg 1234
- mozilla-bin 13071