Last login: Fri Aug 31 09:37:55 on ttys015
-bash-3.2$ man fts
Warning: cannot open configuration file /private/etc/man.conf
No manual entry for fts
-bash-3.2$ man n fts
Warning: cannot open configuration file /private/etc/man.conf
No entry for fts in section n of the manual
-bash-3.2$ man a fts
Warning: cannot open configuration file /private/etc/man.conf
No manual entry for a
No manual entry for fts
-bash-3.2$ man
Warning: cannot open configuration file /private/etc/man.conf
What manual page do you want?
-bash-3.2$ bash
bash-3.2$ ls
Applications Documents Library Music bin projects sudo
Desktop Downloads Movies Pictures dev script-dev workspace1
bash-3.2$ man
Warning: cannot open configuration file /private/etc/man.conf
What manual page do you want?
bash-3.2$ man a fts
Warning: cannot open configuration file /private/etc/man.conf
No manual entry for a
No manual entry for fts
bash-3.2$ man fts
Warning: cannot open configuration file /private/etc/man.conf
No manual entry for fts
bash-3.2$ man 8 fts
Warning: cannot open configuration file /private/etc/man.conf
No entry for fts in section 8 of the manual
bash-3.2$ manpages n fts
bash: manpages: command not found
bash-3.2$ man1 fts
bash: man1: command not found
bash-3.2$ cd /usr/share
bash-3.2$ ls
CSI emacs icu mecabra snmp vim
CoreDuetDaemonConfig.bundle examples info misc tabset zoneinfo
calendar file java php terminfo zoneinfo.default
com.apple.languageassetd firmware kdrl.bundle pmenergy texinfo zsh
cracklib germantok kpep ri thermald.bundle
cups groff langid sandbox tokenizer
dict hiutil locale screen ucupdate
doc httpd man skel uucp
bash-3.2$ cd examples
bash-3.2$ ls
DTTk
bash-3.2$ cd D*
bash-3.2$ ls
bitesize_example.txt errinfo_example.txt iopattern_example.txt newproc_example.txt rwbypid_example.txt syscallbypid_example.txt
cpuwalk_example.txt execsnoop_example.txt iopending_example.txt opensnoop_example.txt rwbytype_example.txt syscallbyproc_example.txt
creatbyproc_example.txt fddist_example.txt iosnoop_example.txt pathopens_example.txt rwsnoop_example.txt syscallbysysc_example.txt
dappprof_example.txt filebyproc_example.txt iotop_example.txt pidpersec_example.txt sampleproc_example.txt topsyscall_example.txt
dapptrace_example.txt hotspot_example.txt kill_example.txt priclass_example.txt seeksize_example.txt topsysproc_example.txt
dispqlen_example.txt iofile_example.txt lastwords_example.txt pridist_example.txt setuids_example.txt
dtruss_example.txt iofileb_example.txt loads_example.txt procsystime_example.txt sigdist_example.txt
bash-3.2$ cat *
In this example, bitesize.d was run for several seconds then Ctrl-C was hit.
As bitesize.d runs it records how processes on the system are accessing the
disks - in particular the size of the I/O operation. It is usually desirable
for processes to be requesting large I/O operations rather than taking many
small "bites".

The final report highlights how processes performed. The find command mostly
read 1K blocks while the tar command was reading large blocks - both as
expected.

# bitesize.d
Tracing... Hit Ctrl-C to end.
^C

PID CMD
7110 -bash\0

value ------------- Distribution ------------- count
512 | 0
1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@ 2
2048 | 0
4096 |@@@@@@@@@@@@@ 1
8192 | 0

7110 sync\0

value ------------- Distribution ------------- count
512 | 0
1024 |@@@@@ 1
2048 |@@@@@@@@@@ 2
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@ 5
16384 | 0

0 sched\0

value ------------- Distribution ------------- count
1024 | 0
2048 |@@@ 1
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10
16384 | 0

7109 find /\0

value ------------- Distribution ------------- count
512 | 0
1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1452
2048 |@@ 91
4096 | 33
8192 |@@ 97
16384 | 0

3 fsflush\0

value ------------- Distribution ------------- count
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 449
16384 | 0

7108 tar cf /dev/null /\0

value ------------- Distribution ------------- count
256 | 0
512 | 70
1024 |@@@@@@@@@@ 1306
2048 |@@@@ 569
4096 |@@@@@@@@@ 1286
8192 |@@@@@@@@@@ 1403
16384 |@ 190
32768 |@@@ 396
65536 | 0

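The heart of bitesize.d is a one-line aggregation over the io provider. As a
rough sketch (the real script also prints the header and formats the report):

    #!/usr/sbin/dtrace -s
    /* sketch: distribution of disk I/O sizes, keyed by process */
    io:::start
    {
            @Size[pid, curpsinfo->pr_psargs] = quantize(args[0]->b_bcount);
    }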

The following is a demonstration of the cpuwalk.d script,


cpuwalk.d is not that useful on a single CPU server,

# cpuwalk.d
Sampling... Hit Ctrl-C to end.
^C

PID: 18843 CMD: bash

value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 30
1 | 0

PID: 8079 CMD: mozilla-bin

value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10
1 | 0

The output above shows that PID 18843, "bash", was sampled on CPU 0 a total
of 30 times (we sample at 1000 Hz).



The following is a demonstration of running cpuwalk.d with a 5 second
duration. This is on a 4 CPU server running a multithreaded CPU bound
application called "cputhread",

# cpuwalk.d 5
Sampling...

PID: 3 CMD: fsflush

value ------------- Distribution ------------- count
1 | 0
2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 30
3 | 0

PID: 12186 CMD: cputhread

value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@ 4900
1 |@@@@@@@@@@ 4900
2 |@@@@@@@@@@ 4860
3 |@@@@@@@@@@ 4890
4 | 0

As we are sampling at 1000 Hz, the application cputhread is indeed running
concurrently across all available CPUs. We measured the application on
CPU 0 a total of 4900 times, on CPU 1 a total of 4900 times, etc. As there
are around 5000 samples per CPU available in this 5 second 1000 Hz sample,
the application is using almost all of the CPU capacity of this server.



The following is a similar demonstration, this time running a multithreaded
CPU bound application called "cpuserial" that has a poor use of locking
such that the threads "serialise",


# cpuwalk.d 5
Sampling...

PID: 12194 CMD: cpuserial

value ------------- Distribution ------------- count
< 0 | 0
0 |@@@ 470
1 |@@@@@@ 920
2 |@@@@@@@@@@@@@@@@@@@@@@@@@ 3840
3 |@@@@@@ 850
4 | 0

In the above, we can see that this CPU bound application is not making
efficient use of the CPU resources available, only reaching 3840 samples
on CPU 2 out of a potential 5000. This problem was caused by a poor use
of locks.
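
The sampling behind cpuwalk.d can be sketched in a few lines of D; this is an
approximation (the real script also handles the optional duration and tidies
the output):

    #!/usr/sbin/dtrace -s
    /* sketch: sample which CPU each process is running on, at 1000 Hz */
    profile:::profile-1000
    /pid != 0/
    {
            @sample[pid, execname] = lquantize(cpu, 0, 128, 1);
    }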



The following is an example of the creatbyproc.d script,


Here we run creatbyproc.d for several seconds,

# ./creatbyproc.d
dtrace: script './creatbyproc.d' matched 2 probes
CPU ID FUNCTION:NAME
0 5438 creat64:entry touch /tmp/newfile
0 5438 creat64:entry sh /tmp/mpLaaOik
0 5438 creat64:entry sh /dev/null
^C

In another window, the following commands were run,

touch /tmp/newfile
man ls

The file creation activity caused by these commands can be seen in the
output of creatbyproc.d
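
Judging by the probe name in the output above, creatbyproc.d is close to a
one-liner; a minimal sketch:

    #!/usr/sbin/dtrace -s
    /* sketch: print process name and pathname for each creat64() */
    syscall::creat64:entry
    {
            printf("%s %s", execname, copyinstr(arg0));
    }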



The following is a demonstration of the dappprof command,

This is the usage for version 0.60,

# dappprof -h
USAGE: dappprof [-cehoTU] [-u lib] { -p PID | command }

-p PID # examine this PID
-a # print all details
-c # print syscall counts
-e # print elapsed times (us)
-o # print on cpu times
-T # print totals
-u lib # trace this library instead
-U # trace all libraries + user funcs
-b bufsize # dynamic variable buf size
eg,
dappprof df -h # run and examine "df -h"
dappprof -p 1871 # examine PID 1871
dappprof -ap 1871 # print all data



The following shows running dappprof with the "banner hello" command.
Elapsed and on-cpu times are printed (-eo), as well as counts (-c) and
totals (-T),

# dappprof -eocT banner hello

# # ###### # # ####
# # # # # # #
###### ##### # # # #
# # # # # # #
# # # # # # #
# # ###### ###### ###### ####


CALL COUNT
__fsr 1
main 1
banprt 1
banner 1
banset 1
convert 5
banfil 5
TOTAL: 15

CALL ELAPSED
banset 37363
banfil 147407
convert 149606
banprt 423507
banner 891088
__fsr 1694349
TOTAL: 3343320

CALL CPU
banset 7532
convert 8805
banfil 11092
__fsr 15708
banner 48696
banprt 388853
TOTAL: 480686

The above output has analysed user functions (the default). It makes it
easy to identify which function is being called the most (COUNT), which
is taking the most time (ELAPSED), and which is consuming the most CPU (CPU).
These times are totals for all the functions called.
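
The COUNT column can be approximated with a pid provider one-liner; a minimal
sketch (the ELAPSED and CPU columns also need timestamps recorded at entry and
compared at return):

    # dtrace -n 'pid$target:a.out::entry { @calls[probefunc] = count(); }' -c 'banner hello'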


The following is a demonstration of the dapptrace command,

This is the usage for version 0.60,

# dapptrace -h
USAGE: dapptrace [-acdeholFLU] [-u lib] { -p PID | command }

-p PID # examine this PID
-a # print all details
-c # print syscall counts
-d # print relative times (us)
-e # print elapsed times (us)
-F # print flow indentation
-l # print pid/lwpid
-o # print on cpu times
-u lib # trace this library instead
-U # trace all libraries + user funcs
-b bufsize # dynamic variable buf size
eg,
dapptrace df -h # run and examine "df -h"
dapptrace -p 1871 # examine PID 1871
dapptrace -Fp 1871 # print using flow indents
dapptrace -eop 1871 # print elapsed and CPU times



The following is an example of the default output. We run dapptrace with
the "banner hi" command,

# dapptrace banner hi

# # #
# # #
###### #
# # #
# # #
# # #

CALL(args) = return
-> __fsr(0x2, 0x8047D7C, 0x8047D88)
<- __fsr = 122
-> main(0x2, 0x8047D7C, 0x8047D88)
-> banner(0x8047E3B, 0x80614C2, 0x8047D38)
-> banset(0x20, 0x80614C2, 0x8047DCC)
<- banset = 36
-> convert(0x68, 0x8047DCC, 0x2)
<- convert = 319
-> banfil(0x8061412, 0x80614C2, 0x8047DCC)
<- banfil = 57
-> convert(0x69, 0x8047DCC, 0x2)
<- convert = 319
-> banfil(0x8061419, 0x80614CA, 0x8047DCC)
<- banfil = 57
<- banner = 118
-> banprt(0x80614C2, 0x8047D38, 0xD27FB824)
<- banprt = 74

The default output shows user function calls. An entry is prefixed
with a "->", and the return has a "<-".



Here we run dapptrace with the -F for flow indent option,

# dapptrace -F banner hi

# # #
# # #
###### #
# # #
# # #
# # #

CALL(args) = return
-> __fsr(0x2, 0x8047D7C, 0x8047D88)
<- __fsr = 122
-> main(0x2, 0x8047D7C, 0x8047D88)
  -> banner(0x8047E3B, 0x80614C2, 0x8047D38)
    -> banset(0x20, 0x80614C2, 0x8047DCC)
    <- banset = 36
    -> convert(0x68, 0x8047DCC, 0x2)
    <- convert = 319
    -> banfil(0x8061412, 0x80614C2, 0x8047DCC)
    <- banfil = 57
    -> convert(0x69, 0x8047DCC, 0x2)
    <- convert = 319
    -> banfil(0x8061419, 0x80614CA, 0x8047DCC)
    <- banfil = 57
  <- banner = 118
  -> banprt(0x80614C2, 0x8047D38, 0xD27FB824)
  <- banprt = 74

The above output illustrates the flow of the program, which functions
call which other functions.



Now the same command is run with -d to display relative timestamps,

# dapptrace -dF banner hi

# # #
# # #
###### #
# # #
# # #
# # #

RELATIVE CALL(args) = return
2512 -> __fsr(0x2, 0x8047D7C, 0x8047D88)
2516 <- __fsr = 122
2518 -> main(0x2, 0x8047D7C, 0x8047D88)
2863 -> banner(0x8047E3B, 0x80614C2, 0x8047D38)
2865 -> banset(0x20, 0x80614C2, 0x8047DCC)
2872 <- banset = 36
2874 -> convert(0x68, 0x8047DCC, 0x2)
2877 <- convert = 319
2879 -> banfil(0x8061412, 0x80614C2, 0x8047DCC)
2882 <- banfil = 57
2883 -> convert(0x69, 0x8047DCC, 0x2)
2885 <- convert = 319
2886 -> banfil(0x8061419, 0x80614CA, 0x8047DCC)
2888 <- banfil = 57
2890 <- banner = 118
2892 -> banprt(0x80614C2, 0x8047D38, 0xD27FB824)
3214 <- banprt = 74

The relative times are in microseconds since the program's invocation. Great!



Even better, we can use the -eo options, to print elapsed times and on-cpu
times,

# dapptrace -eoF banner hi

# # #
# # #
###### #
# # #
# # #
# # #

ELAPSD CPU CALL(args) = return
. . -> __fsr(0x2, 0x8047D7C, 0x8047D88)
41 4 <- __fsr = 122
. . -> main(0x2, 0x8047D7C, 0x8047D88)
. . -> banner(0x8047E3B, 0x80614C2, 0x8047D38)
. . -> banset(0x20, 0x80614C2, 0x8047DCC)
29 6 <- banset = 36
. . -> convert(0x68, 0x8047DCC, 0x2)
26 3 <- convert = 319
. . -> banfil(0x8061412, 0x80614C2, 0x8047DCC)
25 2 <- banfil = 57
. . -> convert(0x69, 0x8047DCC, 0x2)
23 1 <- convert = 319
. . -> banfil(0x8061419, 0x80614CA, 0x8047DCC)
23 1 <- banfil = 57
309 28 <- banner = 118
. . -> banprt(0x80614C2, 0x8047D38, 0xD27FB824)
349 322 <- banprt = 74

Now it is easy to see which functions take the longest (elapsed), and
which consume the most CPU cycles.
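
The ELAPSD and CPU columns come from comparing timestamp (wall clock) with
vtimestamp (on-CPU time) between entry and return; a minimal sketch using the
pid provider, to be run with dtrace -c:

    pid$target:a.out::entry
    {
            self->ts[probefunc] = timestamp;
            self->vts[probefunc] = vtimestamp;
    }

    pid$target:a.out::return
    /self->ts[probefunc]/
    {
            /* note: a sketch only - recursion is not handled */
            printf("%d %d <- %s\n", (timestamp - self->ts[probefunc]) / 1000,
                (vtimestamp - self->vts[probefunc]) / 1000, probefunc);
            self->ts[probefunc] = 0;
            self->vts[probefunc] = 0;
    }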



The following demonstrates the -U option, to trace all libraries,

# dapptrace -U banner hi

# # #
# # #
###### #
# # #
# # #
# # #

CALL(args) = return
-> ld.so.1:_rt_boot(0x8047E34, 0x8047E3B, 0x0)
-> ld.so.1:_setup(0x8047D38, 0x20AE4, 0x3)
-> ld.so.1:setup(0x8047D88, 0x8047DCC, 0x0)
-> ld.so.1:fmap_setup(0x0, 0xD27FB2E4, 0xD27FB824)
<- ld.so.1:fmap_setup = 125
-> ld.so.1:addfree(0xD27FD3C0, 0xC40, 0x0)
<- ld.so.1:addfree = 65
-> ld.so.1:security(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF)
<- ld.so.1:security = 142
-> ld.so.1:readenv_user(0x8047D88, 0xD27FB204, 0xD27FB220)
-> ld.so.1:ld_str_env(0x8047E3E, 0xD27FB204, 0xD27FB220)
<- ld.so.1:ld_str_env = 389
-> ld.so.1:ld_str_env(0x8047E45, 0xD27FB204, 0xD27FB220)
<- ld.so.1:ld_str_env = 389
-> ld.so.1:ld_str_env(0x8047E49, 0xD27FB204, 0xD27FB220)
<- ld.so.1:ld_str_env = 389
-> ld.so.1:ld_str_env(0x8047E50, 0xD27FB204, 0xD27FB220)
-> ld.so.1:strncmp(0x8047E53, 0xD27F7BEB, 0x4)
<- ld.so.1:strncmp = 113
-> ld.so.1:rd_event(0xD27FB1F8, 0x3, 0x0)
[...4486 lines deleted...]
-> ld.so.1:_lwp_mutex_unlock(0xD27FD380, 0xD27FB824, 0x8047C04)
<- ld.so.1:_lwp_mutex_unlock = 47
<- ld.so.1:rt_mutex_unlock = 34
-> ld.so.1:rt_bind_clear(0x1, 0xD279ECC0, 0xD27FDB2C)
<- ld.so.1:rt_bind_clear = 34
<- ld.so.1:leave = 210
<- ld.so.1:elf_bndr = 803
<- ld.so.1:elf_rtbndr = 35

The output was huge, around 4500 lines long. Function names are prefixed
with their library name, eg "ld.so.1".

This full output should be used with caution, as it enables so many probes
it could well be a burden on the system.

This is a demonstration of the dispqlen.d script,


Here we run it on a single CPU desktop,

# dispqlen.d
Sampling... Hit Ctrl-C to end.
^C
CPU 0
value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1790
1 |@@@ 160
2 | 10
3 | 0

The output shows the length of the dispatcher queue is mostly 0. This is
evidence that the CPU is not very saturated. It does not indicate that the
CPU is idle - as we are measuring the length of the queue, not what is
on the CPU.



Here it is run on a multi CPU server,

# dispqlen.d
Sampling... Hit Ctrl-C to end.
^C
CPU 1
value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1573
1 |@@@@@@@@@ 436
2 | 4
3 | 0

CPU 4
value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@ 1100
1 |@@@@@@@@@@@@@@@@@@ 912
2 | 1
3 | 0

CPU 0
value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@ 846
1 |@@@@@@@@@@@@@@@@@@@@@@@ 1167
2 | 0

CPU 5
value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@ 397
1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1537
2 |@@ 79
3 | 0

The above output shows that threads are queueing up on CPU 5 much more than
CPU 0.
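
dispqlen.d works by sampling a kernel dispatcher structure; a sketch of the
idea (this depends on the Solaris cpu_t layout, as the real script does):

    #!/usr/sbin/dtrace -s
    /* sketch: sample the run queue length of the current CPU at 1000 Hz */
    profile:::profile-1000
    {
            @queue[cpu] = lquantize(
                curthread->t_cpu->cpu_disp->disp_nrunnable, 0, 64, 1);
    }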

The following demonstrates the dtruss command - a DTrace version of truss.
This version is designed to be less intrusive and safer than running truss.

dtruss has many options. Here is the help for version 0.70,

USAGE: dtruss [-acdefholL] [-t syscall] { -p PID | -n name | command }

-p PID # examine this PID
-n name # examine this process name
-t syscall # examine this syscall only
-a # print all details
-c # print syscall counts
-d # print relative times (us)
-e # print elapsed times (us)
-f # follow children
-l # force printing pid/lwpid
-o # print on cpu times
-L # don't print pid/lwpid
-b bufsize # dynamic variable buf size
eg,
dtruss df -h # run and examine "df -h"
dtruss -p 1871 # examine PID 1871
dtruss -n tar # examine all processes called "tar"
dtruss -f test.sh # run test.sh and follow children



For example, here we dtruss any process with the name "ksh" - the Korn shell,

# dtruss -n ksh
PID/LWP SYSCALL(args) = return
27547/1: llseek(0x3F, 0xE4E, 0x0) = 3662 0
27547/1: read(0x3F, "\0", 0x400) = 0 0
27547/1: llseek(0x3F, 0x0, 0x0) = 3662 0
27547/1: write(0x3F, "ls -l\n\0", 0x8) = 8 0
27547/1: fdsync(0x3F, 0x10, 0xFEC1D444) = 0 0
27547/1: lwp_sigmask(0x3, 0x20000, 0x0) = 0xFFBFFEFF 0
27547/1: stat64("/usr/bin/ls\0", 0x8047A00, 0xFEC1D444) = 0 0
27547/1: lwp_sigmask(0x3, 0x0, 0x0) = 0xFFBFFEFF 0
[...]

The output for each system call is not yet decoded as fully as truss decodes it.



In the following example, syscall elapsed and overhead times are measured.
Elapsed times represent the time from syscall start to finish; overhead
times measure the time spent on the CPU,

# dtruss -eon bash
PID/LWP ELAPSD CPU SYSCALL(args) = return
3911/1: 41 26 write(0x2, "l\0", 0x1) = 1 0
3911/1: 1001579 43 read(0x0, "s\0", 0x1) = 1 0
3911/1: 38 26 write(0x2, "s\0", 0x1) = 1 0
3911/1: 1019129 43 read(0x0, " \001\0", 0x1) = 1 0
3911/1: 38 26 write(0x2, " \0", 0x1) = 1 0
3911/1: 998533 43 read(0x0, "-\0", 0x1) = 1 0
3911/1: 38 26 write(0x2, "-\001\0", 0x1) = 1 0
3911/1: 1094323 42 read(0x0, "l\0", 0x1) = 1 0
3911/1: 39 27 write(0x2, "l\001\0", 0x1) = 1 0
3911/1: 1210496 44 read(0x0, "\r\0", 0x1) = 1 0
3911/1: 40 28 write(0x2, "\n\001\0", 0x1) = 1 0
3911/1: 9 1 lwp_sigmask(0x3, 0x2, 0x0) = 0xFFBFFEFF 0
3911/1: 70 63 ioctl(0x0, 0x540F, 0x80F6D00) = 0 0

A bash command was in another window, where the "ls -l" command was being
typed. The keystrokes can be seen above, along with the long elapsed times
(keystroke delays), and short overhead times (as the bash process blocks
on the read and leaves the CPU).
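
The elapsed versus on-CPU distinction can be reproduced with a short D
sketch, matching by process name as dtruss -n does (an approximation only -
the real dtruss is a much larger wrapper):

    #!/usr/sbin/dtrace -s
    /* sketch: elapsed and on-CPU time (us) per syscall, for bash */
    syscall:::entry
    /execname == "bash"/
    {
            self->ts = timestamp;
            self->vts = vtimestamp;
    }

    syscall:::return
    /self->ts/
    {
            printf("%d %d %s\n", (timestamp - self->ts) / 1000,
                (vtimestamp - self->vts) / 1000, probefunc);
            self->ts = 0;
            self->vts = 0;
    }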



Now dtruss is put to the test. Here we truss a test program that runs several
hundred smaller programs, which in turn generate thousands of system calls.

First, as a "control" we run the program without a truss or dtruss running,

# time ./test
real 0m38.508s
user 0m5.299s
sys 0m25.668s

Now we try truss,

# time truss ./test 2> /dev/null
real 0m41.281s
user 0m0.558s
sys 0m1.351s

Now we try dtruss,

# time dtruss ./test 2> /dev/null
real 0m46.226s
user 0m6.771s
sys 0m31.703s

In the above test, truss slowed the program from 38 seconds to 41. dtruss
slowed the program from 38 seconds to 46, slightly slower than truss...

Now we try follow mode "-f". The test program does run several hundred
smaller programs, so now there are plenty more system calls to track,

# time truss -f ./test 2> /dev/null
real 2m28.317s
user 0m0.893s
sys 0m3.527s

Now we try dtruss,

# time dtruss -f ./test 2> /dev/null
real 0m56.179s
user 0m10.040s
sys 0m38.185s

Wow, the difference is huge! truss slows the program from 38 to 148 seconds;
but dtruss has only slowed the program from 38 to 56 seconds.




This is an example of the errinfo program, which prints details on syscall
failures.

By default it "snoops" syscall failures and prints their details,

# ./errinfo
EXEC SYSCALL ERR DESC
wnck-applet read 11 Resource temporarily unavailable
Xorg read 11 Resource temporarily unavailable
nautilus read 11 Resource temporarily unavailable
Xorg read 11 Resource temporarily unavailable
dsdm read 11 Resource temporarily unavailable
Xorg read 11 Resource temporarily unavailable
Xorg pollsys 4 interrupted system call
mozilla-bin lwp_park 62 timer expired
gnome-netstatus- ioctl 12 Not enough core
mozilla-bin lwp_park 62 timer expired
Xorg read 11 Resource temporarily unavailable
mozilla-bin lwp_park 62 timer expired
[...]

which is useful for seeing these events live, but can scroll off the screen
somewhat rapidly... so,



The "-c" option will count the number of errors. Hit Ctrl-C to stop the
sample. For example,

# ./errinfo -c
Tracing... Hit Ctrl-C to end.
^C
EXEC SYSCALL ERR COUNT DESC
nscd fcntl 22 1 Invalid argument
xscreensaver read 11 1 Resource temporarily unavailable
inetd lwp_park 62 1 timer expired
svc.startd lwp_park 62 1 timer expired
svc.configd lwp_park 62 1 timer expired
ttymon ioctl 25 1 Inappropriate ioctl for device
gnome-netstatus- ioctl 12 2 Not enough core
mozilla-bin lwp_kill 3 2 No such process
mozilla-bin connect 150 5 operation now in progress
svc.startd portfs 62 8 timer expired
java_vm lwp_cond_wait 62 8 timer expired
soffice.bin read 11 9 Resource temporarily unavailable
gnome-terminal read 11 23 Resource temporarily unavailable
mozilla-bin recv 11 26 Resource temporarily unavailable
nautilus read 11 26 Resource temporarily unavailable
gnome-settings-d read 11 26 Resource temporarily unavailable
gnome-smproxy read 11 34 Resource temporarily unavailable
gnome-panel read 11 42 Resource temporarily unavailable
dsdm read 11 112 Resource temporarily unavailable
metacity read 11 128 Resource temporarily unavailable
mozilla-bin lwp_park 62 133 timer expired
Xorg pollsys 4 147 interrupted system call
wnck-applet read 11 179 Resource temporarily unavailable
mozilla-bin read 11 258 Resource temporarily unavailable
Xorg read 11 1707 Resource temporarily unavailable

Ok, so Xorg has received 1707 of the same type of error for the syscall read().
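
The counted mode boils down to an aggregation over failed syscall returns; a
minimal sketch (the DESC column needs an errno-to-description lookup, which
the real script provides):

    #!/usr/sbin/dtrace -s
    /* sketch: count syscall failures by process, syscall and errno */
    syscall:::return
    /errno != 0/
    {
            @err[execname, probefunc, errno] = count();
    }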



The "-n" option lets us match on one type of process only. In the following
we match processes that have the name "mozilla-bin",

# ./errinfo -c -n mozilla-bin
Tracing... Hit Ctrl-C to end.
^C
EXEC SYSCALL ERR COUNT DESC
mozilla-bin getpeername 134 1 Socket is not connected
mozilla-bin recv 11 2 Resource temporarily unavailable
mozilla-bin lwp_kill 3 2 No such process
mozilla-bin connect 150 5 operation now in progress
mozilla-bin lwp_park 62 207 timer expired
mozilla-bin read 11 396 Resource temporarily unavailable



The "-p" option lets us examine one PID only. The following example examines
PID 1119,

# ./errinfo -c -p 1119
Tracing... Hit Ctrl-C to end.
^C
EXEC SYSCALL ERR COUNT DESC
Xorg pollsys 4 47 interrupted system call
Xorg read 11 669 Resource temporarily unavailable


The following is an example of execsnoop. As processes are executed their
details are printed out. Another user was logged in running a few commands
which can be viewed below,

# ./execsnoop
UID PID PPID ARGS
100 3008 2656 ls
100 3009 2656 ls -l
100 3010 2656 cat /etc/passwd
100 3011 2656 vi /etc/hosts
100 3012 2656 date
100 3013 2656 ls -l
100 3014 2656 ls
100 3015 2656 finger
[...]



In this example the command "man gzip" was executed. The output lets us
see what the man command is actually doing,

# ./execsnoop
UID PID PPID ARGS
100 3064 2656 man gzip
100 3065 3064 sh -c cd /usr/share/man; tbl /usr/share/man/man1/gzip.1 |nroff -u0 -Tlp -man -
100 3067 3066 tbl /usr/share/man/man1/gzip.1
100 3068 3066 nroff -u0 -Tlp -man -
100 3066 3065 col -x
100 3069 3064 sh -c trap '' 1 15; /usr/bin/mv -f /tmp/mpoMaa_f /usr/share/man/cat1/gzip.1 2>
100 3070 3069 /usr/bin/mv -f /tmp/mpoMaa_f /usr/share/man/cat1/gzip.1
100 3071 3064 sh -c more -s /tmp/mpoMaa_f
100 3072 3071 more -s /tmp/mpoMaa_f
^C



Execsnoop has other options,

# ./execsnoop -h
USAGE: execsnoop [-a|-A|-sv] [-c command]
execsnoop # default output
-a # print all data
-A # dump all data, space delimited
-s # include start time, us
-v # include start time, string
-c command # command name to snoop



In particular the verbose option for human readable timestamps is
very useful,

# ./execsnoop -v
STRTIME UID PID PPID ARGS
2005 Jan 22 00:07:22 0 23053 20933 date
2005 Jan 22 00:07:24 0 23054 20933 uname -a
2005 Jan 22 00:07:25 0 23055 20933 ls -latr
2005 Jan 22 00:07:27 0 23056 20933 df -k
2005 Jan 22 00:07:29 0 23057 20933 ps -ef
2005 Jan 22 00:07:29 0 23057 20933 ps -ef
2005 Jan 22 00:07:34 0 23058 20933 uptime
2005 Jan 22 00:07:34 0 23058 20933 uptime
[...]



It is also possible to match particular commands. Here we watch
anyone using the vi command only,

# ./execsnoop -vc vi
STRTIME UID PID PPID ARGS
2005 Jan 22 00:10:33 0 23063 20933 vi /etc/passwd
2005 Jan 22 00:10:40 0 23064 20933 vi /etc/shadow
2005 Jan 22 00:10:51 0 23065 20933 vi /etc/group
2005 Jan 22 00:10:57 0 23066 20933 vi /.rhosts
[...]


The following is a demonstration of the fddist command,


Here fddist is run for a few seconds on an idle workstation,

Tracing reads and writes... Hit Ctrl-C to end.
^C
EXEC: dtrace PID: 3288

value ------------- Distribution ------------- count
0 | 0
1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2
2 | 0

EXEC: mozilla-bin PID: 1659

value ------------- Distribution ------------- count
3 | 0
4 |@@@@@@@@@@ 28
5 | 0
6 |@@@@@@@@@@@@@@@ 40
7 |@@@@@@@@@@@@@@@ 40
8 | 0

EXEC: Xorg PID: 1532

value ------------- Distribution ------------- count
22 | 0
23 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 57
24 | 0

The above displays the usage pattern for process file descriptors.
We can see the Xorg process (PID 1532) has made 57 reads or writes to
its file descriptor 23.

The pfiles(1) command can be used to help determine what file
descriptor 23 actually is.
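
Since the file descriptor is the first argument to read(2) and write(2), the
fddist distribution is a quantize over arg0; a minimal sketch:

    #!/usr/sbin/dtrace -s
    /* sketch: distribution of file descriptors used by reads and writes */
    syscall::read:entry,
    syscall::write:entry
    {
            @fd[execname, pid] = quantize(arg0);
    }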

The following is an example of the filebyproc.d script,

# filebyproc.d
dtrace: description 'syscall::open*:entry ' matched 2 probes
CPU ID FUNCTION:NAME
0 14 open:entry gnome-netstatus- /dev/kstat
0 14 open:entry man /var/ld/ld.config
0 14 open:entry man /lib/libc.so.1
0 14 open:entry man /usr/share/man/man.cf
0 14 open:entry man /usr/share/man/windex
0 14 open:entry man /usr/share/man/man1/ls.1
0 14 open:entry man /usr/share/man/man1/ls.1
0 14 open:entry man /tmp/mpqea4RF
0 14 open:entry sh /var/ld/ld.config
0 14 open:entry sh /lib/libc.so.1
0 14 open:entry neqn /var/ld/ld.config
0 14 open:entry neqn /lib/libc.so.1
0 14 open:entry neqn /usr/share/lib/pub/eqnchar
0 14 open:entry tbl /var/ld/ld.config
0 14 open:entry tbl /lib/libc.so.1
0 14 open:entry tbl /usr/share/man/man1/ls.1
0 14 open:entry nroff /var/ld/ld.config
[...]

In the above example, the command "man ls" was run. Each file that the
programs attempted to open can be seen, along with the name of the program
responsible.
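
The dtrace description line in the output gives the probes away; filebyproc.d
is essentially this one-liner:

    #!/usr/sbin/dtrace -s
    /* print the process name and pathname for each open() variant */
    syscall::open*:entry
    {
            printf("%s %s", execname, copyinstr(arg0));
    }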

The following is a demonstration of the hotspot.d script.

Here the script is run while a large file is copied from one filesystem
(cmdk0 102,0) to another (cmdk0 102,3). We can see the file mostly resided
around the 9000 to 10999 MB range on the source disk (102,0), and was
copied to the 0 to 999 MB range on the target disk (102,3).

# ./hotspot.d
Tracing... Hit Ctrl-C to end.
^C
Disk: cmdk0 Major,Minor: 102,3

value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 418
1000 | 0

Disk: cmdk0 Major,Minor: 102,0

value ------------- Distribution ------------- count
< 0 | 0
0 | 1
1000 | 5
2000 | 0
3000 | 0
4000 | 0
5000 | 0
6000 | 0
7000 | 0
8000 |@@@@@ 171
9000 |@@@@@ 171
10000 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1157
11000 | 0

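A sketch of the hotspot.d approach: convert each I/O's block address into a
Mbyte offset and build a per-disk distribution (assuming 512-byte blocks; the
real script also prints the major and minor numbers):

    #!/usr/sbin/dtrace -s
    /* sketch: distribution of disk I/O location (Mbytes), per disk */
    io:::start
    {
            @dist[args[1]->dev_statname] = lquantize(
                args[0]->b_blkno * 512 / 1048576, 0, 65536, 1000);
    }
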
The following is a demonstration of the iofile.d script,


Here we run it while a tar command is backing up /var/adm,

# iofile.d
Tracing... Hit Ctrl-C to end.
^C
PID CMD TIME FILE
5206 tar 109 /var/adm/acct/nite
5206 tar 110 /var/adm/acct/sum
5206 tar 114 /var/adm/acct/fiscal
5206 tar 117 /var/adm/messages.3
5206 tar 172 /var/adm/sa
5206 tar 3605 /var/adm/messages.2
5206 tar 4548 /var/adm/spellhist
5206 tar 5769 /var/adm/exacct/brendan1task
5206 tar 6416 /var/adm/acct
5206 tar 7587 /var/adm/messages.1
5206 tar 8246 /var/adm/exacct/task
5206 tar 8320 /var/adm/pool
5206 tar 8973 /var/adm/pool/history
5206 tar 9183 /var/adm/exacct
3 fsflush 10882 <none>
5206 tar 11861 /var/adm/exacct/flow
5206 tar 12042 /var/adm/messages.0
5206 tar 12408 /var/adm/sm.bin
5206 tar 13021 /var/adm/sulog
5206 tar 19007 /var/adm/streams
5206 tar 21811 <none>
5206 tar 24918 /var/adm/exacct/proc

In the above output, we can see that the tar command spent 24918 us (25 ms)
waiting for disk I/O on the /var/adm/exacct/proc file.
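
A sketch of the idea behind iofile.d, using the io provider's wait probes to
sum the time each process spends blocked on each file (a simplification of
the real script):

    #!/usr/sbin/dtrace -s
    /* sketch: total I/O wait time (us) by process and file */
    io:::wait-start
    {
            self->start = timestamp;
    }

    io:::wait-done
    /self->start/
    {
            @time[pid, execname, args[2]->fi_pathname] =
                sum((timestamp - self->start) / 1000);
            self->start = 0;
    }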

The following is a demonstration of the iofileb.d script,


Here we run it while a tar command is backing up /var/adm,

# ./iofileb.d
Tracing... Hit Ctrl-C to end.
^C
PID CMD KB FILE
29529 tar 56 /var/adm/sa/sa31
29529 tar 56 /var/adm/sa/sa03
29529 tar 56 /var/adm/sa/sa02
29529 tar 56 /var/adm/sa/sa01
29529 tar 56 /var/adm/sa/sa04
29529 tar 56 /var/adm/sa/sa27
29529 tar 56 /var/adm/sa/sa28
29529 tar 324 /var/adm/exacct/task
29529 tar 736 /var/adm/wtmpx

In the above output, we can see that the tar command has caused 736 Kbytes
of the /var/adm/wtmpx file to be read from disk. All of the Kbyte values
measured are for disk activity.

The following is a demonstration of the iopattern program,


Here we run iopattern for a few seconds then hit Ctrl-C. There is a "dd"
command running on this system to intentionally create heavy sequential
disk activity,

# iopattern
%RAN %SEQ COUNT MIN MAX AVG KR KW
1 99 465 4096 57344 52992 23916 148
0 100 556 57344 57344 57344 31136 0
0 100 634 57344 57344 57344 35504 0
6 94 554 512 57344 54034 29184 49
0 100 489 57344 57344 57344 27384 0
21 79 568 4096 57344 46188 25576 44
4 96 431 4096 57344 56118 23620 0
^C

In the above output we can see that the disk activity is mostly sequential.
The disks are also pulling around 30 MB during each sample, with a large
average event size.



The following demonstrates iopattern while running a "find" command to
cause random disk activity,

# iopattern
%RAN %SEQ COUNT MIN MAX AVG KR KW
86 14 400 1024 8192 1543 603 0
81 19 455 1024 8192 1606 714 0
89 11 469 512 8192 1854 550 299
83 17 463 1024 8192 1782 806 0
87 13 394 1024 8192 1551 597 0
85 15 348 512 57344 2835 808 155
91 9 513 512 47616 2812 570 839
76 24 317 512 35840 3755 562 600
^C

In the above output, we can see from the percentages that the disk events
were mostly random. We can also see that the average event size is small -
which makes sense if we are reading through many directory files.
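
iopattern decides that an event is sequential when it starts where the
previous event on that disk ended; a simplified sketch of the test, assuming
512-byte blocks (the real script also tracks sizes and prints the interval
summaries):

    /* sketch: classify disk events as random or sequential, per disk */
    io:::start
    /last[args[1]->dev_statname] != 0/
    {
            @type[args[0]->b_blkno == last[args[1]->dev_statname] ?
                "sequential" : "random"] = count();
    }

    io:::start
    {
            last[args[1]->dev_statname] = args[0]->b_blkno +
                args[0]->b_bcount / 512;
    }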



iopattern has options. Here we print timestamps "-v" and measure every 10
seconds,

# iopattern -v 10
TIME %RAN %SEQ COUNT MIN MAX AVG KR KW
2005 Jul 25 20:40:55 97 3 33 512 8192 1163 8 29
2005 Jul 25 20:41:05 0 0 0 0 0 0 0 0
2005 Jul 25 20:41:15 84 16 6 512 11776 5973 22 13
2005 Jul 25 20:41:25 100 0 26 512 8192 1496 8 30
2005 Jul 25 20:41:35 0 0 0 0 0 0 0 0
^C

The following is a demonstration of the iopending tool,

Here we run it with a sample interval of 1 second,

# iopending 1
Tracing... Please wait.
2006 Jan 6 20:21:59, load: 0.02, disk_r: 0 KB, disk_w: 0 KB

value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1010
1 | 0

2006 Jan 6 20:22:00, load: 0.03, disk_r: 0 KB, disk_w: 0 KB

value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1000
1 | 0

2006 Jan 6 20:22:01, load: 0.03, disk_r: 0 KB, disk_w: 0 KB

value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1000
1 | 0

^C

The iopending tool samples at 1000 Hz, and prints a distribution of how many
disk events were "pending" completion. In the above example the disks are
quiet - for all the samples there are zero disk events pending.
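
The pending count is simple to maintain: increment on io:::start, decrement
on io:::done, and sample the value; a minimal sketch (the real script also
prints the load and Kbyte summaries):

    #!/usr/sbin/dtrace -s
    /* sketch: sample the number of outstanding disk events at 1000 Hz */
    io:::start { pending++; }
    io:::done  { pending--; }

    profile:::profile-1000
    {
            @queued = lquantize(pending, 0, 32, 1);
    }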



Now iopending is run with no arguments. It will default to an interval of 5
seconds,

# iopending
Tracing... Please wait.
2006 Jan 6 19:15:41, load: 0.03, disk_r: 3599 KB, disk_w: 0 KB

value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4450
1 |@@@ 390
2 |@ 80
3 | 40
4 | 20
5 | 30
6 | 0

^C

In the above output there was a little disk activity. For 390 samples there
was 1 I/O event pending; for 80 samples there were 2, and so on.




In the following example iopending is run during heavy disk activity. We
print output every 10 seconds,

# iopending 10
Tracing... Please wait.
2006 Jan 6 20:58:07, load: 0.03, disk_r: 25172 KB, disk_w: 33321 KB

value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@ 2160
1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6720
2 |@@@@ 1000
3 | 50
4 | 30
5 | 20
6 | 10
7 | 10
8 | 10
9 | 0

2006 Jan 6 20:58:17, load: 0.05, disk_r: 8409 KB, disk_w: 12449 KB

value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 7260
1 |@@@@@@@ 1700
2 |@ 300
3 | 0
4 | 10
5 | 10
6 | 10
7 | 20
8 | 0
9 | 0
10 | 0
11 | 0
12 | 0
13 | 0
14 | 0
15 | 0
16 | 0
17 | 10
18 | 20
19 | 0
20 | 0
21 | 0
22 | 0
23 | 0
24 | 0
25 | 0
26 | 0
27 | 0
28 | 0
29 | 0
30 | 0
31 | 10
>= 32 |@@@ 650

^C

In the first output, most of the time (67%) there was 1 event pending,
and for a short time there were 8 events pending. In the second output we
see many samples were off the scale - 650 samples at 32 or more pending
events. For this sample I had typed "sync" in another window, which
immediately queued many disk events that were eventually completed.

The following demonstrates iosnoop. It was run on a system that was
fairly quiet until a tar command was run,

# ./iosnoop
UID PID D BLOCK SIZE COMM PATHNAME
0 0 W 1067 512 sched <none>
0 0 W 6496304 1024 sched <none>
0 3 W 6498797 512 fsflush <none>
0 0 W 1067 512 sched <none>
0 0 W 6496304 1024 sched <none>
100 443 R 892288 4096 Xsun /usr/openwin/bin/Xsun
100 443 R 891456 4096 Xsun /usr/openwin/bin/Xsun
100 15795 R 3808 8192 tar /usr/bin/eject
100 15795 R 35904 6144 tar /usr/bin/eject
100 15795 R 39828 6144 tar /usr/bin/env
100 15795 R 3872 8192 tar /usr/bin/expr
100 15795 R 21120 7168 tar /usr/bin/expr
100 15795 R 43680 6144 tar /usr/bin/false
100 15795 R 44176 6144 tar /usr/bin/fdetach
100 15795 R 3920 8192 tar /usr/bin/fdformat
100 15795 R 3936 8192 tar /usr/bin/fdformat
100 15795 R 4080 8192 tar /usr/bin/fdformat
100 15795 R 9680 3072 tar /usr/bin/fdformat
100 15795 R 4096 8192 tar /usr/bin/fgrep
100 15795 R 46896 6144 tar /usr/bin/fgrep
100 15795 R 4112 8192 tar /usr/bin/file
100 15795 R 4128 8192 tar /usr/bin/file
100 15795 R 4144 8192 tar /usr/bin/file
100 15795 R 21552 7168 tar /usr/bin/file
100 15795 R 4192 8192 tar /usr/bin/fmli
100 15795 R 4208 8192 tar /usr/bin/fmli
100 15795 R 4224 57344 tar /usr/bin/fmli
100 15795 R 4336 24576 tar /usr/bin/fmli
100 15795 R 695792 8192 tar <none>
100 15795 R 696432 57344 tar /usr/bin/fmli
[...]
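
Most of the iosnoop columns map directly onto the io provider's probe
arguments; a minimal sketch (the real iosnoop is a wrapper with many more
options, and the PID shown for asynchronous I/O is whoever is on-CPU, often
sched):

    #!/usr/sbin/dtrace -s
    /* sketch: one line per disk event - direction, block, size, path */
    io:::start
    {
            printf("%5d %1s %10d %6d %s %s\n", pid,
                args[0]->b_flags & B_READ ? "R" : "W",
                args[0]->b_blkno, args[0]->b_bcount, execname,
                args[2]->fi_pathname);
    }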



The following are demonstrations of the iotop program,


Here we run iotop with the -C option to not clear the screen, but instead
provide a scrolling output,

# iotop -C
Tracing... Please wait.
2005 Jul 16 00:34:40, load: 1.21, disk_r: 12891 KB, disk_w: 1087 KB

UID PID PPID CMD DEVICE MAJ MIN D BYTES
0 3 0 fsflush cmdk0 102 4 W 512
0 3 0 fsflush cmdk0 102 0 W 11776
0 27751 20320 tar cmdk0 102 16 W 23040
0 3 0 fsflush cmdk0 102 0 R 73728
0 0 0 sched cmdk0 102 0 R 548864
0 0 0 sched cmdk0 102 0 W 1078272
0 27751 20320 tar cmdk0 102 16 R 1514496
0 27751 20320 tar cmdk0 102 3 R 11767808

2005 Jul 16 00:34:45, load: 1.23, disk_r: 83849 KB, disk_w: 488 KB

UID PID PPID CMD DEVICE MAJ MIN D BYTES
0 0 0 sched cmdk0 102 4 W 1536
0 0 0 sched cmdk0 102 0 R 131072
0 27752 20320 find cmdk0 102 0 R 262144
0 0 0 sched cmdk0 102 0 W 498176
0 27751 20320 tar cmdk0 102 3 R 11780096
0 27751 20320 tar cmdk0 102 5 R 29745152
0 27751 20320 tar cmdk0 102 4 R 47203328

2005 Jul 16 00:34:50, load: 1.25, disk_r: 22394 KB, disk_w: 2 KB

UID PID PPID CMD DEVICE MAJ MIN D BYTES
0 27752 20320 find cmdk0 102 0 W 2048
0 0 0 sched cmdk0 102 0 R 16384
0 321 1 automountd cmdk0 102 0 R 22528
0 27752 20320 find cmdk0 102 0 R 1462272
0 27751 20320 tar cmdk0 102 5 R 17465344

In the above output, we can see a tar command is reading from the cmdk0
disk, from several different slices (different minor numbers), with the last
report focusing on 102,5 (an "ls -lL" in /dev/dsk can explain the
number-to-slice mappings).

The disk_r and disk_w values give a summary of the overall activity in
bytes.



Bytes can be used as a yardstick to determine which process is keeping the
disks busy; however, either of the delta times available from iotop would
be more accurate (as they take into account whether the activity is random
or sequential).

# iotop -Co
Tracing... Please wait.
2005 Jul 16 00:39:03, load: 1.10, disk_r: 5302 KB, disk_w: 20 KB

UID PID PPID CMD DEVICE MAJ MIN D DISKTIME
0 0 0 sched cmdk0 102 0 W 532
0 0 0 sched cmdk0 102 0 R 245398
0 27758 20320 find cmdk0 102 0 R 3094794

2005 Jul 16 00:39:08, load: 1.14, disk_r: 5268 KB, disk_w: 273 KB

UID PID PPID CMD DEVICE MAJ MIN D DISKTIME
0 3 0 fsflush cmdk0 102 0 W 2834
0 0 0 sched cmdk0 102 0 W 263527
0 0 0 sched cmdk0 102 0 R 285015
0 3 0 fsflush cmdk0 102 0 R 519187
0 27758 20320 find cmdk0 102 0 R 2429232

2005 Jul 16 00:39:13, load: 1.16, disk_r: 602 KB, disk_w: 1238 KB

UID PID PPID CMD DEVICE MAJ MIN D DISKTIME
0 3 0 fsflush cmdk0 102 4 W 200
0 3 0 fsflush cmdk0 102 6 W 260
0 3 0 fsflush cmdk0 102 0 W 883
0 27758 20320 find cmdk0 102 0 R 55686
0 3 0 fsflush cmdk0 102 0 R 317508
0 0 0 sched cmdk0 102 0 R 320195
0 0 0 sched cmdk0 102 0 W 571084
[...]

The disk time is in microseconds. In the first sample, we can see the find
command caused a total of 3.094 seconds of disk time - the duration of the
samples here is 5 seconds (the default), so it would be fair to say that
the find command is keeping the disk 60% busy.
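
The DISKTIME sums can be approximated by keying a start timestamp on the
buffer pointer (arg0 of the io probes) and summing the deltas at completion;
a simplified sketch:

    /* sketch: sum disk I/O service time (us) per device */
    io:::start
    {
            start[arg0] = timestamp;
    }

    io:::done
    /start[arg0] != 0/
    {
            @disktime[args[1]->dev_statname] =
                sum((timestamp - start[arg0]) / 1000);
            start[arg0] = 0;
    }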



A new option for iotop is to print percentages "-P", which are based on disk
I/O times and hence are a fair measurement of what is keeping the disks
busy.

# iotop -PC 1
Tracing... Please wait.
2005 Nov 18 15:26:14, load: 0.24, disk_r: 13176 KB, disk_w: 0 KB

UID PID PPID CMD DEVICE MAJ MIN D %I/O
0 2215 1663 bart cmdk0 102 0 R 85

2005 Nov 18 15:26:15, load: 0.25, disk_r: 5263 KB, disk_w: 0 KB

UID PID PPID CMD DEVICE MAJ MIN D %I/O
0 2214 1663 find cmdk0 102 0 R 15
0 2215 1663 bart cmdk0 102 0 R 67

2005 Nov 18 15:26:16, load: 0.25, disk_r: 8724 KB, disk_w: 0 KB

UID PID PPID CMD DEVICE MAJ MIN D %I/O
0 2214 1663 find cmdk0 102 0 R 10
0 2215 1663 bart cmdk0 102 0 R 71

2005 Nov 18 15:26:17, load: 0.25, disk_r: 7528 KB, disk_w: 0 KB

UID PID PPID CMD DEVICE MAJ MIN D %I/O
0 2214 1663 find cmdk0 102 0 R 0
0 2215 1663 bart cmdk0 102 0 R 85

2005 Nov 18 15:26:18, load: 0.26, disk_r: 11389 KB, disk_w: 0 KB

UID PID PPID CMD DEVICE MAJ MIN D %I/O
0 2214 1663 find cmdk0 102 0 R 2
0 2215 1663 bart cmdk0 102 0 R 80

2005 Nov 18 15:26:19, load: 0.26, disk_r: 22109 KB, disk_w: 0 KB

UID PID PPID CMD DEVICE MAJ MIN D %I/O
0 2215 1663 bart cmdk0 102 0 R 76

^C

In the above output, bart and find jostle for disk access as they create
a database of file checksums. The command was,

find / | bart create -I > /dev/null

Note that the %I/O is in terms of 1 disk. A %I/O of say 200 is allowed - it
would mean that effectively 2 disks were at 100%, or 4 disks at 50%, etc.

This is an example of the kill.d DTrace script,

# kill.d
FROM COMMAND SIG TO RESULT
2344 bash 2 3117 0
2344 bash 9 12345 -1
^C

In the above output, a kill -2 (Ctrl-C) was sent from the bash command
to PID 3117. Then a kill -9 (SIGKILL) was sent to PID 12345 - which
returned a "-1" for failure.
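
A minimal sketch of kill.d: remember the target and signal at entry, then
print them with the result at return:

    #!/usr/sbin/dtrace -s
    /* sketch: trace kill(2) - sender, signal, target and result */
    syscall::kill:entry
    {
            self->target = arg0;
            self->signal = arg1;
            self->in = 1;
    }

    syscall::kill:return
    /self->in/
    {
            printf("%5d %12s %3d %5d %d\n", pid, execname,
                self->signal, self->target, (int)arg0);
            self->in = 0;
            self->target = 0;
            self->signal = 0;
    }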

The following is a demonstration of the lastwords command,


Here we run lastwords to catch syscalls from processes named "bash" as they
exit,

# ./lastwords bash
Tracing... Waiting for bash to exit...
1091567219163679 1861 bash sigaction 0 0
1091567219177487 1861 bash sigaction 0 0
1091567219189692 1861 bash sigaction 0 0
1091567219202085 1861 bash sigaction 0 0
1091567219214553 1861 bash sigaction 0 0
1091567219226690 1861 bash sigaction 0 0
1091567219238786 1861 bash sigaction 0 0
1091567219251697 1861 bash sigaction 0 0
1091567219265770 1861 bash sigaction 0 0
1091567219294110 1861 bash gtime 42a7c194 0
1091567219428305 1861 bash write 5 0
1091567219451138 1861 bash setcontext 0 0
1091567219473911 1861 bash sigaction 0 0
1091567219516487 1861 bash stat64 0 0
1091567219547973 1861 bash open64 4 0
1091567219638345 1861 bash write 5 0
1091567219658886 1861 bash close 0 0
1091567219689094 1861 bash open64 4 0
1091567219704301 1861 bash fstat64 0 0
1091567219731796 1861 bash read 2fe 0
1091567219745541 1861 bash close 0 0
1091567219768536 1861 bash lwp_sigmask ffbffeff 0
1091567219787494 1861 bash ioctl 0 0
1091567219801338 1861 bash setpgrp 6a3 0
1091567219814067 1861 bash ioctl 0 0
1091567219825791 1861 bash lwp_sigmask ffbffeff 0
1091567219847778 1861 bash setpgrp 0 0
TIME PID EXEC SYSCALL RETURN ERR

In another window, a bash shell was executed and then exited normally. The
last few system calls that the bash shell made can be seen above.




In the following example we monitor the exit of bash shells again, but this
time the bash shell sends itself a "kill -8",

# ./lastwords bash
Tracing... Waiting for bash to exit...
1091650185555391 1865 bash sigaction 0 0
1091650185567963 1865 bash sigaction 0 0
1091650185580316 1865 bash sigaction 0 0
1091650185592381 1865 bash sigaction 0 0
1091650185605046 1865 bash sigaction 0 0
1091650185618451 1865 bash sigaction 0 0
1091650185647663 1865 bash gtime 42a7c1e7 0
1091650185794626 1865 bash kill 0 0
1091650185836941 1865 bash lwp_sigmask ffbffeff 0
1091650185884145 1865 bash stat64 0 0
1091650185916135 1865 bash open64 4 0
1091650186005673 1865 bash write b 0
1091650186025782 1865 bash close 0 0
1091650186052002 1865 bash open64 4 0
1091650186067538 1865 bash fstat64 0 0
1091650186094289 1865 bash read 309 0
1091650186108086 1865 bash close 0 0
1091650186129965 1865 bash lwp_sigmask ffbffeff 0
1091650186149092 1865 bash ioctl 0 0
1091650186162614 1865 bash setpgrp 6a3 0
1091650186175457 1865 bash ioctl 0 0
1091650186187206 1865 bash lwp_sigmask ffbffeff 0
1091650186209514 1865 bash setpgrp 0 0
1091650186225307 1865 bash sigaction 0 0
1091650186238832 1865 bash getpid 749 0
1091650186260149 1865 bash kill 0 0
1091650186277925 1865 bash setcontext 0 0
TIME PID EXEC SYSCALL RETURN ERR

The last few system calls are different; we can see the kill system call
before bash exits.


The following is a demonstration of the loads.d script.


Here we run both loads.d and the uptime command for comparison,

# uptime
1:30am up 14 day(s), 2:27, 3 users, load average: 3.52, 3.45, 3.05

# ./loads.d
2005 Jun 11 01:30:49, load average: 3.52, 3.45, 3.05

Both have returned the same load average, confirming that loads.d is
behaving as expected.


The point of loads.d is to demonstrate fetching the same data as uptime
does, in the DTrace language. It is not intended as a replacement
or substitute for the uptime(1) command.
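
The trick in loads.d is reading the kernel's load averages directly; a sketch
of the idea, assuming the Solaris hp_avenrun kernel variable (fixed-point,
scaled by 65536):

    #!/usr/sbin/dtrace -s
    /* sketch: print the 1 minute load average from the kernel, then exit */
    dtrace:::BEGIN
    {
            printf("load average: %d.%02d\n", `hp_avenrun[0] / 65536,
                ((`hp_avenrun[0] % 65536) * 100) / 65536);
            exit(0);
    }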

The following is an example of the newproc.d script,

# ./newproc.d
dtrace: description 'proc:::exec-success ' matched 1 probe
CPU ID FUNCTION:NAME
0 3297 exec_common:exec-success man ls
0 3297 exec_common:exec-success sh -c cd /usr/share/man; tbl /usr/share/man/man1/ls.1 |neqn /usr/share/lib/pub/
0 3297 exec_common:exec-success tbl /usr/share/man/man1/ls.1
0 3297 exec_common:exec-success neqn /usr/share/lib/pub/eqnchar -
0 3297 exec_common:exec-success nroff -u0 -Tlp -man -
0 3297 exec_common:exec-success col -x
0 3297 exec_common:exec-success sh -c trap '' 1 15; /usr/bin/mv -f /tmp/mpzIaOZF /usr/share/man/cat1/ls.1 2> /d
0 3297 exec_common:exec-success /usr/bin/mv -f /tmp/mpzIaOZF /usr/share/man/cat1/ls.1
0 3297 exec_common:exec-success sh -c more -s /tmp/mpzIaOZF
0 3297 exec_common:exec-success more -s /tmp/mpzIaOZF

The above output was caught when running "man ls". This identifies all the
commands responsible for processing the man page.
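
The dtrace description line shows that only one probe is needed; newproc.d is
essentially:

    #!/usr/sbin/dtrace -s
    /* print the arguments of each successfully exec'd process */
    proc:::exec-success
    {
            trace(curpsinfo->pr_psargs);
    }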

The following are examples of opensnoop. File open events are traced
along with some process details.


This first example is of the default output. The commands "cat", "cal",
"ls" and "uname" were run. The returned file descriptor (or -1 for error) is
shown, along with the filenames.

# ./opensnoop
UID PID COMM FD PATH
100 3504 cat -1 /var/ld/ld.config
100 3504 cat 3 /usr/lib/libc.so.1
100 3504 cat 3 /etc/passwd
100 3505 cal -1 /var/ld/ld.config
100 3505 cal 3 /usr/lib/libc.so.1
100 3505 cal 3 /usr/share/lib/zoneinfo/Australia/NSW
100 3506 ls -1 /var/ld/ld.config
100 3506 ls 3 /usr/lib/libc.so.1
100 3507 uname -1 /var/ld/ld.config
100 3507 uname 3 /usr/lib/libc.so.1
[...]


Full command arguments can be fetched using -g,

# ./opensnoop -g
UID PID PATH FD ARGS
100 3528 /var/ld/ld.config -1 cat /etc/passwd
100 3528 /usr/lib/libc.so.1 3 cat /etc/passwd
100 3528 /etc/passwd 3 cat /etc/passwd
100 3529 /var/ld/ld.config -1 cal
100 3529 /usr/lib/libc.so.1 3 cal
100 3529 /usr/share/lib/zoneinfo/Australia/NSW 3 cal
100 3530 /var/ld/ld.config -1 ls -l
100 3530 /usr/lib/libc.so.1 3 ls -l
100 3530 /var/run/name_service_door 3 ls -l
100 3530 /usr/share/lib/zoneinfo/Australia/NSW 4 ls -l
100 3531 /var/ld/ld.config -1 uname -a
100 3531 /usr/lib/libc.so.1 3 uname -a
[...]



The verbose option prints human readable timestamps,

# ./opensnoop -v
STRTIME UID PID COMM FD PATH
2005 Jan 22 01:22:50 0 23212 df -1 /var/ld/ld.config
2005 Jan 22 01:22:50 0 23212 df 3 /lib/libcmd.so.1
2005 Jan 22 01:22:50 0 23212 df 3 /lib/libc.so.1
2005 Jan 22 01:22:50 0 23212 df 3 /platform/SUNW,Sun-Fire-V210/lib/libc_psr.so.1
2005 Jan 22 01:22:50 0 23212 df 3 /etc/mnttab
2005 Jan 22 01:22:50 0 23211 dtrace 4 /usr/share/lib/zoneinfo/Australia/NSW
2005 Jan 22 01:22:51 0 23213 uname -1 /var/ld/ld.config
2005 Jan 22 01:22:51 0 23213 uname 3 /lib/libc.so.1
2005 Jan 22 01:22:51 0 23213 uname 3 /platform/SUNW,Sun-Fire-V210/lib/libc_psr.so.1
[...]



Particular files can be monitored using -f. For example,

# ./opensnoop -vgf /etc/passwd
STRTIME UID PID PATH FD ARGS
2005 Jan 22 01:28:50 0 23242 /etc/passwd 3 cat /etc/passwd
2005 Jan 22 01:28:54 0 23243 /etc/passwd 4 vi /etc/passwd
2005 Jan 22 01:29:06 0 23244 /etc/passwd 3 passwd brendan
[...]



This example is of opensnoop running on a quiet system. We can see
various daemons opening files,

# ./opensnoop
UID PID COMM FD PATH
0 253 nscd 5 /etc/user_attr
0 253 nscd 5 /etc/hosts
0 419 mibiisa 2 /dev/kstat
0 419 mibiisa 2 /dev/rtls
0 419 mibiisa 2 /dev/kstat
0 419 mibiisa 2 /dev/kstat
0 419 mibiisa 2 /dev/rtls
0 419 mibiisa 2 /dev/kstat
0 253 nscd 5 /etc/user_attr
0 419 mibiisa 2 /dev/kstat
0 419 mibiisa 2 /dev/rtls
0 419 mibiisa 2 /dev/kstat
0 174 in.routed 8 /dev/kstat
0 174 in.routed 8 /dev/kstat
0 174 in.routed 6 /dev/ip
0 419 mibiisa 2 /dev/kstat
0 419 mibiisa 2 /dev/rtls
0 419 mibiisa 2 /dev/kstat
0 293 utmpd 4 /var/adm/utmpx
0 293 utmpd 5 /var/adm/utmpx
0 293 utmpd 6 /proc/442/psinfo
0 293 utmpd 6 /proc/567/psinfo
0 293 utmpd 6 /proc/567/psinfo
0 293 utmpd 6 /proc/567/psinfo
0 293 utmpd 6 /proc/567/psinfo
0 293 utmpd 6 /proc/567/psinfo
0 293 utmpd 6 /proc/567/psinfo
0 293 utmpd 6 /proc/567/psinfo
0 293 utmpd 6 /proc/567/psinfo
0 293 utmpd 6 /proc/3013/psinfo
0 419 mibiisa 2 /dev/kstat
0 419 mibiisa 2 /dev/rtls
0 419 mibiisa 2 /dev/kstat
[...]

  1639. The following is a demonstration of the pathopens.d script,
  1640.  
  1641.  
  1642. Here we run it for a few seconds then hit Ctrl-C,
  1643.  
  1644. # pathopens.d
  1645. Tracing... Hit Ctrl-C to end.
  1646. ^C
  1647. COUNT PATHNAME
  1648. 1 /lib/libcmd.so.1
  1649. 1 /export/home/root/DTrace/Dexplorer/dexplorer
  1650. 1 /lib/libmd5.so.1
  1651. 1 /lib/libaio.so.1
  1652. 1 /lib/librt.so.1
  1653. 1 /etc/security/prof_attr
  1654. 1 /etc/mnttab
  1655. 2 /devices/pseudo/devinfo@0:devinfo
  1656. 2 /dev/kstat
  1657. 2 /lib/libnvpair.so.1
  1658. 2 /lib/libkstat.so.1
  1659. 2 /lib/libdevinfo.so.1
  1660. 2 /lib/libnsl.so.1
  1661. 4 /lib/libc.so.1
  1662. 4 /var/ld/ld.config
  1663. 8 /export/home/brendan/Utils_solx86/setiathome-3.08.i386-pc-solaris2.6/outfile.sah
  1664.  
  1665. In the above output, many of the files would have been opened using
  1666. absolute pathnames. However, the "dexplorer" file was opened using a relative
  1667. pathname - and the pathopens.d script has correctly printed the full path.
  1668.  
  1669. The above shows that the outfile.sah file was opened successfully 8 times.
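
The counting at the heart of pathopens.d can be sketched as follows; the real
script does more work, notably prepending the current working directory to
relative pathnames (which is how "dexplorer" was resolved above). A sketch:

#pragma D option quiet

syscall::open:entry
{
        self->fname = arg0;
}

/* count successful opens only, keyed by pathname */
syscall::open:return
/self->fname && (int)arg0 != -1/
{
        @opens[copyinstr(self->fname)] = count();
}

syscall::open:return
{
        self->fname = 0;
}

dtrace:::END
{
        printf("%10s %s\n", "COUNT", "PATHNAME");
        printa("%@10d %s\n", @opens);
}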
  1670.  
  1671. The following is a demonstration of the pidpersec.d script.
  1672.  
  1673.  
  1674. Here the program is run on an idle system,
  1675.  
  1676. # ./pidpersec.d
  1677. TIME LASTPID PID/s
  1678. 2005 Jun 9 22:15:09 3010 0
  1679. 2005 Jun 9 22:15:10 3010 0
  1680. 2005 Jun 9 22:15:11 3010 0
  1681. 2005 Jun 9 22:15:12 3010 0
  1682. 2005 Jun 9 22:15:13 3010 0
  1683. ^C
  1684.  
  1685. This shows that there are no new processes being created.
  1686.  
  1687.  
  1688.  
  1689. Now the script is run on a busy system that is creating many processes
  1690. (which happen to be short-lived),
  1691.  
  1692. # ./pidpersec.d
  1693. TIME LASTPID PID/s
  1694. 2005 Jun 9 22:16:30 3051 13
  1695. 2005 Jun 9 22:16:31 3063 12
  1696. 2005 Jun 9 22:16:32 3073 10
  1697. 2005 Jun 9 22:16:33 3084 11
  1698. 2005 Jun 9 22:16:34 3096 12
  1699. ^C
  1700.  
  1701. Now we can see that there are over 10 new processes created each second.
  1702. The value for lastpid confirms the rates printed.
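
The core of pidpersec.d is small: once per second it reads the last PID the
kernel assigned and prints the difference. A sketch, assuming the Solaris
kernel variable `mpid` holds the last-assigned PID:

#pragma D option quiet

dtrace:::BEGIN
{
        printf("%-20s %8s %6s\n", "TIME", "LASTPID", "PID/s");
        last = `mpid;
}

profile:::tick-1sec
{
        this->new = `mpid;
        printf("%-20Y %8d %6d\n", walltimestamp, this->new,
            this->new - last);
        last = this->new;
}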
  1703.  
  1704. The following is a demonstration of the priclass.d script.
  1705.  
  1706.  
  1707. The script was run for several seconds then Ctrl-C was hit. During
  1708. this time, other processes in different scheduling classes were
  1709. running.
  1710.  
  1711. # ./priclass.d
  1712. Sampling... Hit Ctrl-C to end.
  1713. ^C
  1714.  
  1715. IA
  1716. value ------------- Distribution ------------- count
  1717. 40 | 0
  1718. 50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 30
  1719. 60 | 0
  1720.  
  1721. SYS
  1722. value ------------- Distribution ------------- count
  1723. < 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4959
  1724. 0 | 0
  1725. 10 | 0
  1726. 20 | 0
  1727. 30 | 0
  1728. 40 | 0
  1729. 50 | 0
  1730. 60 | 30
  1731. 70 | 0
  1732. 80 | 0
  1733. 90 | 0
  1734. 100 | 0
  1735. 110 | 0
  1736. 120 | 0
  1737. 130 | 0
  1738. 140 | 0
  1739. 150 | 0
  1740. 160 | 50
  1741. >= 170 | 0
  1742.  
  1743. RT
  1744. value ------------- Distribution ------------- count
  1745. 90 | 0
  1746. 100 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110
  1747. 110 | 0
  1748.  
  1749. TS
  1750. value ------------- Distribution ------------- count
  1751. < 0 | 0
  1752. 0 |@@@@@@@@@@@@@@@ 2880
  1753. 10 |@@@@@@@ 1280
  1754. 20 |@@@@@ 990
  1755. 30 |@@@@@ 920
  1756. 40 |@@@@ 670
  1757. 50 |@@@@ 730
  1758. 60 | 0
  1759.  
  1760. The output is quite interesting, and illustrates neatly the behaviour
  1761. of different scheduling classes.
  1762.  
  1763. The IA interactive class had 30 samples of a 50 to 59 priority, a fairly
  1764. high priority. This class is used for interactive processes, such as
  1765. the windowing system. I had clicked on a few windows to create this
  1766. activity.
  1767.  
  1768. The SYS system class had 4959 samples at a < 0 priority - the lowest -
  1769. which were for the idle thread. There are a few samples at higher
  1770. priorities, including some in the 160 to 169 range (the highest), which
  1771. are for interrupt threads. The system class is used by the kernel.
  1772.  
  1773. The RT real time class had 110 samples in the 100 to 109 priority range.
  1774. This class is designed for real-time applications, those that must have
  1775. a consistent response time regardless of other process activity. For that
  1776. reason, the RT class trumps both TS and IA. I created these events by
  1777. running "prstat -R" as root, which runs prstat in the real time class.
  1778.  
  1779. The TS time sharing class is the default scheduling class for the processes
  1780. on a Solaris system. I ran an infinite shell loop to create heavy activity,
  1781. "while :; do :; done", which shows a profile that leans towards lower
  1782. priorities. This is deliberate behaviour from the time sharing class, which
  1783. reduces the priority of CPU bound processes so that they interfere less
  1784. with I/O bound processes. The result is more samples in the lower priority
  1785. ranges.
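
The sampling behind priclass.d boils down to an aggregation like this sketch:
1000 Hz profile samples of the on-CPU thread's priority, bucketed by
scheduling class name,

profile:::profile-1000
{
        @pri[stringof(curlwpsinfo->pr_clname)] =
            lquantize(curlwpsinfo->pr_pri, 0, 170, 10);
}
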
  1786. The following are demonstrations of the pridist.d script.
  1787.  
  1788.  
  1789. Here we run pridist.d for a few seconds then hit Ctrl-C,
  1790.  
  1791. # pridist.d
  1792. Sampling... Hit Ctrl-C to end.
  1793. ^C
  1794. CMD: setiathome PID: 2190
  1795.  
  1796. value ------------- Distribution ------------- count
  1797. -5 | 0
  1798. 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6629
  1799. 5 | 0
  1800.  
  1801. CMD: sshd PID: 9172
  1802.  
  1803. value ------------- Distribution ------------- count
  1804. 50 | 0
  1805. 55 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10
  1806. 60 | 0
  1807.  
  1808. CMD: mozilla-bin PID: 3164
  1809.  
  1810. value ------------- Distribution ------------- count
  1811. 40 | 0
  1812. 45 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 20
  1813. 50 | 0
  1814.  
  1815. CMD: perl PID: 11544
  1816.  
  1817. value ------------- Distribution ------------- count
  1818. 10 | 0
  1819. 15 |@@@@@@@@ 60
  1820. 20 | 0
  1821. 25 |@@@@@@@@@@@@@@@ 120
  1822. 30 | 0
  1823. 35 |@@@@@@@@@@ 80
  1824. 40 | 0
  1825. 45 |@@@@@ 40
  1826. 50 | 0
  1827. 55 |@@@ 20
  1828. 60 | 0
  1829.  
  1830. During this sample there was a CPU bound process called "setiathome"
  1831. running, and a new CPU bound "perl" process was executed.
  1832.  
  1833. perl, executing an infinite loop, begins with a high priority of 55 to 59
  1834. where it is sampled 20 times. pridist.d samples 1000 times per second,
  1835. so this equates to 20 ms. The perl process has also been sampled for 40 ms
  1836. at priority 45 to 49, for 80 ms at priority 35 to 39, down to 60 ms at
  1837. priority 15 to 19 - at which point I hit Ctrl-C to end sampling.
  1838.  
  1839. The output is spectacular as it matches the behaviour of the dispatcher
  1840. table for the time sharing class perfectly!
  1841.  
  1842. setiathome is running with the lowest priority, in the 0 to 4 range.
  1843.  
  1844. ... ok, so when I say 20 samples equates to 20 ms, we know that's only
  1845. an estimate. It really means that for 20 samples that process was the one on
  1846. the CPU. In between the samples anything may have occurred (I/O bound
  1847. processes will context switch off the CPU). DTrace can certainly be used
  1848. to measure this based on scheduler events, not samples (eg, cpudist),
  1849. however DTrace can then sometimes consume a noticeable portion of the CPUs
  1850. (for example, 2%).
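
For reference, measuring from scheduler events rather than samples looks
roughly like this sketch, using the sched provider:

sched:::on-cpu
{
        self->ts = timestamp;
}

sched:::off-cpu
/self->ts/
{
        @oncpu[execname] = quantize(timestamp - self->ts);
        self->ts = 0;
}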
  1851.  
  1852.  
  1853.  
  1854.  
  1855. The following is a longer sample. Again, I start a new CPU bound perl
  1856. process,
  1857.  
  1858. # pridist.d
  1859. Sampling... Hit Ctrl-C to end.
  1860. ^C
  1861. CMD: setiathome PID: 2190
  1862.  
  1863. value ------------- Distribution ------------- count
  1864. -5 | 0
  1865. 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1820
  1866. 5 | 0
  1867.  
  1868. CMD: mozilla-bin PID: 3164
  1869.  
  1870. value ------------- Distribution ------------- count
  1871. 40 | 0
  1872. 45 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10
  1873. 50 | 0
  1874.  
  1875. CMD: bash PID: 9185
  1876.  
  1877. value ------------- Distribution ------------- count
  1878. 50 | 0
  1879. 55 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10
  1880. 60 | 0
  1881.  
  1882. CMD: perl PID: 11547
  1883.  
  1884. value ------------- Distribution ------------- count
  1885. -5 | 0
  1886. 0 |@@@@@@@@@@@@@@@ 2020
  1887. 5 |@@ 200
  1888. 10 |@@@@@@@ 960
  1889. 15 |@ 160
  1890. 20 |@@@@@ 720
  1891. 25 |@ 120
  1892. 30 |@@@@ 480
  1893. 35 |@ 80
  1894. 40 |@@ 240
  1895. 45 | 40
  1896. 50 |@@ 240
  1897. 55 | 10
  1898. 60 | 0
  1899.  
  1900. Now other behaviour can be observed as the perl process runs. The effect
  1901. here is due to ts_maxwait triggering a priority boost to avoid CPU starvation;
  1902. the priority is boosted to the 50 to 54 range, then decreases by 10 until
  1903. it reaches 0 and another ts_maxwait is triggered. The process spends
  1904. more time at lower priorities, as that is exactly how the TS dispatch table
  1905. has been configured.
  1906.  
  1907.  
  1908.  
  1909.  
  1910. Now we run pridist.d for a considerable time,
  1911.  
  1912. # pridist.d
  1913. Sampling... Hit Ctrl-C to end.
  1914. ^C
  1915. CMD: setiathome PID: 2190
  1916.  
  1917. value ------------- Distribution ------------- count
  1918. -5 | 0
  1919. 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3060
  1920. 5 | 0
  1921.  
  1922. CMD: mozilla-bin PID: 3164
  1923.  
  1924. value ------------- Distribution ------------- count
  1925. 40 | 0
  1926. 45 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 20
  1927. 50 | 0
  1928.  
  1929. CMD: perl PID: 11549
  1930.  
  1931. value ------------- Distribution ------------- count
  1932. -5 | 0
  1933. 0 |@@@@@@@@@@@@@@@@@@@ 7680
  1934. 5 | 0
  1935. 10 |@@@@@@@ 3040
  1936. 15 | 70
  1937. 20 |@@@@@@ 2280
  1938. 25 | 120
  1939. 30 |@@@@ 1580
  1940. 35 | 80
  1941. 40 |@@ 800
  1942. 45 | 40
  1943. 50 |@@ 800
  1944. 55 | 20
  1945. 60 | 0
  1946.  
  1947. The process has settled into a pattern: priority 0, a ts_maxwait boost to
  1948. 50, then a drop back to 0.
  1949.  
  1950. Run "dispadmin -c TS -g" for a printout of the time sharing dispatcher table.
  1951.  
  1952.  
  1953.  
  1954.  
  1955.  
  1956. The following shows running pridist.d on a completely idle system,
  1957.  
  1958. # pridist.d
  1959. Sampling... Hit Ctrl-C to end.
  1960. ^C
  1961. CMD: sched PID: 0
  1962.  
  1963. value ------------- Distribution ------------- count
  1964. -10 | 0
  1965. -5 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1190
  1966. 0 | 0
  1967.  
  1968. Only the kernel "sched" was sampled. It would have been running the idle
  1969. thread.
  1970.  
  1971.  
  1972.  
  1973.  
  1974. The following is an unusual output that is worth mentioning,
  1975.  
  1976. # pridist.d
  1977. Sampling... Hit Ctrl-C to end.
  1978. ^C
  1979. CMD: sched PID: 0
  1980.  
  1981. value ------------- Distribution ------------- count
  1982. -10 | 0
  1983. -5 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 940
  1984. 0 | 0
  1985. 5 | 0
  1986. 10 | 0
  1987. 15 | 0
  1988. 20 | 0
  1989. 25 | 0
  1990. 30 | 0
  1991. 35 | 0
  1992. 40 | 0
  1993. 45 | 0
  1994. 50 | 0
  1995. 55 | 0
  1996. 60 | 0
  1997. 65 | 0
  1998. 70 | 0
  1999. 75 | 0
  2000. 80 | 0
  2001. 85 | 0
  2002. 90 | 0
  2003. 95 | 0
  2004. 100 | 0
  2005. 105 | 0
  2006. 110 | 0
  2007. 115 | 0
  2008. 120 | 0
  2009. 125 | 0
  2010. 130 | 0
  2011. 135 | 0
  2012. 140 | 0
  2013. 145 | 0
  2014. 150 | 0
  2015. 155 | 0
  2016. 160 | 0
  2017. 165 | 10
  2018. >= 170 | 0
  2019.  
  2020. Here we have sampled the kernel running at a priority of 165 to 169. This
  2021. is the interrupt priority range, and would be an interrupt servicing thread,
  2022. e.g. a network interrupt.
  2023.  
  2024. This is a demonstration of the procsystime tool, which can give details
  2025. on how processes make use of system calls.
  2026.  
  2027. Here we run procsystime on processes which have the name "bash",
  2028.  
  2029. # procsystime -n bash
  2030. Hit Ctrl-C to stop sampling...
  2031. ^C
  2032.  
  2033. Elapsed Times for process bash,
  2034.  
  2035. SYSCALL TIME (ns)
  2036. setpgrp 27768
  2037. gtime 28692
  2038. lwp_sigmask 148074
  2039. write 235814
  2040. sigaction 553556
  2041. ioctl 776691
  2042. read 857401243
  2043.  
  2044. By default procsystime prints elapsed times, the time from when the syscall
  2045. was issued to its completion. In the above output, we can see the read()
  2046. syscall took the most time for this process - 0.86 seconds for all the
  2047. reads combined. This is because the read syscall is waiting for keystrokes.
  2048.  
  2049.  
  2050.  
  2051. Here we try the "-o" option to print CPU overhead times on "bash",
  2052.  
  2053. # procsystime -o -n bash
  2054. Hit Ctrl-C to stop sampling...
  2055. ^C
  2056.  
  2057. CPU Times for process bash,
  2058.  
  2059. SYSCALL TIME (ns)
  2060. setpgrp 6994
  2061. gtime 8054
  2062. lwp_sigmask 33865
  2063. read 154895
  2064. sigaction 259899
  2065. write 343825
  2066. ioctl 932280
  2067.  
  2068. This identifies which syscall type from bash is consuming the most CPU time.
  2069. This is ioctl, at 932 microseconds. Compare this output to the default in
  2070. the first example - both are useful for different reasons: this CPU overhead
  2071. output helps us see why processes are consuming a lot of sys time.
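
The difference between the two modes comes down to which clock is read:
timestamp for elapsed time, vtimestamp for on-CPU time. A minimal sketch,
matching on bash as above:

syscall:::entry
/execname == "bash"/
{
        self->ts  = timestamp;    /* wall-clock nanoseconds */
        self->vts = vtimestamp;   /* on-CPU nanoseconds for this thread */
}

syscall:::return
/self->ts/
{
        @elapsed[probefunc] = sum(timestamp - self->ts);
        @cpu[probefunc]     = sum(vtimestamp - self->vts);
        self->ts = 0;
        self->vts = 0;
}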
  2072.  
  2073.  
  2074.  
  2075. This demonstrates using the "-a" for all details, this time with "ssh",
  2076.  
  2077. # procsystime -a -n ssh
  2078. Hit Ctrl-C to stop sampling...
  2079. ^C
  2080.  
  2081. Elapsed Times for processes ssh,
  2082.  
  2083. SYSCALL TIME (ns)
  2084. read 115833
  2085. write 302419
  2086. pollsys 114616076
  2087. TOTAL: 115034328
  2088.  
  2089. CPU Times for processes ssh,
  2090.  
  2091. SYSCALL TIME (ns)
  2092. read 82381
  2093. pollsys 201818
  2094. write 280390
  2095. TOTAL: 564589
  2096.  
  2097. Syscall Counts for processes ssh,
  2098.  
  2099. SYSCALL COUNT
  2100. read 4
  2101. write 4
  2102. pollsys 8
  2103. TOTAL: 16
  2104.  
  2105. Now we can see elapsed times, overhead times, and syscall counts in one
  2106. report. Very handy. We can also see totals printed as "TOTAL:".
  2107.  
  2108.  
  2109.  
  2110. procsystime also lets us just examine one PID. For example,
  2111.  
  2112. # procsystime -p 1304
  2113. Hit Ctrl-C to stop sampling...
  2114. ^C
  2115.  
  2116. Elapsed Times for PID 1304,
  2117.  
  2118. SYSCALL TIME (ns)
  2119. fcntl 7323
  2120. fstat64 21349
  2121. ioctl 190683
  2122. read 238197
  2123. write 1276169
  2124. pollsys 1005360640
  2125.  
  2126.  
  2127.  
  2128. Here is a longer example of running procsystime on mozilla,
  2129.  
  2130. # procsystime -a -n mozilla-bin
  2131. Hit Ctrl-C to stop sampling...
  2132. ^C
  2133.  
  2134. Elapsed Times for processes mozilla-bin,
  2135.  
  2136. SYSCALL TIME (ns)
  2137. readv 677958
  2138. writev 1159088
  2139. yield 1298742
  2140. read 18019194
  2141. write 35679619
  2142. ioctl 108845685
  2143. lwp_park 38090969432
  2144. pollsys 65955258781
  2145. TOTAL: 104211908499
  2146.  
  2147. CPU Times for processes mozilla-bin,
  2148.  
  2149. SYSCALL TIME (ns)
  2150. yield 120345
  2151. readv 398046
  2152. writev 1117178
  2153. lwp_park 8591428
  2154. read 9752315
  2155. write 29043460
  2156. ioctl 37089349
  2157. pollsys 189933470
  2158. TOTAL: 276045591
  2159.  
  2160. Syscall Counts for processes mozilla-bin,
  2161.  
  2162. SYSCALL COUNT
  2163. writev 3
  2164. yield 9
  2165. readv 58
  2166. lwp_park 280
  2167. write 1317
  2168. read 1744
  2169. pollsys 8268
  2170. ioctl 16434
  2171. TOTAL: 28113
  2172.  
  2173.  
  2174.  
  2175. The following is a demonstration of the rwbypid.d script,
  2176.  
  2177.  
  2178. Here we run it for a few seconds then hit Ctrl-C,
  2179.  
  2180. # rwbypid.d
  2181. Tracing... Hit Ctrl-C to end.
  2182. ^C
  2183. PID CMD DIR COUNT
  2184. 11131 dtrace W 2
  2185. 20334 sshd W 17
  2186. 20334 sshd R 24
  2187. 1532 Xorg W 69
  2188. 1659 mozilla-bin R 852
  2189. 1659 mozilla-bin W 1128
  2190. 1532 Xorg R 1702
  2191.  
  2192. In the above output, we can see that Xorg with PID 1532 has made 1702 reads.
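
A table like that can be produced by an aggregation in the style of this
sketch, here using the sysinfo provider's readch/writech probes (the real
script's probes may differ):

#pragma D option quiet

sysinfo:::readch
{
        @io[pid, execname, "R"] = count();
}

sysinfo:::writech
{
        @io[pid, execname, "W"] = count();
}

dtrace:::END
{
        printf("%6s %-16s %4s %8s\n", "PID", "CMD", "DIR", "COUNT");
        printa("%6d %-16s %4s %@8d\n", @io);
}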
  2193.  
  2194. The following is an example of the rwbytype.d script.
  2195.  
  2196.  
  2197. We run rwbytype.d for a few seconds then hit Ctrl-C,
  2198.  
  2199. # rwbytype.d
  2200. Tracing... Hit Ctrl-C to end.
  2201. ^C
  2202. PID CMD VTYPE DIR BYTES
  2203. 1545 sshd chr W 1
  2204. 10357 more chr R 30
  2205. 2357 sshd chr W 31
  2206. 10354 dtrace chr W 32
  2207. 1545 sshd chr R 34
  2208. 6778 bash chr W 44
  2209. 1545 sshd sock R 52
  2210. 405 poold reg W 68
  2211. 1545 sshd sock W 136
  2212. 10357 bash reg R 481
  2213. 10356 find reg R 481
  2214. 10355 bash reg R 481
  2215. 10357 more reg R 1652
  2216. 2357 sshd sock R 1664
  2217. 10357 more chr W 96925
  2218. 10357 more fifo R 97280
  2219. 2357 sshd chr R 98686
  2220. 10356 grep fifo W 117760
  2221. 2357 sshd sock W 118972
  2222. 10356 grep reg R 147645
  2223.  
  2224. Here we can see that the grep process with PID 10356 read 147645 bytes
  2225. from "regular" files. These are I/O bytes at the application level, so
  2226. much of these read bytes would have been cached by the filesystem page cache.
  2227.  
  2228. vnode file types are listed in /usr/include/sys/vnode.h, and give an idea of
  2229. what the file descriptor refers to.
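
Summing bytes rather than counting operations is a one-line change on the
same sysinfo probes, since they fire with the byte count in arg0; resolving
the vnode type as rwbytype.d does takes extra kernel digging, so this sketch
keys on direction only:

sysinfo:::readch  { @bytes[pid, execname, "R"] = sum(arg0); }
sysinfo:::writech { @bytes[pid, execname, "W"] = sum(arg0); }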
  2230.  
  2231. The following is a demonstration of the rwsnoop program,
  2232.  
  2233.  
  2234. Here we run it for about a second,
  2235.  
  2236. # rwsnoop
  2237. UID PID CMD D BYTES FILE
  2238. 100 20334 sshd R 52 <unknown>
  2239. 100 20334 sshd W 1 /devices/pseudo/clone@0:ptm
  2240. 0 20320 bash W 1 /devices/pseudo/pts@0:12
  2241. 100 20334 sshd R 2 /devices/pseudo/clone@0:ptm
  2242. 100 20334 sshd W 52 <unknown>
  2243. 0 2848 ls W 58 /devices/pseudo/pts@0:12
  2244. 0 2848 ls W 68 /devices/pseudo/pts@0:12
  2245. 0 2848 ls W 57 /devices/pseudo/pts@0:12
  2246. 0 2848 ls W 67 /devices/pseudo/pts@0:12
  2247. 0 2848 ls W 48 /devices/pseudo/pts@0:12
  2248. 0 2848 ls W 49 /devices/pseudo/pts@0:12
  2249. 0 2848 ls W 33 /devices/pseudo/pts@0:12
  2250. 0 2848 ls W 41 /devices/pseudo/pts@0:12
  2251. 100 20334 sshd R 429 /devices/pseudo/clone@0:ptm
  2252. 100 20334 sshd W 468 <unknown>
  2253. ^C
  2254.  
  2255. The output scrolls rather fast. Above, we can see an ls command was run,
  2256. and we can see ls write each line. The "<unknown>" read/writes are
  2257. socket activity, which have no corresponding filename.
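
Per-event tracing in this style can be sketched with the syscall provider
and DTrace's built-in fds[] array, which maps a file descriptor to path
details (sockets have no pathname). This simplification prints the requested
size at entry rather than the actual bytes moved:

#pragma D option quiet

syscall::read:entry, syscall::write:entry
{
        printf("%5d %6d %-12s %s %7d %s\n", uid, pid, execname,
            probefunc == "read" ? "R" : "W", arg2,
            fds[arg0].fi_pathname);
}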
  2258.  
  2259.  
  2260. For a summary style output, use the rwtop program.
  2261.  
  2262.  
  2263.  
  2264. If a particular program is of interest, the "-n" option can be used
  2265. to match on process name. Here we match on "bash" during a login where
  2266. the user uses the bash shell as their default,
  2267.  
  2268. # rwsnoop -n bash
  2269. UID PID CMD D BYTES FILE
  2270. 100 2854 bash R 757 /etc/nsswitch.conf
  2271. 100 2854 bash R 0 /etc/nsswitch.conf
  2272. 100 2854 bash R 668 /etc/passwd
  2273. 100 2854 bash R 980 /etc/profile
  2274. 100 2854 bash W 15 /devices/pseudo/pts@0:14
  2275. 100 2854 bash R 10 /export/home/brendan/.bash_profile
  2276. 100 2854 bash R 867 /export/home/brendan/.bashrc
  2277. 100 2854 bash R 980 /etc/profile
  2278. 100 2854 bash W 15 /devices/pseudo/pts@0:14
  2279. 100 2854 bash R 8951 /export/home/brendan/.bash_history
  2280. 100 2854 bash R 8951 /export/home/brendan/.bash_history
  2281. 100 2854 bash R 1652 /usr/share/lib/terminfo/d/dtterm
  2282. 100 2854 bash W 41 /devices/pseudo/pts@0:14
  2283. 100 2854 bash R 1 /devices/pseudo/pts@0:14
  2284. 100 2854 bash W 1 /devices/pseudo/pts@0:14
  2285. 100 2854 bash W 41 /devices/pseudo/pts@0:14
  2286. 100 2854 bash R 1 /devices/pseudo/pts@0:14
  2287. 100 2854 bash W 7 /devices/pseudo/pts@0:14
  2288.  
  2289. In the above, various bash related files such as ".bash_profile" and
  2290. ".bash_history" can be seen. The ".bashrc" is also read, as it was sourced
  2291. from the .bash_profile.
  2292.  
  2293.  
  2294.  
  2295. Extra options with rwsnoop allow us to print zone ID, project ID, timestamps,
  2296. etc. Here we use "-v" to see the time printed, and match on "ps" processes,
  2297.  
  2298. # rwsnoop -vn ps
  2299. TIMESTR UID PID CMD D BYTES FILE
  2300. 2005 Jul 24 04:23:45 0 2804 ps R 168 /proc/2804/auxv
  2301. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/2804/psinfo
  2302. 2005 Jul 24 04:23:45 0 2804 ps R 1495 /etc/ttysrch
  2303. 2005 Jul 24 04:23:45 0 2804 ps W 28 /devices/pseudo/pts.
  2304. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/0/psinfo
  2305. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/1/psinfo
  2306. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/2/psinfo
  2307. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/3/psinfo
  2308. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/218/psinfo
  2309. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/7/psinfo
  2310. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/9/psinfo
  2311. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/360/psinfo
  2312. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/91/psinfo
  2313. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/112/psinfo
  2314. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/307/psinfo
  2315. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/226/psinfo
  2316. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/242/psinfo
  2317. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/228/psinfo
  2318. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/243/psinfo
  2319. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/234/psinfo
  2320. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/119/psinfo
  2321. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/143/psinfo
  2322. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/361/psinfo
  2323. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/20314/psinfo
  2324. 2005 Jul 24 04:23:45 0 2804 ps R 336 /proc/116/psinfo
  2325. [...]
  2326.  
  2327.  
  2328.  
  2329. The following is an example of the sampleproc program.
  2330.  
  2331.  
  2332. Here we run sampleproc for a few seconds on a workstation,
  2333.  
  2334. # ./sampleproc
  2335. Sampling at 100 hertz... Hit Ctrl-C to end.
  2336. ^C
  2337. PID CMD COUNT
  2338. 1659 mozilla-bin 3
  2339. 109 nscd 4
  2340. 2197 prstat 23
  2341. 2190 setiathome 421
  2342.  
  2343. PID CMD PERCENT
  2344. 1659 mozilla-bin 0
  2345. 109 nscd 0
  2346. 2197 prstat 5
  2347. 2190 setiathome 93
  2348.  
  2349. The first table shows a count of how many times each process was sampled
  2350. on the CPU. The second table gives this as a percentage.
  2351.  
  2352. setiathome was on the CPU 421 times, which is 93% of the samples.
  2353.  
  2354.  
  2355.  
  2356.  
  2357. The following is sampleproc running on a server with 4 CPUs. A bash shell
  2358. is running in an infinite loop,
  2359.  
  2360. # ./sampleproc
  2361. Sampling at 100 hertz... Hit Ctrl-C to end.
  2362. ^C
  2363. PID CMD COUNT
  2364. 10140 dtrace 1
  2365. 28286 java 1
  2366. 29345 esd 2
  2367. 29731 esd 3
  2368. 2 pageout 4
  2369. 29733 esd 6
  2370. 10098 bash 1015
  2371. 0 sched 3028
  2372.  
  2373. PID CMD PERCENT
  2374. 10140 dtrace 0
  2375. 28286 java 0
  2376. 29345 esd 0
  2377. 29731 esd 0
  2378. 2 pageout 0
  2379. 29733 esd 0
  2380. 10098 bash 24
  2381. 0 sched 74
  2382.  
  2383. The bash shell was on the CPUs for 24% of the time, which is consistent
  2384. with a CPU bound single threaded application on a 4 CPU server.
  2385.  
  2386. The above sample was around 10 seconds long. During this time, there were
  2387. around 4000 samples (checking the COUNT column); this is because
  2388. 4000 = CPUs (4) * Hertz (100) * Seconds (10).
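
The sampling itself is tiny - a sketch; the COUNT table is the raw
aggregation below, and PERCENT is count / (CPUs * hertz * seconds) * 100:

profile:::profile-100
{
        @sample[pid, execname] = count();
}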
  2389.  
  2390.  
  2391. The following are examples of seeksize.d.
  2392.  
  2393. seeksize.d records disk head seek size for each operation by process.
  2394. This allows us to identify processes that are causing "random" disk
  2395. access and those causing "sequential" disk access.
  2396.  
  2397. It is desirable for processes to be accessing the disks in large
  2398. sequential operations. By using seeksize.d and bitesize.d we can
  2399. identify this behaviour.
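
The seek distance can be derived from the io provider by remembering, per
device, where the previous I/O ended. A sketch (the real seeksize.d is more
careful):

io:::start
/last[args[0]->b_edev]/
{
        /* distance from the end of the previous I/O to this one */
        this->dist = args[0]->b_blkno > last[args[0]->b_edev] ?
            args[0]->b_blkno - last[args[0]->b_edev] :
            last[args[0]->b_edev] - args[0]->b_blkno;
        @seek[pid, curpsinfo->pr_psargs] = quantize(this->dist);
}

io:::start
{
        /* remember where this I/O ended, per device */
        last[args[0]->b_edev] = args[0]->b_blkno +
            args[0]->b_bcount / 512;
}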
  2400.  
  2401.  
  2402.  
  2403. In this example we read through a large file by copying it to a
  2404. remote server. Most of the seek sizes are zero, indicating sequential
  2405. access - and we would expect good performance from the disks
  2406. under these conditions,
  2407.  
  2408. # ./seeksize.d
  2409. Tracing... Hit Ctrl-C to end.
  2410. ^C
  2411.  
  2412. 22349 scp /dl/sol-10-b63-x86-v1.iso mars:\0
  2413.  
  2414. value ------------- Distribution ------------- count
  2415. -1 | 0
  2416. 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 726
  2417. 1 | 0
  2418. 2 | 0
  2419. 4 | 0
  2420. 8 |@ 13
  2421. 16 | 4
  2422. 32 | 0
  2423. 64 | 0
  2424. 128 | 2
  2425. 256 | 3
  2426. 512 | 4
  2427. 1024 | 4
  2428. 2048 | 3
  2429. 4096 | 0
  2430. 8192 | 3
  2431. 16384 | 0
  2432. 32768 | 1
  2433. 65536 | 0
  2434.  
  2435.  
  2436.  
  2437. In this example we run find. The disk operations are fairly scattered,
  2438. as illustrated below by the volume of non-sequential reads,
  2439.  
  2440. # ./seeksize.d
  2441. Tracing... Hit Ctrl-C to end.
  2442. ^C
  2443.  
  2444. 22399 find /var/sadm/pkg/\0
  2445.  
  2446. value ------------- Distribution ------------- count
  2447. -1 | 0
  2448. 0 |@@@@@@@@@@@@@ 1475
  2449. 1 | 0
  2450. 2 | 44
  2451. 4 |@ 77
  2452. 8 |@@@ 286
  2453. 16 |@@ 191
  2454. 32 |@ 154
  2455. 64 |@@ 173
  2456. 128 |@@ 179
  2457. 256 |@@ 201
  2458. 512 |@@ 186
  2459. 1024 |@@ 236
  2460. 2048 |@@ 201
  2461. 4096 |@@ 274
  2462. 8192 |@@ 243
  2463. 16384 |@ 154
  2464. 32768 |@ 113
  2465. 65536 |@@ 182
  2466. 131072 |@ 81
  2467. 262144 | 0
  2468.  
  2469.  
  2470.  
  2471.  
  2472. I found the following interesting. This time I gzipped the large file.
  2473. While zipping, the process is reading from one location and writing
  2474. to another. One might expect that as the program toggles between
  2475. reading from one location and writing to another, the seek distance
  2476. would often be the same (depending on where UFS puts the new file),
  2477.  
  2478. # ./seeksize.d
  2479. Tracing... Hit Ctrl-C to end.
  2480. ^C
  2481.  
  2482. 22368 gzip sol-10-b63-x86-v1.iso\0
  2483.  
  2484. value ------------- Distribution ------------- count
  2485. -1 | 0
  2486. 0 |@@@@@@@@@@@@ 353
  2487. 1 | 0
  2488. 2 | 0
  2489. 4 | 0
  2490. 8 | 7
  2491. 16 | 4
  2492. 32 | 2
  2493. 64 | 4
  2494. 128 | 14
  2495. 256 | 3
  2496. 512 | 3
  2497. 1024 | 5
  2498. 2048 | 1
  2499. 4096 | 0
  2500. 8192 | 3
  2501. 16384 | 1
  2502. 32768 | 1
  2503. 65536 | 1
  2504. 131072 | 1
  2505. 262144 |@@@@@@@@ 249
  2506. 524288 | 1
  2507. 1048576 | 2
  2508. 2097152 | 1
  2509. 4194304 | 2
  2510. 8388608 |@@@@@@@@@@@@@@@@@@ 536
  2511. 16777216 | 0
  2512.  
  2513.  
  2514.  
  2515.  
  2516. The following example compares the operation of "find" with "tar".
  2517. Both are reading from the same location, and we would expect that
  2518. both programs would generally need to do the same number of seeks
  2519. to navigate the directory tree (depending on caching), with tar
  2520. causing extra operations as it reads the file contents as well,
  2521.  
  2522. # ./seeksize.d
  2523. Tracing... Hit Ctrl-C to end.
  2524. ^C
  2525.  
  2526. PID CMD
  2527. 22278 find /etc\0
  2528.  
  2529. value ------------- Distribution ------------- count
  2530. -1 | 0
  2531. 0 |@@@@@@@@@@@@@@@@@@@@ 251
  2532. 1 | 0
  2533. 2 |@ 8
  2534. 4 | 5
  2535. 8 |@ 10
  2536. 16 |@ 10
  2537. 32 |@ 10
  2538. 64 |@ 9
  2539. 128 |@ 11
  2540. 256 |@ 14
  2541. 512 |@@ 20
  2542. 1024 |@ 10
  2543. 2048 | 6
  2544. 4096 |@ 7
  2545. 8192 |@ 10
  2546. 16384 |@ 16
  2547. 32768 |@@ 21
  2548. 65536 |@@ 28
  2549. 131072 |@ 7
  2550. 262144 |@ 14
  2551. 524288 | 6
  2552. 1048576 |@ 15
  2553. 2097152 |@ 7
  2554. 4194304 | 0
  2555.  
  2556.  
  2557. 22282 tar cf /dev/null /etc\0
  2558.  
  2559. value ------------- Distribution ------------- count
  2560. -1 | 0
  2561. 0 |@@@@@@@@@@ 397
  2562. 1 | 0
  2563. 2 | 8
  2564. 4 | 14
  2565. 8 | 16
  2566. 16 |@ 24
  2567. 32 |@ 29
  2568. 64 |@@ 99
  2569. 128 |@@ 73
  2570. 256 |@@ 78
  2571. 512 |@@@ 109
  2572. 1024 |@@ 62
  2573. 2048 |@@ 69
  2574. 4096 |@@ 73
  2575. 8192 |@@@ 113
  2576. 16384 |@@ 81
  2577. 32768 |@@@ 111
  2578. 65536 |@@@ 108
  2579. 131072 |@ 49
  2580. 262144 |@ 33
  2581. 524288 | 20
  2582. 1048576 | 13
  2583. 2097152 | 7
  2584. 4194304 | 5
  2585. 8388608 |@ 30
  2586. 16777216 | 0
  2587.  
  2588. The following is an example of setuids.d. Login events in particular can
  2589. be seen, along with use of the "su" command.
  2590.  
  2591. # ./setuids.d
  2592. UID SUID PPID PID PCMD CMD
  2593. 0 100 3037 3040 in.telnetd login -p -h mars -d /dev/pts/12
  2594. 100 0 3040 3045 bash su -
  2595. 0 102 3045 3051 sh su - fred
  2596. 0 100 3055 3059 sshd /usr/lib/ssh/sshd
  2597. 0 100 3065 3067 in.rlogind login -d /dev/pts/12 -r mars
  2598. 0 100 3071 3073 in.rlogind login -d /dev/pts/12 -r mars
  2599. 0 102 3078 3081 in.telnetd login -p -h mars -d /dev/pts/12
  2600. ^C
  2601.  
  2602. The first line is a telnet login to the user brendan, UID 100. The parent
  2603. command is "in.telnetd", the telnet daemon spawned by inetd, and the
  2604. command that in.telnetd runs is "login".
  2605.  
  2606. The second line shows UID 100 using the "su" command to become root.
  2607.  
  2608. The third line has the root user using "su" to become fred, UID 102.
  2609.  
  2610. The fourth line is an example of an ssh login.
  2611.  
  2612. The fifth and sixth lines are examples of rsh and rlogin.
  2613.  
  2614. The last line is another example of a telnet login for fred, UID 102.
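
Catching these transitions amounts to instrumenting the setuid() family of
system calls. A minimal sketch for setuid() alone (the real setuids.d covers
more calls and digs out the parent command name):

#pragma D option quiet

syscall::setuid:entry
{
        self->ouid = uid;       /* UID before the call */
        self->seen = 1;
}

/* print only successful UID changes */
syscall::setuid:return
/self->seen && (int)arg0 == 0/
{
        printf("%5d %5d %6d %6d %s\n", self->ouid, uid, ppid, pid,
            curpsinfo->pr_psargs);
}

syscall::setuid:return
{
        self->ouid = 0;
        self->seen = 0;
}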
  2615.  
  2616. The following is a demonstration of the sigdist.d script.
  2617.  
  2618.  
  2619. Here we run sigdist.d, and in another window we kill -9 a sleep process,
  2620.  
  2621. # ./sigdist.d
  2622. Tracing... Hit Ctrl-C to end.
  2623. ^C
  2624. SENDER RECIPIENT SIG COUNT
  2625. sched dtrace 2 1
  2626. sched bash 18 1
  2627. bash sleep 9 1
  2628. sched Xorg 14 55
  2629.  
  2630. We can see the signal sent from bash to sleep. We can also see that Xorg
  2631. has received 55 signal 14s. A "man -s3head signal" may help explain what
  2632. signal 14 is (alarm clock).
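
sigdist.d is built around the proc provider's signal-send probe; its core is
roughly:

proc:::signal-send
{
        /* sender, receiving command, signal number */
        @sig[execname, stringof(args[1]->pr_fname), args[2]] = count();
}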
  2633.  
  2634. The following is a demonstration of the syscallbypid.d script,
  2635.  
  2636.  
  2637. Here we run syscallbypid.d for a few seconds then hit Ctrl-C,
  2638.  
  2639. # syscallbypid.d
  2640. Tracing... Hit Ctrl-C to end.
  2641. ^C
  2642. PID CMD SYSCALL COUNT
  2643. 11039 dtrace setcontext 1
  2644. 11039 dtrace lwp_sigmask 1
  2645. 7 svc.startd portfs 1
  2646. 357 poold lwp_cond_wait 1
  2647. 27328 java_vm lwp_cond_wait 1
  2648. 1532 Xorg writev 1
  2649. 11039 dtrace lwp_park 1
  2650. 11039 dtrace schedctl 1
  2651. 11039 dtrace mmap 1
  2652. 361 sendmail pollsys 1
  2653. 11039 dtrace fstat64 1
  2654. 11039 dtrace sigaction 2
  2655. 11039 dtrace write 2
  2656. 361 sendmail lwp_sigmask 2
  2657. 1659 mozilla-bin yield 2
  2658. 11039 dtrace sysconfig 3
  2659. 361 sendmail pset 3
  2660. 20317 sshd read 4
  2661. 361 sendmail gtime 4
  2662. 20317 sshd write 4
  2663. 27328 java_vm ioctl 6
  2664. 11039 dtrace brk 8
  2665. 1532 Xorg setcontext 8
  2666. 1532 Xorg lwp_sigmask 8
  2667. 20317 sshd pollsys 8
  2668. 357 poold pollsys 13
  2669. 1659 mozilla-bin read 16
  2670. 20317 sshd lwp_sigmask 16
  2671. 1532 Xorg setitimer 17
  2672. 27328 java_vm pollsys 18
  2673. 1532 Xorg pollsys 19
  2674. 11039 dtrace p_online 21
  2675. 1532 Xorg read 22
  2676. 1659 mozilla-bin write 25
  2677. 1659 mozilla-bin lwp_park 26
  2678. 11039 dtrace ioctl 36
  2679. 1659 mozilla-bin pollsys 155
  2680. 1659 mozilla-bin ioctl 306
  2681.  
  2682. In the above output, we can see that "mozilla-bin" with PID 1659 made the
  2683. most system calls, the most common of which was ioctl() at 306.
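
The entire measurement is one aggregation, roughly:

syscall:::entry
{
        @calls[pid, execname, probefunc] = count();
}
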
  2684. The following is an example of the syscallbyproc.d script,
  2685.  
  2686. # syscallbyproc.d
  2687. dtrace: description 'syscall:::entry ' matched 228 probes
  2688. ^C
  2689. snmpd 1
  2690. utmpd 2
  2691. inetd 2
  2692. nscd 7
  2693. svc.startd 11
  2694. sendmail 31
  2695. poold 133
  2696. dtrace 1720
  2697.  
  2698. The above output shows that dtrace made the most system calls in this sample,
  2699. 1720 syscalls.
  2700.  
  2701. The following is a demonstration of the syscallbysysc.d script,
  2702.  
  2703. # syscallbysysc.d
  2704. dtrace: description 'syscall:::entry ' matched 228 probes
  2705. ^C
  2706. fstat 1
  2707. setcontext 1
  2708. lwp_park 1
  2709. schedctl 1
  2710. mmap 1
  2711. sigaction 2
  2712. pset 2
  2713. lwp_sigmask 2
  2714. gtime 3
  2715. sysconfig 3
  2716. write 4
  2717. brk 6
  2718. pollsys 7
  2719. p_online 558
  2720. ioctl 579
  2721.  
  2722. In the above output, the ioctl system call was the most common, occurring
  2723. 579 times.
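
These last two scripts are essentially wrappers around one-liners that can
be run directly; only the aggregation key differs:

# dtrace -n 'syscall:::entry { @[execname] = count(); }'     (by process)
# dtrace -n 'syscall:::entry { @[probefunc] = count(); }'    (by syscall)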
  2724.  
  2725. The following is a demonstration of the topsyscall command,
  2726.  
  2727.  
  2728. Here topsyscall is run with no arguments,
  2729.  
  2730. # topsyscall
  2731. 2005 Jun 13 22:13:21, load average: 1.24, 1.24, 1.22 syscalls: 1287
  2732.  
  2733. SYSCALL COUNT
  2734. getgid 4
  2735. getuid 5
  2736. waitsys 5
  2737. xstat 7
  2738. munmap 7
  2739. sysconfig 8
  2740. brk 8
  2741. setcontext 8
  2742. open 8
  2743. getpid 9
  2744. close 9
  2745. resolvepath 10
  2746. lwp_sigmask 22
  2747. mmap 26
  2748. lwp_park 43
  2749. read 59
  2750. write 72
  2751. sigaction 113
  2752. pollsys 294
  2753. ioctl 520
  2754.  
  2755. The screen updates every second, and continues until Ctrl-C is hit to
  2756. end the program.
  2757.  
  2758. In the above output we can see that the ioctl() system call occurred 520 times,
  2759. pollsys() 294 times and sigaction() 113 times.
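
The per-interval refresh comes from printing and then truncating the
aggregation on a timed probe, in the style of this sketch (topsysproc is the
same idea keyed on execname):

#pragma D option quiet

syscall:::entry
{
        @sc[probefunc] = count();
}

profile:::tick-1sec
{
        printf("\n%-20s %10s\n", "SYSCALL", "COUNT");
        printa("%-20s %@10d\n", @sc);
        trunc(@sc);
}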
  2760.  
  2761.  
  2762.  
  2763. Here the command is run with a 10 second interval,
  2764.  
  2765. # topsyscall 10
  2766. 2005 Jun 13 22:15:35, load average: 1.21, 1.22, 1.22 syscalls: 10189
  2767.  
  2768. SYSCALL COUNT
  2769. writev 6
  2770. close 7
  2771. lseek 7
  2772. open 7
  2773. brk 8
  2774. nanosleep 9
  2775. portfs 10
  2776. llseek 14
  2777. lwp_cond_wait 21
  2778. p_online 21
  2779. gtime 27
  2780. rusagesys 71
  2781. setcontext 92
  2782. lwp_sigmask 98
  2783. setitimer 183
  2784. lwp_park 375
  2785. write 438
  2786. read 551
  2787. pollsys 3071
  2788. ioctl 5144
  2789.  
  2790. The following is a demonstration of the topsysproc program,
  2791.  
  2792.  
  2793. Here we run topsysproc with no arguments,
  2794.  
  2795. # topsysproc
  2796. 2005 Jun 13 22:25:16, load average: 1.24, 1.23, 1.21 syscalls: 1347
  2797.  
  2798. PROCESS COUNT
  2799. svc.startd 1
  2800. nscd 1
  2801. setiathome 7
  2802. poold 18
  2803. sshd 21
  2804. java_vm 35
  2805. tput 49
  2806. dtrace 56
  2807. Xorg 108
  2808. sh 110
  2809. clear 122
  2810. mozilla-bin 819
  2811.  
  2812. The screen refreshes every 1 second, which can be changed by specifying
  2813. a different interval at the command line.
  2814.  
  2815. In the above output we can see that processes with the name "mozilla-bin"
  2816. made 819 system calls, while processes with the name "clear" made 122.
  2817.  
  2818.  
  2819.  
  2820. Now topsysproc is run with a 15 second interval,
  2821.  
  2822. # topsysproc 15
  2823. 2005 Jun 13 22:29:43, load average: 1.19, 1.20, 1.20 syscalls: 15909
  2824.  
  2825. PROCESS COUNT
  2826. fmd 1
  2827. inetd 2
  2828. svc.configd 2
  2829. gconfd-2 3
  2830. miniserv.pl 3
  2831. sac 6
  2832. snmpd 6
  2833. sshd 8
  2834. automountd 8
  2835. ttymon 9
  2836. svc.startd 17
  2837. nscd 21
  2838. in.routed 37
  2839. sendmail 41
  2840. setiathome 205
  2841. poold 293
  2842. dtrace 413
  2843. java_vm 529
  2844. Xorg 1234
  2845. mozilla-bin 13071
  2846. bash-3.2$