artificial ignorance: how-to guide
Marcus J. Ranum
Tue, 23 Sep 1997 23:06:06 +0000

By request, here's a quick how-to on log scanning via
artificial ignorance. :) It assumes UNIX and the presence
of a good grep - you could use other stuff if you wanted to
but this is just an example.

Setting up a filter is a process of constant tuning. First
you build a file of common strings that aren't interesting,
and, as new uninteresting things happen, you add them
to the file.

I start with a shell command like this:

cd /var/log
cat * | \
    sed -e 's/^.*demo//' -e 's/\[[0-9]*\]//' | \
    sort | uniq -c | \
    sort -r -n > /tmp/xx

In this example "demo" is my laptop's name, and I use
it in the sed command to strip out the leading part of
each syslog line so that I lose the date/timestamps. This
means that the overall variation in the text is reduced
considerably. The next argument to sed strips out the
PID from the daemon, another source of text variation.
We then sort it, collapse duplicates into a count, then
sort by the count numerically, highest first.
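
The hostname doesn't have to be hard-coded. Here is a minimal sketch of the same normalization, assuming the traditional BSD syslog layout ("Mon DD HH:MM:SS host tag[pid]: message"). The function name normalize and the sample lines are made up for illustration; adjust the first sed expression if your syslog writes a different prefix.

```shell
#!/bin/sh
# Sketch: strip the timestamp, hostname, and PID so that repeated
# messages collapse into identical lines.
# Assumes "Mon DD HH:MM:SS host tag[pid]: msg"; this layout is an
# assumption, not something every syslog produces.
normalize() {
    sed -e 's/^[A-Z][a-z][a-z] [ 0-9][0-9] [0-9:]* [^ ]* //' \
        -e 's/\[[0-9]*\]//'
}

# Hypothetical sample input, not real log data.
printf '%s\n' \
    'Sep 23 10:00:01 demo cron[101]: (root) CMD (/usr/bin/at)' \
    'Sep 23 10:05:01 demo cron[117]: (root) CMD (/usr/bin/at)' \
    'Sep 23 10:06:12 demo lpd[88]: restarted' |
    normalize | sort | uniq -c | sort -r -n
```

The two cron lines differ only in time and PID, so after normalize they collapse into a single entry with a count of 2.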

This yields a file of the frequency with which something
shows up in syslog (more or less):
 297  cron: (root) CMD (/usr/bin/at)
 167  sendmail: alias database /etc/aliases.db out of date
 120  ftpd: PORT
  61  lpd: restarted
  48  kernel: wdpi0: transfer size=2048 intr cmd DRQ
 ... etc

In the example on "demo" this reduced 3982 lines of
syslog records to 889.

Then what you want to do is trim from BOTH ends of
the file and build an "ignore this" list. In this example, I
don't care that cron ran "at" OK so I'd add a regexp
like:
cron.*: (root) CMD (/usr/bin/at)
That's a pretty precise one. :)

At the bottom of my file there were about 200 entries
that looked like:
   1  ftpd: RETR pic9.jpg
   1  ftpd: RETR pic8.jpg
   1  ftpd: RETR pic7.jpg
   1  ftpd: RETR pic6.jpg

Clearly these are highly unique events but also not
interesting. So I add patterns that look like:
ftpd.*: RETR
ftpd.*: STOR
ftpd.*: CWD
ftpd.*: USER
ftpd.*: FTP LOGIN FROM

Now, you apply your stop-list as follows:
cat * | grep -v -f stoplist | \
    sed -e 's/^.*demo//' -e 's/\[[0-9]*\]//' | \
    sort | uniq -c | \
    sort -r -n > /tmp/xx
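
To see the stop-list doing its job on something small, here is a self-contained sketch. The temporary directory, the file names, and the three log lines are all hypothetical; only the patterns come from the examples above.

```shell
#!/bin/sh
# Sketch: build a stop-list and apply it with grep -v -f.
# Directory, file names, and log lines are hypothetical.
work=$(mktemp -d)

# One basic regular expression per line; a log line matching any
# of them is dropped by grep -v.
cat > "$work/stoplist" <<'EOF'
cron.*: (root) CMD (/usr/bin/at)
ftpd.*: RETR
ftpd.*: STOR
EOF

cat > "$work/messages" <<'EOF'
Sep 23 10:00:01 demo cron[101]: (root) CMD (/usr/bin/at)
Sep 23 10:01:44 demo ftpd[230]: RETR pic9.jpg
Sep 23 10:02:10 demo kernel: fd0: write protected
EOF

# The patterns are matched against the raw lines, which is why each
# one has ".*" where the [pid] would be.
survivors=$(grep -v -f "$work/stoplist" "$work/messages")
printf '%s\n' "$survivors"

rm -rf "$work"
```

Only the kernel line survives. It may be worth keeping the stop-list somewhere other than the log directory, so that "cat *" doesn't read the pattern file as if it were a log.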

This time I get 744 lines. Putting in a pattern that
matches:
sendmail.*: .*to=

drops it down to 120 lines. Just keep doing this and
pretty soon you'll have a set of patterns that make your
whole syslog output disappear. You'll notice that in the
earlier example I had a warning from sendmail because
the aliases database was out of date. Rather than adding
a pattern for that, I simply ran newaliases. Next time my
aliases database is out of date, my log scanner will tell
me.

System reboots are cool, too. My log shows:
  48  kernel: wdc2 at pcmcia0: PCCARD IDE disk controller
  48  kernel: wdc1 at pcmcia0: PCCARD IDE disk controller
  48  kernel: wdc0 at isa0 iobase 0x1f0 irq 14: disk controller
  48  kernel: wd0 at wdc0 drive 0: sec/int=4 2818368*512
  ...

Those will be pretty much static. So I add those exact
lines. Now they won't show up whenever the system
boots. BUT I'll get a notification if a new SCSI drive
is added, or (I did this deliberately!):

kernel: fd0c: hard error writing fsbn 1 of 1-19 (fd0 bn 1; cn
kernel: fd0: write protected

Oooh! Some bad boy trying to step on my tripwire file!

Or:
kernel: changing root device to wd1a

...interesting. My pattern was for wd0a!
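
One caveat when adding exact lines: grep reads every line of the pattern file as a regular expression, so a boot message containing "*" (like the "2818368*512" one) will not match itself. A possible workaround, sketched below, is to keep exact lines in a second list and match that list with grep -F, which treats patterns as fixed strings. The file names and log lines here are hypothetical.

```shell
#!/bin/sh
# Sketch: match known-good lines as fixed strings (grep -F), so
# characters like "*" and "." are taken literally.
# File names and log lines are hypothetical.
work=$(mktemp -d)

cat > "$work/exact" <<'EOF'
kernel: wd0 at wdc0 drive 0: sec/int=4 2818368*512
kernel: changing root device to wd0a
EOF

cat > "$work/messages" <<'EOF'
Sep 23 09:00:02 demo kernel: wd0 at wdc0 drive 0: sec/int=4 2818368*512
Sep 23 09:00:03 demo kernel: changing root device to wd1a
EOF

# -F: patterns are literal substrings; -v: print lines matching none.
unexpected=$(grep -v -F -f "$work/exact" "$work/messages")
printf '%s\n' "$unexpected"

rm -rf "$work"
```

The expected disk line is swallowed and the wd1a line comes through. The fixed-string pass can be chained after the regexp pass with a second grep in the pipeline.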

I used to run this kind of stuff on a firewall that I used
to manage. One day its hard disk burned up and my
log scan cheerfully found these new messages about
bad block replacement and sent them to me. :) The
advantage of this approach is that it's dumb, it's
cheap -- and it catches stuff you don't know about
already.

Once you've got your pattern file tuned, put it in
cron or whatever, so it runs often. The TIS Gauntlet
has a hack I wrote called "retail" which unfortunately
I can't release the code for, but it is easy to
implement. Basically, it was like tail but it remembered
the offset in the file from the previous run, and the
inode of the file (so it'd detect file shifts) - the trick is
to keep one fd open to the file and seek within it,
then stat it every so often to see if the file has grown
or changed inode. If it has, read to EOF, open the new
file, and start again. That way you can chop the end
of the log file through a filter every couple of seconds
with minimal expense in CPU and disk I/O.
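
For anyone who wants something to start from, here is a rough shell sketch of the offset-and-inode idea. It is not the Gauntlet code and it is simpler than what is described above: it re-opens the file on every run instead of holding a descriptor open, so after a rotation it does not go back for the tail end of the old file. The function name, the state-file format ("inode offset"), and the use of head -c (common, but not in POSIX) are all assumptions of this sketch.

```shell
#!/bin/sh
# Sketch of a "retail"-like helper: print only what was appended to
# a log since the previous run. NOT the original implementation.
# Usage: retail LOGFILE STATEFILE
retail() {
    log=$1
    state=$2
    inode=$(ls -i "$log" | awk '{print $1}')
    size=$(wc -c < "$log" | tr -d ' ')
    old_inode=
    old_offset=0
    if [ -f "$state" ]; then
        read -r old_inode old_offset < "$state"
    fi
    old_offset=${old_offset:-0}
    # A different inode (rotation) or a shrunken file (truncation)
    # means the saved offset is stale: start from the beginning.
    if [ "$inode" != "$old_inode" ] || [ "$size" -lt "$old_offset" ]; then
        old_offset=0
    fi
    # tail -c +N starts at byte N (1-based); head -c caps the output
    # at the size measured above, so bytes that arrive in between
    # are left for the next run instead of being printed twice.
    if [ "$size" -gt "$old_offset" ]; then
        tail -c +"$((old_offset + 1))" "$log" |
            head -c "$((size - old_offset))"
    fi
    echo "$inode $size" > "$state"
}
```

From cron, something like "retail /var/log/messages /var/tmp/messages.state | grep -v -f stoplist" would push only the new lines through the filter; both paths are placeholders.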

I'm sure there are lots of fun ways this simple trick
can be enhanced -- but just in its naive form I've found
it quite useful. I wish I had a program that helped me
statistically build my noise filters, but in general I find
it's about a 2 hour job, tops, and it's one you do once
and forget about.
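
Nothing statistical, but a few lines of sed can at least save some typing. The sketch below takes the top entries of a frequency file (the output of "sort | uniq -c | sort -r -n"), escapes the characters grep would treat specially, and puts ".*" in front of the first ": " so the PID can come back. The function name suggest is made up, and the output is meant to be read by a human before anything goes into the stop-list.

```shell
#!/bin/sh
# Sketch: turn the most frequent normalized lines into candidate
# stop-list patterns for review. It only rewrites text; deciding
# what is uninteresting is still up to you.
# Usage: suggest FREQFILE COUNT
suggest() {
    head -n "$2" "$1" |
        sed -e 's/^ *[0-9]* *//' \
            -e 's/[][\.*^$]/\\&/g' \
            -e 's/: /.*: /'
    # 1st expression: drop the leading count from uniq -c
    # 2nd expression: backslash-escape basic-regexp metacharacters
    # 3rd expression: allow "[pid]" between the tag and the colon
}
```

Run against the frequency list shown earlier, the first suggestion would be "cron.*: (root) CMD (/usr/bin/at)", the same pattern written by hand above.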

Enjoy!
mjr.
-----
Marcus J. Ranum, CEO, Network Flight Recorder, Inc.