It has the format 's/substitution_pattern/replacement_string/g'. It replaces every occurrence of substitution_pattern with the replacement string. Here the substitution pattern is the regex for a sentence. Every sentence is delimited by "." and the first character is a space. Therefore, we need to match text in the format "space" some text MATCH_STRING some text "dot". A sentence may contain any characters except a "dot", which is the delimiter; hence we have used [^.], and [^.]* matches any combination of characters except the dot. The match string "mobile phones" is placed inside this pattern. Every matching sentence is replaced by // (that is, by nothing).
See also
- Basic sed primer, explains the sed command
- Basic regular expression primer, explains how to use regular expressions
Implementing head, tail, and tac with awk
Mastering text-processing operations comes with practice. This recipe will help us practice incorporating some of the commands that we have just learned with some that we already know.
Getting ready
The commands head, tail, uniq, and tac operate line by line. Whenever we need line-by-line processing, we can always use awk. Let's emulate these commands with awk.
How to do it...
Let's see how the basic text-processing commands head, tail, and tac can be emulated with awk.
The head command reads the first ten lines of a file and prints them out:
$ awk 'NR <= 10' filename
The tail command prints the last ten lines of a file:
$ awk '{ buffer[NR % 10] = $0; } END { s = NR - 9; if (s < 1) s = 1; for(i = s; i <= NR; i++) { print buffer[i % 10] } }' filename
The tac command prints the lines of the input file in reverse order:
$ awk '{ buffer[NR] = $0; } END { for(i=NR; i>0; i--) { print buffer[i] } }' filename
How it works...
In the implementation of head using awk, we print the lines in the input stream having a line number less than or equal to 10. The line number is available in the special variable NR.
In the implementation of the tail command, a hashing technique is used. The buffer array index is determined by the hashing expression NR % 10, where NR holds the line number of the current line and $0 holds the text of that line. Hence % maps all the lines having the same remainder to the same index of the array, so the buffer always holds the ten most recent lines. In the END{} block, we iterate over the line numbers of the last ten lines (guarding against inputs shorter than ten lines) and print them from the buffer in order.
In the tac command emulation, it simply stores all the lines in an array. When the END{} block runs, NR holds the line number of the last line. It is then decremented in a for loop until it reaches 1, and the stored line is printed in each iteration.
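These one-liners generalize nicely if the line count is passed in as an awk variable with the standard -v option. A small sketch (sample.txt is a placeholder filename):
$ awk -v n=5 'NR <= n' sample.txt
$ awk -v n=5 '{ b[NR % n] = $0 } END { s = NR - n + 1; if (s < 1) s = 1; for (i = s; i <= NR; i++) print b[i % n] }' sample.txt
The first command emulates head -n 5; the second emulates tail -n 5, with the guard on s handling files shorter than n lines.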
See also
- Basic awk primer, explains the awk command
- head and tail - printing the last or first 10 lines of Chapter 3, explains the commands head and tail
- Sorting, unique and duplicates of Chapter 2, explains the uniq command
- Printing lines in reverse order, explains the tac command
Text slicing and parameter operations
This recipe walks through some of the simple text-replacement techniques and parameter-expansion shorthands available in Bash. A few simple techniques can often help us avoid having to write multiple lines of code.
How to do it...
Let's get into the tasks.
Replacing some text in a variable can be done as follows:
$ var="This is a line of text"
$ echo ${var/line/REPLACED}
This is a REPLACED of text
line is replaced with REPLACED.
We can produce a substring by specifying the start position and string length, using the following syntax:
${variable_name:start_position:length}
To print from the fifth character onward, use the following command:
$ string=abcdefghijklmnopqrstuvwxyz
$ echo ${string:4}
efghijklmnopqrstuvwxyz
To print eight characters starting from the fifth character, use:
$ echo ${string:4:8}
efghijkl
The index is specified by counting the first character as 0. We can also count from the last character by using negative indexes, which must be placed inside parentheses; (-1) is the index of the last character.
$ echo ${string:(-1)}
z
$ echo ${string:(-2):2}
yz
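These expansions combine naturally. A quick illustration (the variable contents here are made up):
$ file="report_2010.txt"
$ echo ${file/2010/2011}
report_2011.txt
$ echo ${file:0:6}
report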
See also
- Iterating through lines, words, and characters in a file, explains slicing of a character from a word
5
Tangled Web? Not At All!
In this chapter, we will cover:
- Downloading from a web page
- Downloading a web page as formatted plain text
- A primer on cURL
- Accessing unread Gmail mails from the command line
- Parsing data from a website
- Creating an image crawler and downloader
- Creating a web photo album generator
- Building a Twitter command-line client
- Define utility with Web backend
- Finding broken links in a website
- Tracking changes to a website
- Posting to a web page and reading response
Introduction
The Web is becoming the face of technology and the central access point for data processing. Though shell scripting cannot do everything that languages like PHP can do on the Web, there are still many tasks to which shell scripts are ideally suited. In this chapter we will explore some recipes that can be used to parse website content, download and obtain data, send data to forms, and automate website-usage tasks and similar activities. We can automate many activities that we perform interactively through a browser with a few lines of scripting. Access to the functionality of the HTTP protocol through command-line utilities enables us to write scripts that can solve most web-automation tasks. Have fun while going through the recipes of this chapter.
Downloading from a web page
Downloading a file or a web page from a given URL is simple. A few command-line download utilities are available to perform this task.
Getting ready
wget is a file-download command-line utility. It is very flexible and can be configured with many options.
How to do it...
A web page or a remote file can be downloaded using wget as follows:
$ wget URL
For example:
$ wget http://slynux.org
--2010-08-01 07:51:20-- http://slynux.org/
Resolving slynux.org... 174.37.207.60
Connecting to slynux.org|174.37.207.60|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15280 (15K) [text/html]
Saving to: "index.html"
100%[======================================>] 15,280 75.3K/s in 0.2s
2010-08-01 07:51:21 (75.3 KB/s) - "index.html" saved [15280/15280]
It is also possible to specify multiple download URLs as follows:
$ wget URL1 URL2 URL3 ..
A file can be downloaded from an FTP URL in the same way:
$ wget ftp://example_domain.com/somefile.img
Usually, files are downloaded with the same filename as in the URL, and the download log information or progress is written to stdout.
You can specify the output filename with the -O option. If a file with the specified filename already exists, it will be truncated first and the downloaded file will be written to it.
You can also specify a different logfile path, rather than printing logs to stdout, by using the -o option as follows:
$ wget ftp://example_domain.com/somefile.img -O dloaded_file.img -o log
By using the above command, nothing will be printed on screen. The log or progress will be written to log and the output file will be dloaded_file.img.
There is a chance that downloads might break due to unstable Internet connections. In that case, we can supply the number of tries as an argument, so that once interrupted, the utility will retry the download that many times before giving up.
In order to specify the number of tries, use the -t flag as follows:
$ wget -t 5 URL
There's more...
The wget utility has several additional options that can be used under different problem domains. Let's go through a few of them.
Restricting the download speed
When we have limited Internet downlink bandwidth and many applications share the connection, a large file given for download may consume all the bandwidth and starve other processes. The wget command comes with a built-in option to specify the maximum bandwidth the download job may use, so that all applications can run smoothly at the same time.
We can restrict the speed of wget by using the --limit-rate argument as follows:
$ wget --limit-rate 20k http://example.com/file.iso
In this command the suffix k (kilobyte) or m (megabyte) specifies the speed limit.
We can also specify a maximum quota for the download; wget will stop when the quota is exceeded. This is useful when downloading multiple files limited by the total download size, and it prevents the download from accidentally using too much disk space.
Use --quota or -Q as follows:
$ wget -Q 100m http://example.com/file1 http://example.com/file2
Resume downloading and continue
If a download using wget gets interrupted before it is completed, we can resume the download where we left off by using the -c option as follows:
$ wget -c URL
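These options combine into a single resilient download command; for example, a sketch in which the URL and filenames are placeholders:
$ wget -c -t 5 --limit-rate 100k -o fetch.log -O big.iso http://example.com/big.iso
This resumes a partial download, retries up to five times, limits the rate to 100 KB/s, and keeps the log out of the terminal.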
Using cURL for download
cURL is another advanced command-line utility. It is much more powerful than wget.
cURL can be used to download as follows:
$ curl http://slynux.org > index.html
Unlike wget, curl writes the downloaded data to standard output (stdout) rather than to a file. Therefore, we have to redirect the data from stdout to a file using a redirection operator.
Copying a complete website (mirroring)
wget has an option to download a complete website by recursively collecting all the URL links in the web pages and downloading them like a crawler. Hence we can completely download all the pages of a website.
In order to download the pages, use the --mirror option as follows:
$ wget --mirror exampledomain.com
Or use:
$ wget -r -N -l DEPTH URL
-l specifies the DEPTH of web pages as levels; wget will traverse only that many levels. It is used along with -r (recursive). The -N argument enables timestamping for the files. URL is the base URL of the website from which the download is to be initiated.
Accessing pages with HTTP or FTP authentication
Some web pages require authentication for HTTP or FTP URLs. This can be provided by using the --user and --password arguments:
$ wget --user username --password pass URL
It is also possible to ask for the password without specifying it inline. In order to do that, use --ask-password instead of the --password argument.
Downloading a web page as formatted plain text
Web pages are HTML pages containing a collection of HTML tags along with other elements, such as JavaScript, CSS, and so on. The HTML tags define the base of a web page. We may need to parse the data in a web page while looking for specific content, and this is something Bash scripting can help us with. When we download a web page, we receive an HTML file. In order to view formatted data, it should be opened in a web browser. However, in most circumstances, parsing a formatted text document is easier than parsing HTML data. Therefore, if we can get a text file with formatted text similar to the web page as seen in the web browser, it is more useful and saves a lot of the effort required to strip off the HTML tags. Lynx is an interesting command-line web browser, and we can actually get the web page as plain-text formatted output from it. Let's see how to do it.
How to do it...
Let's download the web-page view, in ASCII character representation, to a text file using the -dump flag of the lynx command:
$ lynx -dump URL > webpage_as_text.txt
This command will also list all the hyperlinks (<a href="link">) separately under a References heading at the footer of the text output. This helps us avoid parsing links separately with regular expressions.
For example:
$ lynx -dump http://google.com > plain_text_page.txt
You can see the plain-text version by using the cat command as follows:
$ cat plain_text_page.txt
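Since lynx gathers the hyperlinks under the References heading, we can slice out just the link list with sed; a small sketch (assuming the heading lynx prints is exactly "References" on its own line):
$ lynx -dump http://google.com | sed -n '/^References$/,$p'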
A primer on cURL
cURL is a powerful utility that supports many protocols, including HTTP, HTTPS, FTP, and more. It supports many features including POST, cookies, authentication, downloading partial files from a specified offset, referers, user agent strings, extra headers, speed limits, maximum file size, progress bars, and so on. cURL is useful when we want to automate a web-page usage sequence and retrieve data. This recipe is a list of the most important features of cURL.
Getting ready
cURL doesn't come with the main Linux distros by default, so you may have to install it using the package manager. By default, most distributions ship with wget.
cURL usually dumps downloaded files to stdout and progress information to stderr. To avoid progress information from being shown, we always use the --silent option.
How to do it...
The curl command can be used to perform different activities such as downloading, sending different HTTP requests, and specifying HTTP headers. Let's see how to perform different tasks with cURL.
$ curl URL --silent
The above command dumps the downloaded file to the terminal (the downloaded data is written to stdout).
The --silent option prevents the curl command from displaying progress information. If progress information is required, remove --silent.
$ curl URL --silent -O
The -O option is used to write the downloaded data into a file with the filename parsed from the URL, rather than writing to standard output.
For example:
$ curl http://slynux.org/index.html --silent -O
index.html will be created.
It writes a web page or file to the filename as in the URL instead of writing to stdout. If there is no filename in the URL, it will produce an error. Hence, make sure that the URL points to a remote file. curl http://slynux.org -O --silent will display an error, since the filename cannot be parsed from the URL.
$ curl URL --silent -o new_filename
The -o option is used to download a file and write it to a file with the specified filename.
In order to show a # progress bar while downloading, use --progress-bar instead of --silent:
$ curl http://slynux.org -o index.html --progress-bar
################################## 100.0%
There's more...
In the previous sections we learned how to download files and dump HTML pages to the terminal. There are several advanced options that come along with cURL. Let's explore more of cURL.
Continue/Resume downloading
cURL has an advanced resume-download feature: unlike wget, it can continue from a given offset. This helps to download portions of files by specifying an offset.
$ curl URL/file -C offset
The offset is an integer value in bytes.
cURL doesn't require us to know the exact byte offset if we want to resume downloading a file. If you want cURL to figure out the correct resume point, use the -C - option, like this:
$ curl -C - URL
cURL will automatically figure out where to restart the download of the specified file.
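For example, combining automatic resume with the -O option (the URL here is a placeholder) re-downloads only the missing portion of a partially fetched file:
$ curl -C - -O http://example.com/file.iso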
Set referer string with cURL
Referer is a string in the HTTP header used to identify the page from which the user reached the current web page. When a user clicks a link on web page A and lands on web page B, the referer header sent for page B will contain the URL of page A.
Some dynamic pages check the referer string before returning HTML data. For example, a web page may show a Google-branded page when a user navigates to it by searching on Google, and a different page when the user navigates to it by typing the URL manually.
The web page can write a condition to return a Google page if the referer is www.google.com, or else return a different page.
You can use --referer with the curl command to specify the referer string as follows:
$ curl --referer Referer_URL target_URL
For example:
$ curl --referer http://google.com http://slynux.org
Cookies with cURL
Using curl we can specify as well as store the cookies encountered during HTTP operations.
In order to specify cookies, use the --cookie "COOKIES" option.
Cookies should be provided as name=value. Multiple cookies should be delimited by a semicolon ";". For example:
$ curl http://example.com --cookie "user=slynux;pass=hack"
In order to specify a file to which the cookies encountered are to be stored, use the --cookie-jar option. For example:
$ curl URL --cookie-jar cookie_file
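A typical use is keeping a session alive across two requests: log in once, store the cookies, then reuse them. A sketch with placeholder URLs and form fields:
$ curl --cookie-jar cookies.txt -d "user=slynux&pass=hack" http://example.com/login.php
$ curl --cookie cookies.txt http://example.com/home.php
When the argument to --cookie contains no = character, curl treats it as the name of a file to read cookies from.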
Setting a user agent string with cURL
Some web pages that check the user agent won't work if there is no user agent specified. You may have noticed that certain websites work well only in Internet Explorer (IE); if a different browser is used, the website shows a message that it works only in IE. This is because the website checks for a user agent. You can set the user agent to IE with curl and see that it returns a different web page in this case.
Using cURL, it can be set with --user-agent or -A as follows:
$ curl URL --user-agent "Mozilla/5.0"
Additional headers can also be passed with cURL. Use -H "Header" for each additional header. For example:
$ curl -H "Host: www.slynux.org" -H "Accept-language: en" URL
Specifying a bandwidth limit on cURL
When the available bandwidth is limited and multiple users are sharing the Internet connection, we can smooth the sharing of bandwidth by limiting the download rate with the --limit-rate option as follows:
$ curl URL --limit-rate 20k
In this command the suffix k (kilobyte) or m (megabyte) specifies the download rate limit.
Specifying the maximum download size
The maximum download file size for cURL can be specified using the --max-filesize option as follows:
$ curl URL --max-filesize bytes
It will return a non-zero exit code if the file size exceeds the specified limit, and zero if the download succeeds.
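The non-zero exit code makes this easy to use in scripts; for example (the URL is a placeholder):
$ curl --max-filesize 102400 -O http://example.com/file.zip || echo "Skipped: larger than 100 KB"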
Authenticating with cURL
HTTP authentication or FTP authentication can be done using cURL with the -u argument.
The username and password can be specified using -u username:password. It is also possible to omit the password, in which case cURL will prompt for it when executing.
For example:
$ curl -u user:pass http://test_auth.com
In order to be prompted for the password, use:
$ curl -u user http://test_auth.com
Printing response headers excluding data
It is often useful to print only the response headers in order to apply checks or collect statistics. For example, to check whether a page is reachable, we don't need to download the entire page contents; just reading the HTTP response header is enough to identify whether the page is available.
An example use case for checking the HTTP header is to check the file size before downloading. We can check the Content-Length parameter in the HTTP header to find out the length of a file before downloading it. Several other useful parameters can also be retrieved from the header; for example, the Last-Modified parameter tells us the last modification time of the remote file.
Use the -I or --head option with curl to dump only the HTTP headers, without downloading the remote file. For example:
$ curl -I http://slynux.org
HTTP/1.1 200 OK
Date: Sun, 01 Aug 2010 05:08:09 GMT
Server: Apache/1.3.42 (Unix) mod_gzip/1.3.26.1a mod_log_bytes/1.2 mod_bwlimited/1.4 mod_auth_passthrough/1.8 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.7a
Last-Modified: Thu, 19 Jul 2007 09:00:58 GMT
ETag: "17787f3-3bb0-469f284a"
Accept-Ranges: bytes
Content-Length: 15280
Connection: close
Content-Type: text/html
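For instance, a sketch that pulls just the file size out of the header before deciding whether to download (tr removes the trailing carriage return from the header line):
$ curl -sI http://slynux.org | grep -i "Content-Length" | tr -d '\r' | awk '{ print $2 }'
15280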
See also
- Posting to a web page and reading response
Accessing Gmail from the command line
Gmail is a widely used free e-mail service from Google: http://mail.google.com/. Gmail allows you to read your mail via authenticated RSS feeds. We can parse the RSS feed to obtain each unread mail's sender name and subject. This helps us take a look at the unread mails in the inbox without opening the web browser.
How to do it...
Let's go through the shell script that parses the RSS feed for Gmail and displays the unread mails:
#!/bin/bash
#Filename: fetch_gmail.sh
#Description: Fetch gmail tool
username="PUT_USERNAME_HERE"
password="PUT_PASSWORD_HERE"
SHOW_COUNT=5 # No of recent unread mails to be shown
echo
curl -u $username:$password --silent "https://mail.google.com/mail/feed/atom" | \
tr -d '\n' | sed 's:</entry>:\n:g' | \
sed 's/.*<title>\(.*\)<\/title.*<author><name>\([^<]*\)<\/name><email>\([^<]*\).*/Author: \2 [\3] \nSubject: \1\n/' | \
head -n $(( $SHOW_COUNT * 3 ))
The output will be as follows:
$ ./fetch_gmail.sh
Author: SLYNUX [ slynux@slynux.com ]
Subject: Book release - 2
Author: SLYNUX [ slynux@slynux.com ]
Subject: Book release - 1
.
… 5 entries
How it works...
The script uses cURL to download the RSS feed with user authentication. User authentication is provided by the -u username:password argument. You can use -u user without providing the password; then, while executing, cURL will interactively ask for the password.
Here we can split the piped commands into different blocks to illustrate how they work. tr -d '\n' removes the newline characters so that we can restructure each mail entry with \n as the delimiter. sed 's:</entry>:\n:g' replaces every </entry> with a newline, so that each mail entry is delimited by a newline and the mails can be parsed one by one. Have a look at the source of https://mail.google.com/mail/feed/atom for the XML tags used in the RSS feed. <entry> TAGS </entry> corresponds to a single mail entry.
The next block of the script is as follows:
sed 's/.*<title>\(.*\)<\/title.*<author><name>\([^<]*\)<\/name><email>\([^<]*\).*/Author: \2 [\3] \nSubject: \1\n/'
This script matches the subject using <title>\(.*\)<\/title, the sender name using <author><name>\([^<]*\)<\/name>, and the e-mail address using <email>\([^<]*\). Then back referencing is used as follows:
- Author: \2 [\3] \nSubject: \1\n replaces each mail entry with the matched items in an easy-to-read format. \1 corresponds to the first substring match, \2 to the second, and so on.
- The SHOW_COUNT=5 variable holds the number of unread mail entries to be printed on the terminal.
- head is used to display only SHOW_COUNT*3 lines from the first line; SHOW_COUNT is multiplied by three because each entry occupies three lines of output.
See also
- A primer on cURL, explains the curl command
- Basic sed primer of Chapter 4, explains the sed command
Parsing data from a website
It is often useful to parse data from web pages by eliminating unnecessary details. sed and awk are the main tools that we will use for this task. You might have come across a list of access rankings in a grep recipe in the previous chapter, Texting and driving; it was generated by parsing the website page http://www.johntorres.net/BoxOfficefemaleList.html.
Let's see how to parse the same data using text-processing tools.
How to do it...
Let's go through the command sequence used to parse details of actresses from the website:
$ lynx -dump http://www.johntorres.net/BoxOfficefemaleList.html | \
grep -o "Rank-.*" | \
sed 's/Rank-//; s/\[[0-9]\+\]//' | \
sort -nk 1 | \
awk '
{
for(i=3;i<=NF;i++){ $2=$2" "$i }
printf "%-4s %s\n", $1,$2 ;
}' > actresslist.txt
The output will be as follows:
# Only 3 entries shown. All others omitted due to space limits
1 Keira Knightley
2 Natalie Portman
3 Monica Bellucci
How it works...
Lynx is a command-line web browser; it can dump the text version of a website as we would see it in a web browser, rather than showing us the raw code. Hence it saves us the job of removing the HTML tags. We clean up the lines starting with Rank using sed, as follows:
sed 's/Rank-//; s/\[[0-9]\+\]//'
These lines are then sorted according to rank. awk is used here to keep the spacing between the rank and the name uniform by specifying the field width: %-4s specifies a four-character width. All the fields except the first are concatenated to form a single string, $2.
See also
- Basic sed primer of Chapter 4, explains the sed command
- Basic awk primer of Chapter 4, explains the awk command
- Downloading a web page as formatted plain text, explains the lynx command
Image crawler and downloader
Image crawlers are very useful when we need to download all the images that appear on a web page. Instead of going through the HTML source and picking out all the images, we can use a script to parse the image URLs and download them automatically. Let's see how to do it.
How to do it...
Let's write a Bash script to crawl and download the images from a web page, as follows:
#!/bin/bash
#Description: Images downloader
#Filename: img_downloader.sh
if [ $# -ne 3 ];
then
  echo "Usage: $0 URL -d DIRECTORY"
  exit -1
fi
for i in {1..4}
do
  case $1 in
  -d) shift; directory=$1; shift ;;
  *) url=${url:-$1}; shift;;
  esac
done
mkdir -p $directory;
baseurl=$(echo $url | egrep -o "https?://[a-z.]+")
curl -s $url | egrep -o "<img src=[^>]*>" | \
sed 's/<img src=\"\([^"]*\).*/\1/g' > /tmp/$$.list
sed -i "s|^/|$baseurl/|" /tmp/$$.list
cd $directory;
while read filename;
do
  curl -O "$filename" --silent
done < /tmp/$$.list
An example usage is as follows:
$ ./img_downloader.sh http://www.flickr.com/search/?q=linux -d images
How it works...
The image downloader script parses an HTML page, strips out all tags except <img>, parses src="URL" from the <img> tags, and downloads the images to the specified directory. The script accepts a web-page URL and the destination directory path as command-line arguments. The first part of the script is a tricky way to parse the command-line arguments.
The [ $# -ne 3 ] statement checks whether the total number of arguments to the script is three; otherwise it exits and prints a usage example.
If there are three arguments, we parse the URL and the destination directory. In order to do that, a tricky hack is used:
for i in {1..4}
do
  case $1 in
  -d) shift; directory=$1; shift ;;
  *) url=${url:-$1}; shift;;
  esac
done
The for loop is iterated four times (there is no significance to the number four; it is just enough iterations to run the case statement over all the arguments).
The case statement evaluates the first argument ($1) and matches -d or any other string. Hence we can place the -d argument anywhere on the command line, as follows:
$ ./img_downloader.sh -d DIR URL
Or:
$ ./img_downloader.sh URL -d DIR
shift moves the arguments to the left, such that when shift is called, $1 takes the value of $2; when it is called again, $1 takes the value of $3, and so on. Hence we can evaluate all the arguments through $1 itself.
When -d is matched (-d)), it is obvious that the next argument is the value for the destination directory. *) corresponds to the default match; it matches anything other than -d. While iterating, $1 in the default match may be "" or the URL, and we must avoid "" overwriting a URL we have already captured. Hence we use the url=${url:-$1} trick: it keeps the existing value of url if one is already set, and assigns $1 otherwise.
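The ${var:-value} behavior is easy to verify interactively (the URLs here are placeholders):
$ unset url
$ url=${url:-"http://example.com"}
$ echo $url
http://example.com
$ url=${url:-"http://other.com"}
$ echo $url
http://example.com
The second assignment is a no-op because url is already set.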
egrep -o "<img src=[^>]*>" prints only the matching strings, which are the <img> tags including their attributes. [^>]* matches all the characters except the closing >, that is, <img src="image.jpg" …. >.
sed 's/<img src=\"\([^"]*\).*/\1/g' extracts the src="url" values, so that all the image URLs can be parsed from the <img> tags grabbed in the previous step.
There are two types of image source paths: relative and absolute. Absolute paths contain full URLs that start with http:// or https://. Relative URLs start with / or with the image name itself.
An example of an absolute URL is: http://example.com/image.jpg
An example of a relative URL is: /image.jpg
For relative URLs, the starting / should be replaced with the base URL to transform it to http://example.com/image.jpg.
For that transformation, we first extract baseurl by parsing the original URL with egrep. Then we replace every occurrence of a starting / with the base URL, using sed -i "s|^/|$baseurl/|" /tmp/$$.list.
Then a while loop is used to iterate through the list line by line and download each URL using curl. The --silent argument is used with curl to avoid progress messages being printed on the screen.
See also
- A primer on cURL, explains the curl command
- Basic sed primer of Chapter 4, explains the sed command
- Searching and mining "text" inside a file with grep of Chapter 4, explains the grep command
Web photo album generator
Web developers commonly design photo-album pages for websites that consist of a number of image thumbnails; when a thumbnail is clicked, a large version of the picture is displayed. But when many images are required, copying the <img> tag every time, resizing each image to create a thumbnail, placing the thumbnails in the thumbs directory, testing the links, and so on is a real hurdle. It takes a lot of time and repeats the same task over and over. It can be automated easily by writing a simple Bash script. By writing a script, we can create thumbnails, place them in the right directories, and generate the code fragment for the <img> tags automatically in a few seconds. This recipe will teach you how to do it.
Getting ready
We can perform this task with a for loop that iterates over every image in the current directory. The usual Bash utilities such as cat and convert (from ImageMagick) are used. These will generate an HTML album, using all the images, in index.html. In order to use convert, make sure you have ImageMagick installed.
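You can quickly check that the tool is available before running the script:
$ convert -version
If the command is not found, install the ImageMagick package with your distribution's package manager.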
How to do it...
Let's write a Bash script to generate an HTML album page:
#!/bin/bash
#Filename: generate_album.sh
#Description: Create a photo album using images in current directory
echo "Creating album.."
mkdir -p thumbs
cat <<EOF > index.html
<html>
<head>
<style>
body
{
width:470px;
margin:auto;
border: 1px dashed grey;
padding:10px;
}
img
{
margin:5px;
border: 1px solid black;
}
</style>
</head>
<body>
<center><h1> #Album title </h1></center>
<p>
EOF
for img in *.jpg;
do
  convert "$img" -resize "100x" "thumbs/$img"
  echo "<a href=\"$img\" ><img src=\"thumbs/$img\" title=\"$img\" /></a>" >> index.html
done
cat <<EOF >> index.html
</p>
</body>
</html>
EOF
echo Album generated to index.html
Run the script as follows:
$ ./generate_album.sh
Creating album..
Album generated to index.html
How it works...
The initial part of the script writes the header part of the HTML page.
The following construct redirects all the content up to EOF (excluding it) to index.html:
cat <<EOF > index.html
contents...
EOF
The header includes the HTML markup and the stylesheet.
for img in *.jpg; iterates over the filenames of the images and performs the actions in its body.
convert "$img" -resize "100x" "thumbs/$img" creates images of 100 px width as thumbnails.
The following statement generates the required <img> tag and appends it to index.html:
echo "<a href=\"$img\" ><img src=\"thumbs/$img\" title=\"$img\" /></a>" >> index.html
Finally, the footer HTML tags are appended with cat again.
See also
- Playing with file descriptors and redirection of Chapter 1, explains EOF and stdin redirection
Twitter command-line client
Twitter is the hottest micro-blogging platform, as well as the latest buzz in online social media. Tweeting and reading tweets is fun. What if we could do both from the command line? It is pretty simple to write a command-line Twitter client. Twitter has RSS feeds, so we can make use of them. Let's see how to do it.
Getting ready
We can use cURL to authenticate and send Twitter updates, as well as download the RSS feed pages to parse the tweets. Just four lines of code can do it. Let's do it.
How to do it...
Let's write a Bash script using the curl command to manipulate the Twitter APIs:
#!/bin/bash
#Filename: tweets.sh
#Description: Basic twitter client
USERNAME="PUT_USERNAME_HERE"
PASSWORD="PUT_PASSWORD_HERE"
COUNT="PUT_NO_OF_TWEETS"
if [[ "$1" != "read" ]] && [[ "$1" != "tweet" ]];
then
  echo -e "Usage: $0 tweet status_message\n OR\n $0 read\n"
  exit -1;
fi
if [[ "$1" = "read" ]];
then
  curl --silent -u $USERNAME:$PASSWORD http://twitter.com/statuses/friends_timeline.rss | \
  grep title | \
  tail -n +2 | \
  head -n $COUNT | \
  sed 's:.*<title>\([^<]*\).*:\n\1:'
elif [[ "$1" = "tweet" ]];
then
  status=$( echo $@ | tr -d '"' | sed 's/.*tweet //')
  curl --silent -u $USERNAME:$PASSWORD -d status="$status" http://twitter.com/statuses/update.xml > /dev/null
  echo 'Tweeted :)'
fi
Run the script as follows:
$ ./tweets.sh tweet Thinking of writing a X version of wall command "#bash"
Tweeted :)
$ ./tweets.sh read
bot: A tweet line
t3rm1n4l: Thinking of writing a X version of wall command #bash
How it works...
Let's see the working of the above script by splitting it into two parts. The first part is about reading tweets. To read tweets, the script downloads the RSS information from http://twitter.com/statuses/friends_timeline.rss and parses the lines containing the <title> tag. Then it strips off the <title> and </title> tags using sed to form the required tweet text. A COUNT variable is then used with the head command to keep only the requested number of recent tweets. tail -n +2 is used to remove an unnecessary header line, "Twitter: Timeline of friends".
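The tail -n +2 idiom (print from the second line onward) is easy to check in isolation:
$ seq 3 | tail -n +2
2
3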
In the tweet-sending part, the -d status argument of curl is used to post data to Twitter using their API: http://twitter.com/statuses/update.xml.
When sending a tweet, $1 of the script is the keyword tweet. To obtain the status message, we take $@ (the list of all arguments of the script) and remove the word "tweet" from it.
See also
- A primer on cURL, explains the curl command
- head and tail - printing the last or first 10 lines of Chapter 3, explains the commands head and tail
Define utility with Web backend
Google provides Web definitions for any word via the search query define:WORD. Usually we need a GUI web browser to fetch the definitions; however, we can automate this and parse the required definitions by using a script. Let's see how to do it.
Getting ready
We can use lynx, sed, awk, and grep to write the define utility.
How to do it...
Let's go through the code for the define utility script that fetches definitions from Google search:
#!/bin/bash
#Filename: define.sh
#Description: A Google define: frontend
limit=0
if [ ! $# -ge 1 ];
then
  echo -e "Usage: $0 WORD [-n No_of_definitions]\n"
  exit -1;
fi
if [ "$2" = "-n" ];
then
  limit=$3;
  let limit++
fi
word=$1
lynx -dump http://www.google.co.in/search?q=define:$word | \
awk '/Defini/,/Find defini/' | head -n -1 | sed 's:*:\n*:; s:^[ ]*::' | \
grep -v "[[0-9]]" | \
awk '{
if ( substr($0,1,1) == "*" )
{ sub("*",++count".") } ;
print
} ' > /tmp/$$.txt
echo
if [ $limit -ge 1 ];
then
  cat /tmp/$$.txt | sed -n "/^1\./, /${limit}/p" | head -n -1
else
  cat /tmp/$$.txt;
fi
Run the script as follows:
$ ./define.sh hack -n 2
1. chop: cut with a hacking tool
2. one who works hard at boring tasks
How it works...
We will look into the core part of the definition parser. Lynx is used to obtain the plain-text version of the web page. http://www.google.co.in/search?q=define:$word is the URL of the web-definitions page. Then we reduce the text to the portion between "Definitions on web" and "Find definitions"; all the definitions occur between these lines of text (awk '/Defini/,/Find defini/').
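The awk '/start/,/end/' range pattern prints every line from a line matching the first regex through the next line matching the second. A minimal demonstration with made-up markers:
$ printf "one\nSTART\ntwo\nSTOP\nthree\n" | awk '/START/,/STOP/'
START
two
STOP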
's:*:\n*:' replaces each * with a newline followed by *, in order to insert a newline before each definition, and s:^[ ]*:: removes the extra spaces at the start of lines. Hyperlinks are marked as [number] in lynx output; those lines are removed by grep -v, the invert-match option. Then awk is used to replace the * occurring at the start of a line with a number, so that each definition is assigned a serial number. If a -n count was passed to the script, only that many definitions are to be output. So sed is used to print the definitions numbered 1 to count (this is easier since we replaced * with the serial number).
See also
- Basic sed primer of Chapter 4, explains the sed command
- Basic awk primer of Chapter 4, explains the awk command
- Searching and mining "text" inside a file with grep of Chapter 4, explains the grep command
- Downloading a web page as formatted plain text, explains the lynx command
Finding broken links in a website
I have seen people manually checking each and every page of a website to search for broken links. That is feasible only for websites having very few pages; when the number of pages becomes large, manual checking becomes impossible. The task becomes really easy if we can automate the search for broken links. We can find the broken links by using HTTP manipulation tools. Let's see how to do it.
Getting ready
In order to identify the links and find the broken ones among them, we can use lynx and curl. lynx has an option, -traversal, which recursively visits the pages of the website and builds a list of all its hyperlinks. We can then use cURL to verify whether each link is broken or not.
How to do it...
Let's write a Bash script with the help of the curl command to find the broken links on a website:
#!/bin/bash
#Filename: find_broken.sh
#Description: Find broken links in a website
if [ $# -ne 1 ];
then
  echo -e "Usage: $0 URL\n"
  exit -1;
fi
echo Broken links:
mkdir /tmp/$$.lynx
cd /tmp/$$.lynx
lynx -traversal $1 > /dev/null
count=0;
sort -u reject.dat > links.txt
while read link;
do
  output=`curl -I $link -s | grep "HTTP/.*OK"`;
  if [[ -z $output ]];
  then
    echo $link;
    let count++
  fi
done < links.txt
[ $count -eq 0 ] && echo No broken links found.
How it works...
lynx -traversal URL produces a number of files in the working directory. One of them is reject.dat, which contains all the links found in the website. sort -u is used to build a duplicate-free list. Then we iterate through each link and check the header response by using curl -I. If the header's first line contains HTTP/1.1 200 OK as the response, the target is not broken. All other responses correspond to broken links, which are printed to stdout.
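The header check is easy to try by hand on a single URL (the exact status line may vary by server):
$ curl -sI http://slynux.org | head -n 1
HTTP/1.1 200 OK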
See also
- Downloading a web page as formatted plain text, explains the lynx command
- A primer on cURL, explains the curl command
Tracking changes to a website
Tracking changes to a website is helpful to both web developers and users. Checking a website manually at intervals is hard and impractical, so we can write a change tracker that runs at repeated intervals. When a change occurs, the tracker can play a sound or send a notification. Let's see how to write a basic tracker for website changes.
Getting ready
Tracking changes in terms of Bash scripting means fetching the website at different times and taking the difference using the diff command. We can use curl and diff to do this.
How to do it...
Let's write a Bash script by combining different commands to track changes in a web page:
#!/bin/bash
#Filename: change_track.sh
#Desc: Script to track changes to webpage
if [ $# -ne 1 ];
then
  echo -e "Usage: $0 URL\n"
  exit -1;
fi
first_time=0
# Not first time
if [ ! -e "last.html" ];
then
  first_time=1
  # Set it is first time run
fi
curl --silent $1 -o recent.html
if [ $first_time -ne 1 ];
then
  changes=$(diff -u last.html recent.html)
  if [ -n "$changes" ];
  then
    echo -e "Changes:\n"
    echo "$changes"
  else
    echo -e "\nWebsite has no changes"
  fi
else
  echo "[First run] Archiving.."
fi
cp recent.html last.html
Let's look at the output of the change_track.sh script when changes are made to the web page and when they are not:
- First run:
$ ./change_track.sh http://web.sarathlakshman.info/test.html
[First run] Archiving..
- Second run:
$ ./change_track.sh http://web.sarathlakshman.info/test.html
Website has no changes
- Third run, after making changes to the web page:
$ ./change_track.sh http://web.sarathlakshman.info/test_change/test.html
Changes:
--- last.html 2010-08-01 07:29:15.000000000 +0200
+++ recent.html 2010-08-01 07:29:43.000000000 +0200
@@ -1,3 +1,4 @@
 <html>
+added line :)
 <p>data</p>
 </html>
How it works...
The script checks whether it is running for the first time using [ ! -e "last.html" ];. If last.html doesn't exist, it means this is the first run, and hence the web page must be downloaded and saved as last.html.
If it is not the first run, the script downloads a new copy (recent.html) and checks the difference using the diff utility. If there are changes, it prints them, and finally it copies recent.html to last.html.
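To run the tracker at the repeated intervals suggested above, a cron entry is a natural fit; a sketch with placeholder paths and URL that checks the page hourly:
0 * * * * cd /home/slynux/tracker && ./change_track.sh http://example.com >> track.log 2>&1
The cd matters because the script reads and writes last.html in the current directory.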
See also
- A primer on cURL, explains the curl command
Posting to a web page and reading response
POST and GET are the two types of requests in HTTP for sending information to, or retrieving information from, a website. In a GET request, we send parameters (name-value pairs) through the web-page URL itself. In the case of POST, they are not attached to the URL. POST is used when a form needs to be submitted; for example, when a username and password are to be submitted and a login page is to be retrieved.
POSTing to pages is a frequent need when writing scripts based on web-page retrievals. Let's see how to work with POST. Automating HTTP GET and POST requests by sending POST data and retrieving the output is a very common task in shell scripts that parse data from websites.
Getting ready
Both cURL and wget can handle POST requests via arguments. The data is to be passed as name-value pairs.
How to do it...
Let's see how to POST to a page and read the HTML response from a real website using curl:
$ curl URL -d "postvar=postdata2&postvar2=postdata2"
We have a website (http://book.sarathlakshman.com/lsc/mlogs/) that is used to submit the current user information, such as hostname and username. Assume that on the home page of the website there are two fields, HOSTNAME and USER, and a SUBMIT button. When the user enters a hostname and a username and clicks on the SUBMIT button, the details are stored on the website. This process can be automated with a single curl command by crafting the POST request. If you look at the website source (use the view-source option of the web browser), you can see an HTML form defined, similar to the following code:
<form action="http://book.sarathlakshman.com/lsc/mlogs/submit.php" method="post" >
<input type="text" name="host" value="HOSTNAME" >
<input type="text" name="user" value="USER" >
<input type="submit" >
</form>
Here, http://book.sarathlakshman.com/lsc/mlogs/submit.php is the target URL. When the user enters the details and clicks on the Submit button, the host and user inputs are sent to submit.php as a POST request, and the response page is returned to the browser.
We can automate the POST request as follows:
$ curl http://book.sarathlakshman.com/lsc/mlogs/submit.php -d "host=test-host&user=slynux"
<html>
You have entered :
<p>HOST : test-host</p>
<p>USER : slynux</p>
<html>
Now curl returns the response page.
-d is the argument used for posting. The string argument for -d follows the GET request semantics: var=value pairs are to be delimited by &.
The -d argument should always be given in quotes. If quotes are not used, & is interpreted by the shell to indicate that this should be a background process.
There's more
Let's see how to perform POST using cURL and wget.
POST in curl
You can POST data in curl by using -d or --data as follows:
$ curl --data "name=value" URL -o output.html
If multiple variables are to be sent, delimit them with &. Note that when & is used, the name-value pairs should be enclosed in quotes; otherwise the shell will treat & as a special character for backgrounding the process. For example:
$ curl -d "name1=val1&name2=val2" URL -o output.html
POST data using wget
You can POST data using wget with --post-data "string". For example:
$ wget URL --post-data "name=value" -O output.html
Use the same format as cURL for the name-value pairs.
See also
- A primer on cURL, explains the curl command
- Downloading from a web page, explains the wget command
6
The Backup Plan
In this chapter, we will cover:
- Archiving with tar
- Archiving with cpio
- Compressing with gunzip (gzip)
- Compressing with bunzip (bzip)
- Compressing with lzma
- Archiving and compressing with zip
- Heavy compression with the squashfs filesystem
- Encrypting files and folders (with standard algorithms)
- Backup snapshots with rsync
- Version-controlled backups with git
- Cloning disks with dd
Introduction
Taking snapshots and backups of data are regular tasks we come across. When it comes to a server or large data-storage systems, regular backups are important. It is possible to automate backups via shell scripting. Archiving and compression find usage in the everyday life of a system admin or a regular user. There are various compression formats that can be used in different ways to obtain the best results. Encryption is another frequent task for protecting data. In order to reduce the size of encrypted data, files are usually archived and compressed before encrypting. Many standard encryption algorithms are available and can be handled with shell utilities. This chapter walks through different recipes for creating and maintaining file or folder archives, compression formats, and encryption techniques with the shell. Let's go through the recipes.
Archiving with tar
The tar command can be used to archive files. It was originally designed for storing data on tape archives (tar). It allows you to store multiple files and directories as a single file while retaining all the file attributes, such as owner, permissions, and so on. The file created by the tar command is often referred to as a tarball.
Getting ready
The tar command comes by default with all UNIX-like operating systems. It has a simple syntax and creates a portable file format. tar has a list of arguments: A, c, d, r, t, u, x, f, and v; each of these letters can be used independently for a different purpose. Let's see how to use them.
How to do it...
To archive files with tar, use the following syntax:
$ tar -cf output.tar [SOURCES]
For example:
$ tar -cf output.tar file1 file2 file3 folder1 ..
In this command, -c stands for "create file" and -f stands for "specify filename".
We can specify folders and filenames as SOURCES, using a list of file names or wildcards such as *.txt.
tar will archive the source files into a file called output.tar.
The filename must appear immediately after the -f, which should be the last option in the argument group (for example, -cvvf filename.tar and -tvvf filename.tar).
We cannot pass hundreds of files or folders as command-line arguments, because the argument list length is limited. Hence it is safer to use the append option, described below, if many files are to be archived.
There's more...
Let's go through the additional features that are available with the tar command.
Appending files to an archive
Sometimes we may need to add files to an archive that already exists (an example usage is when thousands of files are to be archived and they cannot be specified in one line as command-line arguments).
The append option is -r.
In order to append a file to an already existing archive, use:
$ tar -rvf original.tar new_file
List the files in an archive as follows:
$ tar -tf archive.tar
yy/lib64/
yy/lib64/libfakeroot/
yy/sbin/
In order to print more details while archiving or listing, use the -v or -vv flag. These flags enable verbose output, which prints more details on the terminal, such as the file permissions, owner group, modification date, and so on.
For example:
$ tar -tvvf archive.tar
drwxr-xr-x slynux/slynux 0 2010-08-06 09:31 yy/
drwxr-xr-x slynux/slynux 0 2010-08-06 09:39 yy/usr/
drwxr-xr-x slynux/slynux 0 2010-08-06 09:31 yy/usr/lib64/
Extracting files and folders from an archive
The following command extracts the contents of the archive to the current directory:
$ tar -xf archive.tar
The -x option stands for extract.
When -x is used, the tar command extracts the contents of the archive to the current directory. We can also specify the directory to which the files are to be extracted by using the -C flag, as follows:
$ tar -xf archive.tar -C /path/to/extraction_directory
This command extracts the contents of the archive to the specified directory. Instead of extracting the entire contents of the archive, we can also extract only a few files by specifying them as command arguments:
$ tar -xvf file.tar file1 file4
The command above extracts only file1 and file4, and ignores the other files in the archive.
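With GNU tar, we can also extract only files matching a wildcard pattern; for example, only the text files:
$ tar -xf archive.tar --wildcards '*.txt'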
  1067. stdin and stdout with tar
While archiving, we can specify stdout as the output file so that another command at the
receiving end of a pipe can read it as stdin and then process or extract the archive.
  1070. This is helpful in order to transfer data through a Secure Shell (SSH) connection (while on a
  1071. network). For example:
  1072. $ mkdir ~/destination
  1073. $ tar -cf - file1 file2 file3 | tar -xvf - -C ~/destination
  1074. In the example above, file1 , file2 , and file3 are combined into a tarball and then
  1075. extracted to ~/destination . In this command:
- -f - specifies stdout as the archive file (when the -c option is used)
- -f - specifies stdin as the archive file (when the -x option is used)
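The same technique can stream an archive across the network. A minimal sketch using SSH
(the user, host, and destination path are placeholders):
$ tar -cf - file1 file2 file3 | ssh user@remotehost "tar -xvf - -C /home/backups"
Here no temporary tarball is written to disk; the archive exists only in the pipe.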
  1078. Concatenating two archives
  1079. We can easily merge multiple tar files with the -A option.
  1080. Let's pretend we have two tarballs: file1.tar and file2.tar . We can merge the contents
  1081. of file2.tar to file1.tar as follows:
  1082. $ tar -Af file1.tar file2.tar
  1083. Verify it by listing the contents:
  1084. $ tar -tvf file1.tar
  1085. Updating files in an archive with timestamp check
The append option adds any given file to the end of the archive. If a file that is already
inside the archive is appended again, the archive will contain duplicates. We can use the
update option -u to append only those files that are newer than the file with the same name
inside the archive.
  1090. $ tar -tf archive.tar
  1091. filea
  1092. fileb
  1093. filec
  1094. This command lists the files in the archive.
In order to append filea only if it has a newer modification time than the filea inside
archive.tar, use:
  1097. $ tar -uvvf archive.tar filea
  1101. Nothing happens if the version of filea outside the archive and the filea inside
  1102. archive.tar have the same timestamp.
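To see the update happen, bump the file's modification time first; a minimal example,
assuming filea exists in the current directory:
$ touch filea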
Now try the tar update command again:
  1104. $ tar -uvvf archive.tar filea
  1105. -rw-r--r-- slynux/slynux 0 2010-08-14 17:53 filea
  1106. The file is appended since its timestamp is newer than the one inside the archive.
  1107. Comparing files in archive and file system
  1108. Sometimes it is useful to know whether a file in the archive and a file with the same filename
in the filesystem are the same or contain any differences. The -d flag can be used to print the
  1110. differences:
  1111. $ tar -df archive.tar filename1 filename2 ...
  1112. For example:
  1113. $ tar -df archive.tar afile bfile
  1114. afile: Mod time differs
  1115. afile: Size differs
  1116. Deleting files from archive
We can remove files from a given archive using the --delete option. For example:
$ tar -f archive.tar --delete file1 file2 ..
Or, we can also use the following syntax:
$ tar --delete --file archive.tar [FILE LIST]
Let's see a worked example. First list the files in the archive:
$ tar -tf archive.tar
filea
fileb
filec
Now delete filea and list the contents again:
$ tar --delete --file archive.tar filea
  1128. $ tar -tf archive.tar
  1129. fileb
  1130. filec
  1134. Compression with tar archive
The tar command only archives files; it does not compress them. For this reason, most
people add some form of compression when working with tarballs, which significantly
decreases the size of the files. Tarballs are often compressed into one of the following formats:
- file.tar.gz
- file.tar.bz2
- file.tar.lzma
- file.tar.lzo
  1142. Different tar flags are used to specify different compression formats.
- -j for bzip2
- -z for gzip
- --lzma for lzma
  1146. They are explained in the following compression-specific recipes.
It is possible to use these compression formats without explicitly specifying the options
above: tar can choose the compression program by looking at the extension of the output
or input file name. For tar to pick the compression format automatically from the
extension, use -a or --auto-compress with tar.
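A minimal sketch of automatic compression (file names are placeholders); the .gz extension
makes tar apply gzip:
$ tar -acvf archive.tar.gz file1 file2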
  1151. Excluding a set of files from archiving
  1152. It is possible to exclude a set of files from archiving by specifying patterns. Use
  1153. --exclude [PATTERN] for excluding files matched by wildcard patterns.
  1154. For example, to exclude all .txt files from archiving use:
  1155. $ tar -cf arch.tar * --exclude "*.txt"
  1156. Note that the pattern should be enclosed in double quotes.
  1157. It is also possible to exclude a list of files provided in a list file with the -X flag as follows:
  1158. $ cat list
  1159. filea
  1160. fileb
  1161. $ tar -cf arch.tar * -X list
  1162. Now it excludes filea and fileb from archiving.
  1166. Excluding version control directories
  1167. We usually use tarballs for distributing source code. Most of the source code is maintained
using version control systems such as Subversion, Git, Mercurial, CVS, and so on. Code
  1169. directories under version control will contain special directories used to manage versions like
  1170. .svn or .git . However, these directories aren't needed by the code itself and so should be
  1171. eliminated from the tarball of the source code.
In order to exclude version control files and directories while archiving, use the
--exclude-vcs option along with tar. For example:
  1174. $ tar --exclude-vcs -czvvf source_code.tar.gz eye_of_gnome_svn
  1175. Printing total bytes
It is sometimes useful to know the total number of bytes copied to the archive. Print the
total bytes written after archiving by using the --totals option as follows:
  1178. $ tar -cf arc.tar * --exclude "*.txt" --totals
  1179. Total bytes written: 20480 (20KiB, 12MiB/s)
  1180. See also
- Compressing with gunzip (gzip), explains the gzip command
- Compressing with bunzip (bzip2), explains the bzip2 command
- Compressing with lzma, explains the lzma command
  1184. Archiving with cpio
cpio is another archiving format similar to tar. It stores files and directories in an archive
with attributes such as permissions, ownership, and so on, but it is not as commonly used
as tar. However, cpio is used in RPM package archives, initramfs images for the Linux
kernel, and so on. This recipe will give minimal usage examples of cpio.
  1189. How to do it...
  1190. cpio takes input filenames through stdin and it writes the archive into stdout . We have to
  1191. redirect stdout to a file to receive the output cpio file as follows:
  1192. Create test files:
  1193. $ touch file1 file2 file3
  1194. We can archive the test files as follows:
  1195. $ echo file1 file2 file3 | cpio -ov > archive.cpio
  1199. In this command:
- -o specifies the output
- -v is used for printing a list of files archived
By using cpio, we can also archive files given as absolute paths. /usr/somedir is an
absolute path, as it contains the full path starting from root (/). A relative path does not
start with /; it is resolved from the current directory. For example, test/file means that
there is a directory test and file is inside that directory.
While extracting, cpio extracts to the absolute path itself. In the case of tar, however, the
leading / in an absolute path is removed, converting it to a relative path.
  1209. In order to list files in a cpio archive use the following command:
  1210. $ cpio -it < archive.cpio
This command will list all the files in the given cpio archive. It reads the archive from stdin.
  1212. In this command:
- -i is for specifying the input
- -t is for listing
  1215. In order to extract files from the cpio archive use:
  1216. $ cpio -id < archive.cpio
  1217. Here, -d is used for extracting.
It overwrites files without prompting. If files with absolute paths are present in the archive,
it will replace the files at those paths rather than extracting into the current directory as
tar does.
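cpio pairs naturally with find, which emits one path per line on stdout; a minimal sketch
(the directory name is a placeholder):
$ find source_dir | cpio -ov > archive.cpio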
  1220. Compressing with gunzip (gzip)
  1221. gzip is a commonly used compression format in GNU/Linux platforms. Utilities such as gzip ,
gunzip, and zcat are available to handle gzip compression file types. gzip can be applied
only to single files; it cannot archive directories or multiple files. Hence we use a tar archive
  1224. and compress it with gzip . When multiple files are given as input it will produce several
  1225. individually compressed ( .gz ) files. Let's see how to operate with gzip .
  1226. How to do it...
  1227. In order to compress a file with gzip use the following command:
  1228. $ gzip filename
  1232. $ ls
  1233. filename.gz
It will remove the original file and produce a compressed file called filename.gz.
  1235. Extract a gzip compressed file as follows:
  1236. $ gunzip filename.gz
It will remove filename.gz and produce an uncompressed version of the file, filename.
  1238. In order to list out the properties of a compressed file use:
  1239. $ gzip -l test.txt.gz
  1240. compressed uncompressed ratio uncompressed_name
  1241. 35 6 -33.3% test.txt
  1242. The gzip command can read a file from stdin and also write a compressed file into
  1243. stdout .
Read from stdin and write to stdout as follows:
  1245. $ cat file | gzip -c > file.gz
  1246. The -c option is used to specify output to stdout .
We can specify the compression level for gzip. Use the --fast or the --best option for
the lowest and the highest compression levels, respectively.
  1249. There's more...
  1250. The gzip command is often used with other commands. It also has advanced options to
  1251. specify the compression ratio. Let's see how to work with these features.
  1252. Gzip with tarball
We usually use gzip with tarballs. A tarball can be compressed by using the -z option
passed to the tar command while archiving and extracting.
  1255. You can create gzipped tarballs using the following methods:
- Method 1
  1257. $ tar -czvvf archive.tar.gz [FILES]
  1258. Or:
  1259. $ tar -cavvf archive.tar.gz [FILES]
  1260. The -a option specifies that the compression format should automatically be
  1261. detected from the extension.
- Method 2
  1266. First, create a tarball:
  1267. $ tar -cvvf archive.tar [FILES]
  1268. Compress it after tarballing as follows:
  1269. $ gzip archive.tar
If many files (a few hundred) are to be archived in a tarball and need to be compressed, we
use Method 2 with a few changes. The issue with giving many files as command arguments
  1272. to tar is that it can accept only a limited number of files from the command line. In order
  1273. to solve this issue, we can create a tar file by adding files one by one using a loop with an
  1274. append option ( -r ) as follows:
FILE_LIST="file1 file2 file3 file4 file5"
for f in $FILE_LIST;
do
  tar -rvf archive.tar $f
done
gzip archive.tar
In order to extract a gzipped tarball, use:
$ tar -xzvvf archive.tar.gz -C extract_directory
In this command:
- -x is used for extraction
- -z is for gzip specification
Or:
  1285. $ tar -xavvf archive.tar.gz -C extract_directory
  1286. In the above command, the -a option is used to detect the compression format automatically.
  1287. zcat – reading gzipped files without extracting
zcat is a command that dumps the decompressed contents of a .gz file to stdout without
manually extracting it. The .gz file remains unchanged while its contents are written to
stdout, as follows:
  1291. $ ls
  1292. test.gz
  1293. $ zcat test.gz
  1294. A test file
  1295. # file test contains a line "A test file"
  1296. $ ls
  1297. test.gz
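zcat is handy in pipelines, since the compressed file never needs to be extracted on disk.
For example, to search a compressed log without extracting it (the file name is a placeholder):
$ zcat access_log.gz | grep "404"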
  1301. Compression ratio
We can specify the compression ratio, which is available in the range 1 to 9, where:
- 1 is the lowest compression, but the fastest
- 9 is the best compression, but the slowest
You can also specify ratios in between, as follows:
  1306. $ gzip -9 test.img
  1307. This will compress the file to the maximum.
  1308. See also
- Archiving with tar, explains the tar command
Compressing with bunzip (bzip2)
bzip2 is another compression tool, very similar to gzip. bzip2 typically produces smaller
(more compressed) files than gzip. It comes with all Linux distributions.
  1313. Let's see how to use bzip2 .
  1314. How to do it...
  1315. In order to compress with bzip2 use:
  1316. $ bzip2 filename
  1317. $ ls
  1318. filename.bz2
It will remove the original file and produce a compressed file called filename.bz2.
  1320. Extract a bzipped file as follows:
  1321. $ bunzip2 filename.bz2
  1322. It will remove filename.bz2 and produce an uncompressed version of filename .
  1323. bzip2 can read a file from stdin and also write a compressed file into stdout .
In order to read from stdin and write to stdout, use:
$ cat file | bzip2 -c > file.bz2
  1326. -c is used to specify output to stdout .
  1330. We usually use bzip2 with tarballs. A tarball can be compressed by using the -j option
  1331. passed to the tar command while archiving and extracting.
  1332. Creating a bzipped tarball can be done by using the following methods:
- Method 1
  1334. $ tar -cjvvf archive.tar.bz2 [FILES]
  1335. Or:
  1336. $ tar -cavvf archive.tar.bz2 [FILES]
  1337. The -a option specifies to automatically detect compression format from the extension.
- Method 2
  1339. First create the tarball:
  1340. $ tar -cvvf archive.tar [FILES]
  1341. Compress it after tarballing:
  1342. $ bzip2 archive.tar
  1343. If we need to add hundreds of files to the archive, the above commands may fail. To fix that
issue, use a loop to append files to the archive one by one using the -r option. See the similar
  1345. section from the recipe, Compressing with gunzip (gzip).
  1346. Extract a bzipped tarball as follows:
  1347. $ tar -xjvvf archive.tar.bz2 -C extract_directory
  1348. In this command:
- -x is used for extraction
- -j is for bzip2 specification
- -C is for specifying the directory to which the files are to be extracted
  1352. Or, you can use the following command:
  1353. $ tar -xavvf archive.tar.bz2 -C extract_directory
  1354. -a will automatically detect the compression format.
  1355. There's more...
bzip2 has several additional options to carry out different functions. Let's go through a few
of them.
  1358. Keeping input files without removing them
While using bzip2 or bunzip2, the input file is removed and an output file is produced. We
can prevent it from removing the input file by using the -k option.
  1364. For example:
  1365. $ bunzip2 test.bz2 -k
  1366. $ ls
  1367. test test.bz2
  1368. Compression ratio
  1369. We can specify the compression ratio, which is available in the range of 1 to 9 (where 1 is the
  1370. least compression, but fast, and 9 is the highest possible compression but much slower).
  1371. For example:
  1372. $ bzip2 -9 test.img
  1373. This command provides maximum compression.
  1374. See also
- Archiving with tar, explains the tar command
  1376. Compressing with lzma
lzma is relatively new compared to gzip or bzip2, and it offers better compression ratios
than either. As lzma is not preinstalled on most Linux distros, you may need to install it
using the package manager.
  1380. How to do it...
  1381. In order to compress with lzma use the following command:
  1382. $ lzma filename
  1383. $ ls
  1384. filename.lzma
  1385. This will remove the file and produce a compressed file called filename.lzma .
  1386. To extract an lzma file use:
  1387. $ unlzma filename.lzma
  1388. This will remove filename.lzma and produce an uncompressed version of the file.
  1389. The lzma command can also read a file from stdin and write the compressed file to stdout .
In order to read from stdin and write to stdout, use:
  1394. $ cat file | lzma -c > file.lzma
  1395. -c is used to specify output to stdout .
  1396. We usually use lzma with tarballs. A tarball can be compressed by using the --lzma option
  1397. passed to the tar command while archiving and extracting.
There are two methods to create an lzma tarball:
- Method 1
$ tar --lzma -cvvf archive.tar.lzma [FILES]
Note that --lzma comes before -f, since the archive filename has to immediately follow -f.
  1401. Or:
  1402. $ tar -cavvf archive.tar.lzma [FILES]
  1403. The -a option specifies to automatically detect the compression format from the
  1404. extension.
- Method 2
  1406. First, create the tarball:
  1407. $ tar -cvvf archive.tar [FILES]
  1408. Compress it after tarballing:
  1409. $ lzma archive.tar
  1410. If we need to add hundreds of files to the archive, the above commands may fail. To fix that
issue, use a loop to append files to the archive one by one using the -r option. See the
  1412. similar section from the recipe, Compressing with gunzip (gzip).
  1413. There's more...
Let's go through additional options associated with the lzma utilities.
  1415. Extracting an lzma tarball
  1416. In order to extract a tarball compressed with lzma compression to a specified directory, use:
$ tar --lzma -xvvf archive.tar.lzma -C extract_directory
  1418. In this command, -x is used for extraction. --lzma specifies the use of lzma to
  1419. decompress the resulting file.
  1420. Or, we could also use:
  1421. $ tar -xavvf archive.tar.lzma -C extract_directory
  1422. The -a option specifies to automatically detect the compression format from the extension.
  1426. Keeping input files without removing them
While using lzma or unlzma, the input file is removed and an output file is produced. We
can prevent the input file from being removed by using the -k option. For example:
  1429. $ lzma test.bz2 -k
  1430. $ ls
  1431. test.bz2.lzma
  1432. Compression ratio
  1433. We can specify the compression ratio, which is available in the range of 1 to 9 (where 1 is the
  1434. least compression, but fast, and 9 is the highest possible compression but much slower).
  1435. You can also specify ratios in between as follows:
  1436. $ lzma -9 test.img
  1437. This command compresses the file to the maximum.
  1438. See also
- Archiving with tar, explains the tar command
  1440. Archiving and compressing with zip
  1441. ZIP is a popular compression format used on many platforms. It isn't as commonly used as
  1442. gzip or bzip2 on Linux platforms, but files from the Internet are often saved in this format.
  1443. How to do it...
  1444. In order to archive with ZIP, the following syntax is used:
  1445. $ zip archive_name.zip [SOURCE FILES/DIRS]
  1446. For example:
  1447. $ zip file.zip file
  1448. Here, the file.zip file will be produced.
  1449. Archive directories and files recursively as follows:
  1450. $ zip -r archive.zip folder1 file2
In this command, -r specifies recursive archiving of directories.
Unlike lzma, gzip, or bzip2, zip won't remove the source file after archiving. In that respect
zip is similar to tar, but unlike tar, zip also compresses the archived files.
  1458. In order to extract files and folders in a ZIP file, use:
  1459. $ unzip file.zip
It will extract the files without removing file.zip (unlike unlzma or gunzip).
  1461. In order to update files in the archive with newer files in the filesystem, use the -u flag:
  1462. $ zip file.zip -u newfile
Delete a file from a zipped archive by using -d as follows:
  1464. $ zip -d arc.zip file.txt
  1465. In order to list the files in an archive use:
  1466. $ unzip -l archive.zip
  1467. squashfs – the heavy compression filesystem
squashfs is a heavily compressing read-only filesystem that is capable of packing 2 to 3 GB
of data into a 700 MB file. Have you ever thought about how Linux Live CDs work?
  1470. When a Live CD is booted it loads a complete Linux environment. Linux Live CDs make use
  1471. of a read-only compressed filesystem called squashfs. It keeps the root filesystem on a
compressed filesystem file, which can be loopback mounted so that its files can be accessed.
When processes require files, they are decompressed and loaded into RAM on demand.
Knowledge of squashfs is useful when building a custom live OS, or whenever files must be
kept heavily compressed yet remain accessible without extracting them entirely.
Extracting a large compressed file takes a long time. However, if the file is loopback
mounted, access is very fast, since the required portions of the compressed files are
decompressed only when they are requested. In regular decompression, all of the data is
decompressed first. Let's see how we can use squashfs.
  1480. Getting ready
  1481. If you have an Ubuntu CD just locate a .squashfs file at CDRom ROOT/casper/
  1482. filesystem.squashfs . squashfs internally uses compression algorithms such as gzip
  1483. and lzma . squashfs support is available in all of the latest Linux distros. However, in order
to create squashfs files, an additional package, squashfs-tools, needs to be installed from
the package manager.
  1489. How to do it...
  1490. In order to create a squashfs file by adding source directories and files, use:
  1491. $ mksquashfs SOURCES compressedfs.squashfs
Sources can be wildcards, file paths, or folder paths.
  1493. For example:
  1494. $ sudo mksquashfs /etc test.squashfs
  1495. Parallel mksquashfs: Using 2 processors
  1496. Creating 4.0 filesystem on test.squashfs, block size 131072.
  1497. [=======================================] 1867/1867 100%
(More details are printed on the terminal; the output is trimmed here to save space.)
  1499. In order to mount the squashfs file to a mount point, use loopback mounting as follows:
  1500. # mkdir /mnt/squash
  1501. # mount -o loop compressedfs.squashfs /mnt/squash
You can then copy the contents by accessing /mnt/squash.
  1503. There's more...
  1504. The squashfs file system can be created by specifying additional parameters. Let's go
  1505. through the additional options.
  1506. Excluding files while creating a squashfs file
  1507. While creating a squashfs file, we can exclude a list of files or a file pattern specified using
  1508. wildcards.
  1509. Exclude a list of files specified as command-line arguments by using the -e option. For
  1510. example:
  1511. $ sudo mksquashfs /etc test.squashfs -e /etc/passwd /etc/shadow
The -e option is used to exclude passwd and shadow files.
It is also possible to specify a list of exclude files given in a file with -ef as follows:
  1514. $ cat excludelist
  1515. /etc/passwd
  1516. /etc/shadow
  1517. $ sudo mksquashfs /etc test.squashfs -ef excludelist
If we want to use wildcards in exclude lists, pass -wildcards as an additional argument.
  1522. Cryptographic tools and hashes
Encryption techniques are mainly used to protect data from unauthorized access. There are
many algorithms available, and we use a common set of standard ones. A few tools are
available in a Linux environment for performing encryption and decryption. Sometimes we
use cryptographic hashes for verifying data integrity. This section will introduce a few
commonly used cryptographic tools and a general set of algorithms that these tools can handle.
  1528. How to do it...
Let's see how to use tools such as crypt, gpg, base64, md5sum, sha1sum, and openssl:
- crypt
  1531. The crypt command is a simple cryptographic utility, which takes a file from stdin
  1532. and a passphrase as input and outputs encrypted data into stdout .
$ crypt <input_file >output_file
  1534. Enter passphrase:
  1535. It will interactively ask for a passphrase. We can also provide a passphrase through
  1536. command-line arguments.
  1537. $ crypt PASSPHRASE < input_file > encrypted_file
  1538. In order to decrypt the file use:
  1539. $ crypt PASSPHRASE -d < encrypted_file > output_file
- gpg (GNU privacy guard)
gpg (GNU privacy guard) is a widely used encryption scheme for protecting files with
key-signing techniques, which ensure that data can be accessed only by its authentic
destination. gpg signatures are widely used. The details of gpg are outside the scope of
this book; here we will learn how to encrypt and decrypt a file.
  1545. In order to encrypt a file with gpg use:
  1546. $ gpg -c filename
  1547. This command reads the passphrase interactively and generates filename.gpg .
  1548. In order to decrypt a gpg file use:
  1549. $ gpg filename.gpg
  1550. This command reads a passphrase and decrypts the file.
- Base64
Base64 is a group of similar encoding schemes that represent binary data in an
ASCII string format by translating it into a radix-64 representation. The base64
command can be used to encode and decode Base64 strings.
  1558. In order to encode a binary file into Base64 format, use:
  1559. $ base64 filename > outputfile
  1560. Or:
  1561. $ cat file | base64 > outputfile
  1562. It can read from stdin .
  1563. Decode Base64 data as follows:
  1564. $ base64 -d file > outputfile
  1565. Or:
  1566. $ cat base64_file | base64 -d > outputfile
- md5sum and sha1sum
md5sum and sha1sum are one-way hash functions: their output cannot be reversed to form
the original data. They are usually used to verify the integrity of data or to generate a
unique key from given data. For every file, they generate a practically unique key by
analyzing its content.
  1572. $ md5sum file
  1573. 8503063d5488c3080d4800ff50850dc9 file
  1574. $ sha1sum file
  1575. 1ba02b66e2e557fede8f61b7df282cd0a27b816b file
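Checksums are typically saved to a file so that they can be verified later with the -c flag;
a minimal example (file names are placeholders):
$ md5sum file > file.md5
$ md5sum -c file.md5
file: OK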
These types of hashes are commonly used for storing passwords. Passwords are stored as
their hashes: when a user wants to authenticate, the password is read, converted to a hash,
and the hash is compared with the one already stored. If they are the same, the password is
authenticated and access is granted; otherwise it is denied. Storing original password
strings is risky, since anyone who reads the store learns the passwords.
- Shadow-like hash (salted hash)
Let's see how to generate a shadow-like salted hash for passwords.
User passwords in Linux are stored as hashes in the /etc/shadow file. A
  1584. typical line in /etc/shadow will look like this:
  1585. test:$6$fG4eWdUi$ohTKOlEUzNk77.4S8MrYe07NTRV4M3LrJnZP9p.qc1bR5c.
  1586. EcOruzPXfEu1uloBFUa18ENRH7F70zhodas3cR.:14790:0:99999:7:::
  1587. In this line $6$fG4eWdUi$ohTKOlEUzNk77.4S8MrYe07NTRV4M3LrJnZP9p.
  1588. qc1bR5c.EcOruzPXfEu1uloBFUa18ENRH7F70zhodas3cR is the shadow hash
  1589. corresponding to its password.
In some situations, we may need to write critical administration scripts that edit passwords
or add users manually from a shell script. In that case we have to generate a shadow
password string and write a line similar to the one above into the shadow file. Let's see
how to generate a shadow password using openssl.
  1597. Shadow passwords are usually salted passwords. SALT is an extra string used to
  1598. obfuscate and make the encryption stronger. The salt consists of random bits that are
  1599. used as one of the inputs to a key derivation function that generates the salted hash
  1600. for the password.
  1601. For more details on salt, see the Wikipedia page http://en.wikipedia.org/
  1602. wiki/Salt_(cryptography) .
  1603. $ openssl passwd -1 -salt SALT_STRING PASSWORD
  1604. $1$SALT_STRING$323VkWkSLHuhbt1zkSsUG.
  1605. Replace SALT_STRING with a random string and PASSWORD with the password you
  1606. want to use.
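Note that -1 produces an MD5-based hash (the output starts with $1$), whereas the example
shadow line above uses a SHA-512 hash ($6$). On systems with a recent OpenSSL (1.1.1 or
later), the latter can be generated with -6; a sketch, assuming such a version is installed:
$ openssl passwd -6 -salt SALT_STRING PASSWORD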
  1607. Backup snapshots with rsync
Backing up data is something most sysadmins need to do regularly. We may need to back up
data on a web server or from remote locations. rsync synchronizes files and directories from
one location to another while minimizing data transfer, using file difference calculations
and compression. The advantage of rsync over the cp command is that rsync uses an
efficient difference algorithm and also supports data transfer across networks. While making copies,
  1613. across networks. While making copies, it compares the files in the original and destination
  1614. locations and will only copy the files that are newer. It also supports compression, encryption,
  1615. and a lot more. Let's see how we can work with rsync .
  1616. How to do it...
  1617. In order to copy a source directory to a destination (to create a mirror) use:
  1618. $ rsync -av source_path destination_path
  1619. In this command:
- -a stands for archiving
- -v (verbose) prints the details or progress on stdout
  1622. The above command will recursively copy all the files from the source path to the destination
  1623. path. We can specify paths as remote or localhost paths.
  1624. It can be in the format /home/slynux/data , slynux@192.168.0.6:/home/backups/
  1625. data , and so on.
  1626. /home/slynux/data specifies the absolute path in the machine in which the rsync
  1627. command is executed. slynux@192.168.0.6:/home/backups/data specifies that the
  1628. path is /home/backups/data in the machine with IP address 192.168.0.6 and is logged
  1629. in as user slynux .
  1633. In order to back up data to a remote server or host, use:
  1634. $ rsync -av source_dir username@host:PATH
  1635. To keep a mirror at the destination, run the same rsync command scheduled at regular
  1636. intervals. It will copy only changed files to the destination.
  1637. Restore the data from remote host to localhost as follows:
  1638. $ rsync -av username@host:PATH destination
  1639. The rsync command uses SSH to connect to another remote machine. Provide the remote
  1640. machine address in the format user@host , where user is the username and host is the IP
  1641. address or domain name attached to the remote machine. PATH is the absolute path address
  1642. where the data needs to be copied. rsync will ask for the user password as usual for SSH
logins. This can be automated (avoiding the password prompt) by using SSH keys.
Make sure that OpenSSH is installed and running on the remote machine.
Compressing data while transferring it through the network can significantly improve the
speed of the transfer. We can use the rsync option -z to compress data while transferring
through a network. For example:
  1648. $ rsync -avz source destination
For the PATH format, if we use / at the end of the source path, rsync copies the contents
of that directory to the destination. If / is not at the end of the source path, rsync
copies that directory itself to the destination.
  1653. For example, the following command copies the content of the test directory:
  1654. $ rsync -av /home/test/ /home/backups
  1655. The following command copies the test directory to the destination:
  1656. $ rsync -av /home/test /home/backups
A trailing / at the end of destination_path makes no difference: in both of the
following cases, rsync copies the source directory itself into the destination, creating
the destination directory if it does not already exist.
  1662. For example:
  1663. $ rsync -av /home/test /home/backups/
This command copies the source (/home/test) into the folder backups, producing
/home/backups/test.
$ rsync -av /home/test /home/backups
This command behaves the same way; the missing trailing / on the destination does not
change the result.
  1671. There's more...
  1672. The rsync command has several additional functionalities that can be specified using its
  1673. command-line options. Let's go through them.
  1674. Excluding files while archiving with rsync
  1675. Some files need not be updated while archiving to a remote location. It is possible to tell rsync
  1676. to exclude certain files from the current operation. Files can be excluded by two options:
  1677. --exclude PATTERN
  1678. We can specify a wildcard pattern of files to be excluded. For example:
  1679. $ rsync -avz /home/code/some_code /mnt/disk/backup/code --exclude "*.txt"
  1680. This command excludes .txt files from backing up.
  1681. Or, we can specify a list of files to be excluded by providing a list file.
  1682. Use --exclude-from FILEPATH .
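A minimal sketch of --exclude-from (the list file name and patterns are placeholders);
the file holds one pattern per line:
$ cat exclude.list
*.o
*.tmp
$ rsync -avz /home/code /mnt/disk/backup/code --exclude-from exclude.list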
  1683. Deleting non-existent files while updating rsync backup
Unlike the tarball approach, where the whole archive is recreated and transferred whenever
the backup data needs updating, rsync transfers only the differences. However, by default,
rsync does not remove files from the destination if they no longer exist at the source. In
order to remove files from the destination that do not exist at the source, use the rsync
--delete option:
  1689. $ rsync -avz SOURCE DESTINATION --delete
  1690. Scheduling backups at intervals
  1691. You can create a cron job to schedule backups at regular intervals.
  1692. A sample is as follows:
  1693. $ crontab -e
  1694. Add the following line:
  1695. 0 */10 * * * rsync -avz /home/code user@IP_ADDRESS:/home/backups
  1696. The above crontab entry schedules the rsync to be executed every 10 hours.
*/10 is in the hour position of the crontab entry; it specifies that the backup be executed
every 10 hours. If */10 were written in the minutes position, the job would execute every
10 minutes.
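For comparison, a minimal entry that runs the same command every 10 minutes instead (same
placeholder paths as above):
*/10 * * * * rsync -avz /home/code user@IP_ADDRESS:/home/backups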
  1702. Have a look at the Scheduling with cron recipe in Chapter 9 to understand how to configure
  1703. crontab .
  1704. Version control based backup with Git
People use different strategies for backing up data. Copying the entire source directory
into a date- or time-stamped backup directory for every version wastes space; it is more
efficient to copy only the changes that occurred to files since the previous backup. This is
called incremental backup. We can manually create incremental backups using tools like
rsync, but restoring this sort of backup can be difficult. The best way to maintain and
restore changes is to use a version control system. Version control systems are heavily
used in software development and code maintenance, since code frequently undergoes
changes. Git is one of the most famous and efficient version control systems available; it
was written by Linus Torvalds and can be installed with your distro's package manager.
Let's use Git to back up regular files in a non-programming context.
  1716. Getting ready
  1717. Here is the problem statement:
  1718. We have a directory that contains several files and subdirectories. We need to keep track of
  1719. changes occurring to the directory contents and back them up. If data becomes corrupted or
goes missing, we must be able to restore a previous copy of that data. We need to back up the
data at regular intervals to a remote machine. We also need to keep backups at different
  1722. locations in the same machine (localhost). Let's see how to implement it using Git.
  1723. How to do it...
Change to the directory that is to be backed up:
$ cd /home/data/source
This will be the source directory whose contents we track.
Set up and initialize the remote backup repository. On the remote machine, create the
backup destination directory:
  1729. $ mkdir -p /home/backups/backup.git
  1730. $ cd /home/backups/backup.git
  1731. $ git init --bare
  1735. The following steps are to be performed in the source host machine:
  1736. 1. Add user details to Git in the source host machine:
  1737. $ git config --global user.name "Sarath Lakshman"
  1738. #Set user name to "Sarath Lakshman"
  1739. $ git config --global user.email slynux@slynux.com
  1740. # Set email to slynux@slynux.com
Initialize the source directory for backup on the host machine. In the source directory on
the host machine whose files are to be backed up, execute the following commands:
  1743. $ git init
Initialized empty Git repository in /home/data/source/.git/
  1745. # Initialize git repository
  1746. $ git commit --allow-empty -am "Init"
  1747. [master (root-commit) b595488] Init
  1748. 2. In the source directory, execute the following command to add the remote git
  1749. directory and synchronize backup:
  1750. $ git remote add origin user@remotehost:/home/backups/backup.git
  1751. $ git push origin master
  1752. Counting objects: 2, done.
  1753. Writing objects: 100% (2/2), 153 bytes, done.
  1754. Total 2 (delta 0), reused 0 (delta 0)
  1755. To user@remotehost:/home/backups/backup.git
  1756. * [new branch] master -> master
  1757. 3. Add or remove files for Git tracking.
  1758. The following command adds all files and folders in the current directory to the
  1759. backup list:
  1760. $ git add *
  1761. We can conditionally add certain files only to the backup list as follows:
  1762. $ git add *.txt
  1763. $ git add *.py
  1764. We can remove the files and folders not required to be tracked by using:
  1765. $ git rm file
  1766. It can be a folder or even a wildcard as follows:
  1767. $ git rm *.txt
  1771. 4. Check-pointing or marking backup points.
  1772. We can mark checkpoints for the backup with a message using the following
  1773. command:
  1774. $ git commit -m "Commit Message"
  1775. We need to update the backup at the remote location at regular intervals. Hence, set
  1776. up a cron job (for example, backing up every five hours).
  1777. Create a file crontab entry with lines:
  1778. 0 */5 * * * /home/data/backup.sh
  1779. Create a script /home/data/backup.sh as follows:
#!/bin/bash
  1781. cd /home/data/source
  1782. git add .
  1783. git commit -am "Commit - @ $(date)"
  1784. git push
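Make the script executable so that cron can run it (path as used in the crontab entry above):
$ chmod +x /home/data/backup.sh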
  1785. Now we have set up the backup system.
  1786. 5. Restoring data with Git.
  1787. In order to view all backup versions use:
  1788. $ git log
Update the current directory to the last backup, discarding any uncommitted recent changes:
$ git checkout .
- To revert to any previous state or version, look up the commit ID, which is a
40-character hexadecimal string, and use it with git checkout.
- For commit ID 3131f9661ec1739f72c213ec5769bc0abefa85a9 it will be:
  1793. $ git checkout 3131f9661ec1739f72c213ec5769bc0abefa85a9
  1794. $ git commit -am "Restore @ $(date) commit ID:
  1795. 3131f9661ec1739f72c213ec5769bc0abefa85a9"
  1796. $ git push
- In order to view the details about versions again, use:
  1798. $ git log
If the working directory is broken due to some issue, we can recreate its contents from
the backup at the remote location as follows:
  1802. $ git clone user@remotehost:/home/backups/backup.git
This will create a directory, backup, containing all of the contents.
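Once cloned, later backup states can be pulled into the same directory (the directory name
comes from the clone above):
$ cd backup
$ git pull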
  1807. Cloning hard drive and disks with dd
While working with hard drives and partitions, we may need to make exact copies or backups
of full partitions rather than copying contents file by file; we may even need to copy an
entire hard disk without missing any information, such as the boot record, partition table,
and so on. In this situation we can use the dd command. It can be used to clone any type of
disk, such as hard disks, flash drives, CDs, DVDs, floppy disks, and so on.
  1813. Getting ready
The dd command expands to Data Definition. Since improper usage can lead to loss of data,
it is nicknamed "Data Destroyer". Be careful with the order of the arguments: getting them
wrong can destroy all of your data or produce a useless copy. dd is basically a bitstream
duplicator that writes an entire bit stream from a disk to a file or from a file to a disk.
Let's see how to use dd.
  1819. How to do it...
  1820. The syntax for dd is as follows:
  1821. $ dd if=SOURCE of=TARGET bs=BLOCK_SIZE count=COUNT
  1822. In this command:
- if stands for input file or input device path
- of stands for target file or target device path
- bs stands for block size (usually given as a power of 2, for example, 512, 1024, 2048,
and so on)
- count is the number of blocks to be copied (an integer)
Total bytes copied = BLOCK_SIZE * COUNT
bs and count are optional.
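A quick worked example of the formula (the output file name is a placeholder): with bs=1M
and count=16, dd copies 1 MB * 16 = 16 MB of zeros into the file, reporting the records
copied when it finishes:
$ dd if=/dev/zero of=junk.data bs=1M count=16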
By specifying count, we can limit the number of bytes copied from the input file to the
target. If count is not specified, dd will copy from the input file until it reaches the
end of file (EOF).
  1831. In order to copy a partition into a file use:
  1832. # dd if=/dev/sda1 of=sda1_partition.img
  1833. Here /dev/sda1 is the device path for the partition.
  1834. Restore the partition using the backup as follows:
  1835. # dd if=sda1_partition.img of=/dev/sda1
You should be careful about the if and of arguments; improper usage may lead to data loss.
  1840. By changing the device path /dev/sda1 to the appropriate device path, any disk can be
  1841. copied or restored.
In order to permanently delete all of the data in a partition, we can make dd write zeros
into the partition by using the following command:
  1844. # dd if=/dev/zero of=/dev/sda1
  1845. /dev/zero is a character device. It always returns infinite zero '\0' characters.
  1846. Clone one hard disk to another hard disk of the same size as follows:
  1847. # dd if=/dev/sda of=/dev/sdb
  1848. Here /dev/sdb is the second hard disk.
  1849. In order to take the image of a CD ROM (ISO file) use:
  1850. # dd if=/dev/cdrom of=cdrom.iso
  1851. There's more...
When a filesystem is created in a file generated using dd, we can mount it at a mount
point. Let's see how to work with it.
  1854. Mounting image files
Any file image created using dd can be mounted using the loopback method. Use the -o
loop option with the mount command.
  1857. # mkdir /mnt/mount_point
  1858. # mount -o loop file.img /mnt/mount_point
  1859. Now we can access the contents of the image files through the location /mnt/mount_point .
  1860. See also
- Creating ISO files, Hybrid ISO of Chapter 3, explains how to use dd to create an ISO
  1862. file from a CD
  1865. 7
  1866. The Old-boy Network
  1867. In this chapter, we will cover:
- Basic networking primer
- Let's ping!
- Listing all the machines alive on a network
- Transferring files through network
- Setting up an Ethernet and wireless LAN with script
- Password-less auto-login with SSH
- Running commands on remote host with SSH
- Mounting remote drive at local mount point
- Multi-casting window messages on a network
- Network traffic and port analysis
  1878. Introduction
  1879. Networking is the act of interconnecting machines through a network and configuring the
  1880. nodes in the network with different specifications. We use TCP/IP as our networking stack
  1881. and all operations are based on it. Networks are an important part of every computer system.
  1882. Each node connected in the network is assigned a unique IP address for identification. There
  1883. are many parameters in networking, such as subnet mask, route, ports, DNS, and so on,
  1884. which require a basic understanding to follow.
Several applications that make use of a network operate by opening and connecting to
network ports. Every application may offer services such as data transfer, remote shell login,
  1890. and so on. Several interesting management tasks can be performed on a network consisting
  1891. of many machines. Shell scripts can be used to configure the nodes in a network, test the
  1892. availability of machines, automate execution of commands at remote hosts, and so on. This
  1893. chapter focuses on different recipes that introduce interesting tools or commands related to
  1894. networking and also how they can be used for solving different problems.
  1895. Basic networking primer
  1896. Before digging through recipes based on networking, it is essential for you to have a basic
  1897. understanding of setting up a network, the terminology and commands for assigning an IP
  1898. address, adding routes, and so on. This recipe will give an overview of different commands
  1899. used in GNU/Linux for networking and their usages from the basics.
  1900. Getting ready
  1901. Every node in a network requires many parameters to be assigned to work successfully and
  1902. interconnect with other machines. Some of the different parameters are the IP address,
  1903. subnet mask, gateway, route, DNS, and so on.
  1904. This recipe will introduce commands ifconfig , route , nslookup , and host .
  1905. How to do it...
  1906. Network interfaces are used to connect to a network. Usually, in the context of UNIX-like
  1907. Operating Systems, network interfaces follow the eth0, eth1 naming convention. Also, other
  1908. interfaces, such as usb0, wlan0, and so on, are available for USB network interfaces, wireless
  1909. LAN, and other such networks.
  1910. ifconfig is the command that is used to display details about network interfaces, subnet
  1911. mask, and so on.
ifconfig is available at /sbin/ifconfig. Some GNU/Linux distributions will display the
error "command not found" when ifconfig is typed. This is because /sbin is not included
in the user's PATH environment variable; when a command is typed, Bash looks in the
directories specified in the PATH variable.
By default in Debian, /sbin is not in PATH, so plain ifconfig is not found.
/sbin/ifconfig is the absolute path, so try running ifconfig with its absolute path (that
is, /sbin/ifconfig). On every system there is a default interface, lo, called loopback,
which points to the current machine. For example:
  1920. $ ifconfig
  1921. lo Link encap:Local Loopback
  1925. inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
  1927. UP LOOPBACK RUNNING MTU:16436 Metric:1
  1928. RX packets:6078 errors:0 dropped:0 overruns:0 frame:0
  1929. TX packets:6078 errors:0 dropped:0 overruns:0 carrier:0
  1930. collisions:0 txqueuelen:0
  1931. RX bytes:634520 (634.5 KB) TX bytes:634520 (634.5 KB)
wlan0 Link encap:Ethernet HWaddr 00:1c:bf:87:25:d2
  1933. inet addr:192.168.0.82 Bcast:192.168.3.255 Mask:255.255.252.0
inet6 addr: fe80::21c:bfff:fe87:25d2/64 Scope:Link
  1935. UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
  1936. RX packets:420917 errors:0 dropped:0 overruns:0 frame:0
  1937. TX packets:86820 errors:0 dropped:0 overruns:0 carrier:0
  1938. collisions:0 txqueuelen:1000
  1939. RX bytes:98027420 (98.0 MB) TX bytes:22602672 (22.6 MB)
The left-most column in the ifconfig output lists the names of the network interfaces, and
the right-hand columns show the details related to the corresponding interface.
  1942. There's more...
  1943. There are several additional commands that frequently come under usage for querying and
  1944. configuring the network. Let's go through the essential commands and usage.
  1945. Printing the list of network interfaces
Here is a one-liner command sequence to print the list of network interfaces available
on a system:
  1948. $ ifconfig | cut -c-10 | tr -d ' ' | tr -s '\n'
  1949. lo
  1950. wlan0
The first 10 characters of each line in the ifconfig output are reserved for writing the
  1952. name of the network interface. Hence we use cut to extract the first 10 characters of each
  1953. line. tr -d ' ' deletes every space character in each line. Now the \n newline character is
  1954. squeezed using tr -s '\n' to produce a list of interface names.
  1958. Assigning and displaying IP addresses
  1959. The ifconfig command displays details of every network interface available on the system.
  1960. However, we can restrict it to a specific interface by using:
  1961. $ ifconfig iface_name
  1962. For example:
  1963. $ ifconfig wlan0
  1964. wlan0 Link encap:Ethernet HWaddr 00:1c:bf:87:25:d2
  1965. inet addr:192.168.0.82 Bcast:192.168.3.255
  1966. Mask:255.255.252.0
  1967. From the outputs of the previously mentioned command, our interests lie in the IP address,
  1968. broadcast address, hardware address, and subnet mask. They are as follows:
  1969. f HWaddr 00:1c:bf:87:25:d2 is the hardware address (MAC address)
  1970. f inet addr:192.168.0.82 is the IP address
  1971. f Bcast:192.168.3.255 is the broadcast address
  1972. f Mask:255.255.252.0 is the subnet mask
  1973. In several scripting contexts, we may need to extract any of these addresses from the script
  1974. for further manipulations.
  1975. Extracting the IP address is a common task. In order to extract the IP address from the
  1976. ifconfig output use:
  1977. $ ifconfig wlan0 | egrep -o "inet addr:[^ ]*" | grep -o "[0-9.]*"
  1978. 192.168.0.82
  1979. Here the first command egrep -o "inet addr:[^ ]*" will print inet
  1980. addr:192.168.0.82 .
  1981. The pattern starts with inet addr: and ends with some non-space character sequence
(specified by [^ ]*). In the next pipe, grep -o "[0-9.]*" prints only the combination of
digits and dots, that is, the IP address.
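A similar pattern extracts the hardware (MAC) address from the same output; a sketch using
the interface shown above:
$ ifconfig wlan0 | egrep -o "HWaddr [0-9a-fA-F:]*" | awk '{print $2}'
00:1c:bf:87:25:d2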
  1983. In order to set the IP address for a network interface, use:
  1984. # ifconfig wlan0 192.168.0.80
  1985. You will need to run the above command as root. 192.168.0.80 is the address to be set.
  1986. Set the subnet mask along with IP address as follows:
  1987. # ifconfig wlan0 192.168.0.80 netmask 255.255.252.0
  1991. Spoofing Hardware Address (MAC Address)
  1992. In certain circumstances where authentication or filtering of computers on a network is
  1993. provided by using the hardware address, we can use hardware address spoofing. The
  1994. hardware address appears in ifconfig output as HWaddr 00:1c:bf:87:25:d2 .
  1995. We can spoof the hardware address at the software level as follows:
  1996. # ifconfig eth0 hw ether 00:1c:bf:87:25:d5
  1997. In the above command, 00:1c:bf:87:25:d5 is the new MAC address to be assigned.
This can be useful when we need to access the Internet through MAC-authenticated service
providers that provide Internet access for a single machine.
  2000. Name server and DNS (Domain Name Service)
  2001. The elementary addressing scheme for the Internet is IP addresses (dotted decimal form, for
  2002. example, 202.11.32.75 ). However, the resources on the Internet (for example, websites)
  2003. are accessed through a combination of ASCII characters called URLs or domain names. For
example, google.com is a domain name, and it corresponds to an IP address. Typing that
IP address in the browser also reaches www.google.com.
  2006. This technique of abstracting IP addresses with symbolic names is called Domain Name Service
  2007. (DNS). When we enter google.com , the DNS servers configured with our network resolve the
domain name into the corresponding IP address. On a local network, we set up a local DNS
to name local machines on the network symbolically using their hostnames.
  2010. Name servers assigned to the current system can be viewed by reading /etc/resolv.conf .
  2011. For example:
  2012. $ cat /etc/resolv.conf
  2013. nameserver 8.8.8.8
  2014. We can add name servers manually as follows:
  2015. # echo nameserver IP_ADDRESS >> /etc/resolv.conf
  2016. How can we obtain the IP address for a corresponding domain name?
  2017. The easiest method to obtain an IP address is by trying to ping the given domain name and
  2018. looking at the echo reply. For example:
  2019. $ ping google.com
  2020. PING google.com (64.233.181.106) 56(84) bytes of data.
  2021. Here 64.233.181.106 is the corresponding IP address.
  2022. A domain name can have multiple IP addresses assigned. In that case, the DNS server will
  2023. return one address among the list of IP addresses. To obtain all the addresses assigned to
  2024. the domain name, we should use a DNS lookup utility.
  2028. DNS lookup
  2029. There are different DNS lookup utilities available from the command line. These will request a
  2030. DNS server for an IP address resolution. host and nslookup are two DNS lookup utilities.
When host is executed, it will list out all of the IP addresses attached to the domain name.
  2032. nslookup is another command that is similar to host , which can be used to query details
  2033. related to DNS and resolving of names. For example:
  2034. $ host google.com
  2035. google.com has address 64.233.181.105
  2036. google.com has address 64.233.181.99
  2037. google.com has address 64.233.181.147
  2038. google.com has address 64.233.181.106
  2039. google.com has address 64.233.181.103
  2040. google.com has address 64.233.181.104
  2041. It may also list out DNS resource records like MX (Mail Exchanger) as follows:
  2042. $ nslookup google.com
  2043. Server: 8.8.8.8
  2044. Address: 8.8.8.8#53
  2045. Non-authoritative answer:
  2046. Name: google.com
  2047. Address: 64.233.181.105
  2048. Name: google.com
  2049. Address: 64.233.181.99
  2050. Name: google.com
  2051. Address: 64.233.181.147
  2052. Name: google.com
  2053. Address: 64.233.181.106
  2054. Name: google.com
  2055. Address: 64.233.181.103
  2056. Name: google.com
  2057. Address: 64.233.181.104
  2058. Server: 8.8.8.8
  2059. The last line above corresponds to the default nameserver used for DNS resolution.
Without using a DNS server, it is possible to add symbolic name to IP address resolutions
just by adding entries into the file /etc/hosts.
  2065. In order to add an entry, use the following syntax:
  2066. # echo IP_ADDRESS symbolic_name >> /etc/hosts
  2067. For example:
  2068. # echo 192.168.0.9 backupserver.com >> /etc/hosts
  2069. After adding this entry, whenever a resolution to backupserver.com occurs, it will resolve
  2070. to 192.168.0.9 .
Setting default gateway, showing routing table information
When a local network is connected to another network, some machine or network node must
be designated as the point through which the interconnection takes place. IP packets whose
destination lies outside the local network are forwarded to this node, which is connected
to the external network. This special node, capable of forwarding packets to the external
network, is called a gateway. We set a gateway for every node to make it possible to
connect to an external network.
The operating system maintains a table called the routing table, which describes how
packets are to be forwarded and through which node in the network. The routing table
can be displayed as follows:
$ route
Kernel IP routing table
Destination   Gateway    Genmask         Flags  Metric  Ref  Use  Iface
192.168.0.0   *          255.255.252.0   U      2       0    0    wlan0
link-local    *          255.255.0.0     U      1000    0    0    wlan0
default       p4.local   0.0.0.0         UG     0       0    0    wlan0
  2087. Or, you can also use:
  2088. $ route -n
  2089. Kernel IP routing table
  2090. Destination Gateway Genmask Flags Metric Ref Use Iface
  2091. 192.168.0.0 0.0.0.0 255.255.252.0 U 2 0 0 wlan0
  2092. 169.254.0.0 0.0.0.0 255.255.0.0 U 1000 0 0 wlan0
  2093. 0.0.0.0 192.168.0.4 0.0.0.0 UG 0 0 0 wlan0
The -n option specifies that numerical addresses be displayed. With -n, every entry is
shown as a numerical IP address; otherwise, symbolic hostnames are shown in place of IP
addresses wherever DNS entries are available for them.
  2100. A default gateway is set as follows:
  2101. # route add default gw IP_ADDRESS INTERFACE_NAME
  2102. For example:
  2103. # route add default gw 192.168.0.1 wlan0
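If a default route already exists, the old entry can be removed first; route also supports deletion:
# route del default
# route add default gw 192.168.0.1 wlan0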
  2104. Traceroute
When an application requests a service through the Internet, the server may be at a distant
location and connected through any number of gateways or device nodes. The packets
travel through several gateways to reach the destination. There is an interesting command,
traceroute, that displays the address of every intermediate gateway through which a
packet travels to reach the destination. traceroute information helps us understand
how many hops a packet takes in order to reach the destination. The number of
intermediate gateways or routers gives a metric for measuring the distance between two nodes
connected in a large network. An example of the output from traceroute is as follows:
  2113. $ traceroute google.com
  2114. traceroute to google.com (74.125.77.104), 30 hops max, 60 byte packets
  2115. 1 gw-c6509.lxb.as5577.net (195.26.4.1) 0.313 ms 0.371 ms 0.457 ms
  2116. 2 40g.lxb-fra.as5577.net (83.243.12.2) 4.684 ms 4.754 ms 4.823 ms
  2117. 3 de-cix10.net.google.com (80.81.192.108) 5.312 ms 5.348 ms 5.327 ms
4 209.85.255.170 (209.85.255.170) 5.816 ms 5.791 ms 209.85.255.172 (209.85.255.172) 5.678 ms
5 209.85.250.140 (209.85.250.140) 10.126 ms 9.867 ms 10.754 ms
6 64.233.175.246 (64.233.175.246) 12.940 ms 72.14.233.114 (72.14.233.114) 13.736 ms 13.803 ms
7 72.14.239.199 (72.14.239.199) 14.618 ms 209.85.255.166 (209.85.255.166) 12.755 ms 209.85.255.143 (209.85.255.143) 13.803 ms
8 209.85.255.98 (209.85.255.98) 22.625 ms 209.85.255.110 (209.85.255.110) 14.122 ms *
9 ew-in-f104.1e100.net (74.125.77.104) 13.061 ms 13.256 ms 13.484 ms
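Since each hop is printed on its own numbered line, the hop count is easy to extract in a script. A hedged one-liner sketch that counts lines beginning with a hop number (exact output formatting can vary between traceroute implementations):
$ traceroute -n google.com 2> /dev/null | grep -c '^ *[0-9][0-9]* '
9
For the trace above, this prints 9.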
  2129. See also
  2130. f Playing with variables and environment variables of Chapter 1, explains the PATH
  2131. variable
  2132. f Searching and mining "text" inside a file with grep of Chapter 4, explains the grep
  2133. command
  2137. Let's ping!
ping is the most basic network command, and one that every user should know first. It is a
universal command, available on all major operating systems. It is also a diagnostic tool
for verifying the connectivity between two hosts on a network, and it can be used to find out
which machines are alive on a network. Let us see how to use ping.
  2142. How to do it...
In order to check the connectivity of two hosts on a network, the ping command uses
Internet Control Message Protocol (ICMP) echo packets. When these echo packets are sent
to a host, the host responds with a reply if it is reachable and alive.
  2146. Check whether a host is reachable as follows:
  2147. $ ping ADDRESS
  2148. The ADDRESS can be a hostname, domain name, or an IP address itself.
  2149. ping will continuously send packets and the reply information is printed on the terminal. Stop
  2150. the pinging by pressing Ctrl + C .
  2151. For example:
  2152. f When the host is reachable the output will be similar to the following:
  2153. $ ping 192.168.0.1
  2154. PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
  2155. 64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=1.44 ms
  2156. ^C
  2157. --- 192.168.0.1 ping statistics ---
  2158. 1 packets transmitted, 1 received, 0% packet loss, time 0ms
  2159. rtt min/avg/max/mdev = 1.440/1.440/1.440/0.000 ms
  2160. $ ping google.com
  2161. PING google.com (209.85.153.104) 56(84) bytes of data.
64 bytes from bom01s01-in-f104.1e100.net (209.85.153.104): icmp_seq=1 ttl=53 time=123 ms
  2164. ^C
  2165. --- google.com ping statistics ---
  2166. 1 packets transmitted, 1 received, 0% packet loss, time 0ms
  2167. rtt min/avg/max/mdev = 123.388/123.388/123.388/0.000 ms
  2171. f When a host is unreachable the output will be similar to:
  2172. $ ping 192.168.0.99
  2173. PING 192.168.0.99 (192.168.0.99) 56(84) bytes of data.
  2174. From 192.168.0.82 icmp_seq=1 Destination Host Unreachable
  2175. From 192.168.0.82 icmp_seq=2 Destination Host Unreachable
When a host is not reachable, ping returns a Destination Host Unreachable
error message.
There's more...
  2179. In addition to checking the connectivity between two points in a network, the ping command
  2180. can be used with additional options to get useful information. Let's go through the additional
  2181. options of ping .
  2182. Round trip time
  2183. The ping command can be used to find out the Round Trip Time (RTT) between two hosts on a
  2184. network. RTT is the time required for the packet to reach the destination host and come back to
  2185. the source host. The RTT in milliseconds can be obtained from ping. An example is as follows:
  2186. --- google.com ping statistics ---
  2187. 5 packets transmitted, 5 received, 0% packet loss, time 4000ms
  2188. rtt min/avg/max/mdev = 118.012/206.630/347.186/77.713 ms
  2189. Here the minimum RTT is 118.012ms, the average RTT is 206.630ms, and the maximum RTT is
  2190. 347.186ms. The mdev (77.713ms) parameter in the ping output stands for mean deviation.
  2191. Limiting number of packets to be sent
The ping command sends echo packets and waits for echo replies indefinitely, until it is
stopped by pressing Ctrl + C. However, we can limit the count of echo packets to be sent by
using the -c flag.
  2195. The usage is as follows:
  2196. -c COUNT
  2197. For example:
  2198. $ ping 192.168.0.1 -c 2
  2199. PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
  2200. 64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=4.02 ms
  2201. 64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=1.03 ms
  2205. --- 192.168.0.1 ping statistics ---
  2206. 2 packets transmitted, 2 received, 0% packet loss, time 1001ms
  2207. rtt min/avg/max/mdev = 1.039/2.533/4.028/1.495 ms
In the previous example, the ping command sends two echo packets and stops.
This is useful when we need to ping multiple machines from a list of IP addresses through a
script and check their statuses.
  2211. Return status of ping command
The ping command returns exit status 0 when it succeeds and non-zero when it fails.
Success means the destination host is reachable; failure means the destination host
is unreachable.
  2215. The return status can be easily obtained as follows:
  2216. $ ping ADDRESS -c2
  2217. if [ $? -eq 0 ];
  2218. then
  2219. echo Successful ;
  2220. else
  2221. echo Failure
  2222. fi
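The same test reads more naturally in scripts when wrapped in a small function. A minimal sketch (is_up is our own helper name, not a standard command):
#!/bin/bash
# Succeeds (exit status 0) if the host answers two echo requests.
is_up() {
  ping -c 2 "$1" &> /dev/null
}
if is_up 192.168.0.1;
then
  echo Successful
else
  echo Failure
fi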
  2223. Listing all the machines alive on a network
When we deal with a large local area network, we may need to check the availability of other
machines on the network. A machine may not be alive for one of two reasons: it is not
powered on, or there is a problem in the network. By using shell scripting, we can
easily find out and report which machines are alive on the network. Let's see how to do it.
  2228. Getting ready
  2229. In this recipe, we use two methods. The first method uses ping and the second method uses
  2230. fping . fping doesn't come with a Linux distribution by default. You may have to manually
  2231. install fping using a package manager.
  2232. How to do it...
  2233. Let's go through the script to find out all the live machines on the network and alternate
  2234. methods to find out the same.
f Method 1:
We can write our own script using the ping command to query a list of IP addresses
and check whether each of them is alive, as follows:
#!/bin/bash
#Filename: ping.sh
# Change base address 192.168.0 according to your network.
for ip in 192.168.0.{1..255} ;
do
  ping $ip -c 2 &> /dev/null ;
  if [ $? -eq 0 ];
  then
    echo $ip is alive
  fi
done
  2252. The output is as follows:
  2253. $ ./ping.sh
  2254. 192.168.0.1 is alive
  2255. 192.168.0.90 is alive
  2256. f Method 2:
  2257. We can use an existing command-line utility to query the status of machines on a
  2258. network as follows:
$ fping -a 192.168.0.0/24 -g 2> /dev/null
  2260. 192.168.0.1
  2261. 192.168.0.90
  2262. Or, use:
  2263. $ fping -a 192.168.0.1 192.168.0.255 -g
  2264. How it works...
  2265. In Method 1, we used the ping command to find out the alive machines on the network.
  2266. We used a for loop for iterating through the list of IP addresses. The list is generated as
  2267. 192.168.0.{1..255} . The {start..end} notation will expand and will generate a list of
  2268. IP addresses, such as 192.168.0.1 , 192.168.0.2 , 192.168.0.3 till 192.168.0.255 .
ping $ip -c 2 &> /dev/null runs a ping against the corresponding IP address in each
iteration of the loop. -c 2 restricts the number of echo packets sent to two.
&> /dev/null redirects both stderr and stdout to /dev/null so that nothing
is printed on the terminal. Using $? we evaluate the exit status: it is 0 on success and
non-zero otherwise. Hence the IP addresses that responded are printed. We can also
print the list of unsuccessful IP addresses to report the unreachable ones.
  2278. Here is an exercise for you. Instead of using a range of IP
  2279. addresses hard-coded in the script, modify the script to
  2280. read a list of IP addresses from a file or stdin.
In this script, each ping is executed one after the other. Even though the IP addresses
are independent of each other, the pings run sequentially: each ping must send its two
echo packets and receive the replies (or hit the reply timeout) before the next ping
command can execute.
With 255 addresses, this delay adds up, so let's run all the ping commands in
parallel to make the script much faster. The core part of the script is the loop body. To run
the ping commands in parallel, enclose the loop body in ( )&. ( ) encloses a block of
commands to run in a subshell, and & sends it to the background so the loop can continue.
For example:
(
  ping $ip -c2 &> /dev/null ;
  if [ $? -eq 0 ];
  then
    echo $ip is alive
  fi
)&
wait
The for loop spawns many background processes, exits, and the script would then terminate
before its children finish. To prevent the script from terminating until all of its child
processes have ended, we have a command called wait. Place a wait at the end of the
script so that it waits until all the child ( ) subshell processes complete.
The wait command enables a script to terminate only after all of its child
processes or background processes have terminated or completed.
  2304. Have a look at fast_ping.sh from the code provided with the book.
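For reference, here is a minimal end-to-end sketch of that parallel variant; it is an approximation of fast_ping.sh, not necessarily the book's exact code:
#!/bin/bash
# Ping the whole range in parallel; wait keeps the script alive
# until every background subshell has finished.
for ip in 192.168.0.{1..255};
do
  (
    ping $ip -c 2 &> /dev/null ;
    if [ $? -eq 0 ];
    then
      echo $ip is alive
    fi
  )&
done
wait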
Method 2 uses a different command called fping. It can ping a list of IP addresses
simultaneously and responds very quickly. The options available with fping are as follows:
f The -a option with fping specifies to print the IP addresses of all alive machines
f The -u option with fping specifies to print all unreachable machines
f The -g option specifies to generate a range of IP addresses, either from slash-subnet mask
notation specified as IP/mask or from start and end IP addresses, as in:
$ fping -a 192.168.0.0/24 -g
Or
$ fping -a 192.168.0.1 192.168.0.255 -g
f 2> /dev/null is used to dump the error messages printed for unreachable hosts to a
null device
  2319. It is also possible to manually specify a list of IP addresses as command-line arguments or as
  2320. a list through stdin . For example:
  2321. $ fping -a 192.168.0.1 192.168.0.5 192.168.0.6
  2322. # Passes IP address as arguments
  2323. $ fping -a <ip.list
  2324. # Passes a list of IP addresses from a file
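The -u option can be exercised the same way to obtain the complement list, for example by reusing a list file on stdin:
$ fping -u < ip.list
# Prints the addresses from ip.list that did not respond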
  2325. There's more...
  2326. The fping command can be used for querying DNS data from a network. Let's see how to do it.
  2327. DNS lookup with fping
  2328. fping has an option -d that returns host names by using DNS lookup for each echo reply. It
  2329. will print out host names rather than IP addresses on ping replies.
  2330. $ cat ip.list
  2331. 192.168.0.86
  2332. 192.168.0.9
  2333. 192.168.0.6
  2334. $ fping -a -d 2>/dev/null <ip.list
  2335. www.local
  2336. dnss.local
  2340. See also
  2341. f Playing with file descriptors and redirection of Chapter 1, explains the data
  2342. redirection
  2343. f Comparisons and tests of Chapter 1, explains numeric comparisons
  2344. Transferring files
The major purpose of networking computers is resource sharing, and among forms of resource
sharing, file sharing is the most prominent. There are different methods by which we
can transfer files between different nodes on a network. This recipe discusses how to transfer
files using the commonly used protocols FTP, SFTP, RSYNC, and SCP.
Getting ready
The commands for performing file transfer over the network are mostly available by default
with Linux installations. Files can be transferred via FTP using the lftp command, via an
SSH connection using sftp, over SSH with the rsync command, and through SSH
using scp.
  2354. How to do it...
  2355. File Transfer Protocol (FTP) is an old file transfer protocol for transferring files between
  2356. machines on a network. We can use the command lftp for accessing FTP enabled servers
  2357. for file transfer. It uses Port 21. FTP can only be used if an FTP server is installed on the
  2358. remote machine. FTP is used by many public websites to share files.
  2359. To connect to an FTP server and transfer files in between, use:
  2360. $ lftp username@ftphost
  2361. Now it will prompt for a password and then display a logged in prompt as follows:
  2362. lftp username@ftphost:~>
  2363. You can type commands in this prompt. For example:
  2364. f To change to a directory, use cd directory
  2365. f To change directory of local machine, use lcd
  2366. f To create a directory use mkdir
  2367. f To download a file, use get filename as follows:
  2368. lftp username@ftphost:~> get filename
  2372. f To upload a file from the current directory, use put filename as follows:
  2373. lftp username@ftphost:~> put filename
  2374. f An lftp session can be exited by using the quit command
  2375. Auto completion is supported in the lftp prompt.
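lftp can also run a command list non-interactively, which is handy in scripts. A hedged one-liner sketch using its -u (credentials) and -e (command string) options; the host, user, and paths here are placeholders:
$ lftp -u username,password -e 'cd /remote/dir; get filename; quit' ftphost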
  2376. There's more...
  2377. Let's go through some additional techniques and commands used for file transfer through a
  2378. network.
  2379. Automated FTP transfer
ftp is another command used for FTP-based file transfer; lftp is the more flexible of the
two. Both lftp and ftp open an interactive session with the user (they prompt for input
by displaying messages). What if we want to automate a file transfer instead of using the
interactive mode? We can automate FTP file transfers by writing a shell script as follows:
  2384. #!/bin/bash
  2385. #Filename: ftp.sh
  2386. #Automated FTP transfer
  2387. HOST='domain.com'
  2388. USER='foo'
  2389. PASSWD='password'
  2390. ftp -i -n $HOST <<EOF
  2391. user ${USER} ${PASSWD}
  2392. binary
  2393. cd /home/slynux
put testfile.jpg
get serverfile.jpg
  2396. quit
  2397. EOF
  2398. The above script has the following structure:
  2399. <<EOF
  2400. DATA
  2401. EOF
  2402. This is used to send data through stdin to the FTP command. The recipe, Playing with file
  2403. descriptors and redirection in Chapter 1, explains various methods for redirection into stdin .
The -i option of ftp turns off interactive prompting, and -n prevents an automatic login
attempt so that the user command can supply the credentials instead. user ${USER}
${PASSWD} sets the username and password, and binary sets the file mode to binary.
  2409. SFTP (Secure FTP)
  2410. SFTP is an FTP-like file transfer system that runs on top of an SSH connection. It makes use of
  2411. an SSH connection to emulate an FTP interface. It doesn't require an FTP server at the remote
  2412. end to perform file transfer but it requires an OpenSSH server to be installed and running. It is
  2413. an interactive command, which offers an sftp prompt.
The following commands perform the file transfer; all the other commands remain the
same as in the automated FTP session with its specific HOST, USER, and PASSWD:
  2416. cd /home/slynux
  2417. put testfile.jpg
  2418. get serverfile.jpg
  2419. In order to run sftp , use:
  2420. $ sftp user@domainname
  2421. Similar to lftp , an sftp session can be exited by typing the quit command.
  2422. The SSH server sometimes will not be running at the default Port 22. If it is running at a
  2423. different port, we can specify the port along with sftp as -oPort=PORTNO .
  2424. For example:
  2425. $ sftp -oPort=422 user@slynux.org
  2426. -oPort should be the first argument of the sftp command.
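sftp also has a batch mode for scripted transfers: -b reads commands from a file, and it requires a non-interactive (key-based) login, as set up in the recipe, Password-less auto-login with SSH. A sketch, with batch.sftp being our own file name:
$ cat batch.sftp
cd /home/slynux
put testfile.jpg
get serverfile.jpg
quit
$ sftp -b batch.sftp user@domainname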
  2427. RSYNC
rsync is an important command-line utility, widely used for copying files over networks
and for taking backup snapshots. Its usage is better explained in the separate recipe,
Backup snapshots with rsync.
  2431. SCP (Secure Copy)
SCP is a file copy technique that is more secure than the traditional remote copy tool,
rcp. Files are transferred through an encrypted channel; SSH is used as the encryption
channel. We can easily transfer files to a remote machine as follows:
$ scp filename user@remotehost:/home/path
This will prompt for a password. The transfer can be made password-less by using the SSH
auto-login technique. The recipe, Password-less auto-login with SSH, explains SSH auto-login.
Therefore, file transfer using scp doesn't require specific scripting. Once SSH login is automated,
the scp command can be executed without an interactive prompt for the password.
Here remotehost can be an IP address or a domain name. The format of the scp command is:
$ scp SOURCE DESTINATION
SOURCE or DESTINATION can be in the format username@host:/path, for example:
$ scp user@remotehost:/home/path/filename filename
  2447. The above command copies a file from the remote host to the current directory with the given
  2448. filename.
  2449. If SSH is running at a different port than 22, use -oPort with the same syntax as sftp .
  2450. Recursive copying with SCP
  2451. By using scp we can recursively copy a directory between two machines on a network as
  2452. follows with the -r parameter:
  2453. $ scp -r /home/slynux user@remotehost:/home/backups
  2454. # Copies the directory /home/slynux recursively to remote location
scp can also preserve file permissions and modes during the copy with the -p parameter.
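For example, the two flags can be combined to mirror a directory tree with its permissions intact:
$ scp -rp /home/slynux user@remotehost:/home/backups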
  2456. See also
  2457. f Playing with file descriptors and redirection of Chapter 1, explains the standard input
  2458. using EOF
  2459. Setting up an Ethernet and wireless LAN
  2460. with script
An Ethernet connection is simple to configure: since it uses physical cables, there are no
special requirements such as authentication. However, a wireless LAN requires
authentication (for example, a WEP key) as well as the ESSID of the wireless network to
connect to. Let's see how to connect to a wireless as well as a wired network by writing a
shell script.
Getting ready
To connect to a wired network, we need to assign an IP address and subnet mask by using the
ifconfig utility. A wireless network connection requires additional utilities, such
as iwconfig and iwlist, to configure more parameters.
  2472. How to do it...
  2473. In order to connect to a network from a wired interface, execute the following script:
#!/bin/bash
#Filename: etherconnect.sh
#Description: Connect Ethernet
#Modify the parameters below according to your settings
######### PARAMETERS ###########
IFACE=eth0
IP_ADDR=192.168.0.5
SUBNET_MASK=255.255.255.0
GW=192.168.0.1
HW_ADDR='00:1c:bf:87:25:d2'
# HW_ADDR is optional
#################################
if [ $UID -ne 0 ];
then
  echo "Run as root"
  exit 1
fi
# Turn the interface down before setting new config
/sbin/ifconfig $IFACE down
if [[ -n $HW_ADDR ]];
then
  /sbin/ifconfig $IFACE hw ether $HW_ADDR
  echo Spoofed MAC ADDRESS to $HW_ADDR
fi
/sbin/ifconfig $IFACE $IP_ADDR netmask $SUBNET_MASK
route add default gw $GW $IFACE
echo Successfully configured $IFACE
  2501. The script for connecting to a wireless LAN with WEP is as follows:
#!/bin/bash
#Filename: wlan_connect.sh
#Description: Connect to Wireless LAN
#Modify the parameters below according to your settings
######### PARAMETERS ###########
IFACE=wlan0
IP_ADDR=192.168.1.5
SUBNET_MASK=255.255.255.0
GW=192.168.1.1
HW_ADDR='00:1c:bf:87:25:d2'
#Comment above line if you don't want to spoof mac address
ESSID="homenet"
WEP_KEY=8b140b20e7
FREQ=2.462G
#################################
if [ $UID -ne 0 ];
then
  echo "Run as root"
  exit 1;
fi
KEY_PART=""
if [[ -n $WEP_KEY ]];
then
  KEY_PART="key $WEP_KEY"
fi
# Turn the interface down before setting new config
/sbin/ifconfig $IFACE down
if [[ -n $HW_ADDR ]];
then
  /sbin/ifconfig $IFACE hw ether $HW_ADDR
  echo Spoofed MAC ADDRESS to $HW_ADDR
fi
/sbin/iwconfig $IFACE essid $ESSID $KEY_PART freq $FREQ
/sbin/ifconfig $IFACE $IP_ADDR netmask $SUBNET_MASK
route add default gw $GW $IFACE
echo Successfully configured $IFACE
  2541. How it works...
  2542. The commands ifconfig , iwconfig , and route are to be run as root. Hence a check for
  2543. the root user is performed at the beginning of the scripts.
  2544. The Ethernet connection script is pretty straightforward and it uses the concepts explained in
  2545. the recipe, Basic networking primer. Let's go through the commands used for connecting to
  2546. the wireless LAN.
A wireless LAN requires parameters such as the essid, key, and frequency to connect
to the network. The essid is the name of the wireless network to which we need to connect.
Some networks use a Wired Equivalent Privacy (WEP) key for authentication, whereas
others don't. The WEP key is usually a 10-digit hexadecimal passphrase. Next comes the
frequency assigned to the network. iwconfig is the command used to attach the wireless
card to the proper wireless network, WEP key, and frequency.
We can scan and list the available wireless networks by using the utility iwlist. To scan, use
the following command:
# iwlist scan
wlan0    Scan completed :
         Cell 01 - Address: 00:12:17:7B:1C:65
                   Channel:11
                   Frequency:2.462 GHz (Channel 11)
                   Quality=33/70  Signal level=-77 dBm
                   Encryption key:on
                   ESSID:"model-2"
  2566. The Frequency parameter can be extracted from the scan result, from the line
  2567. Frequency:2.462 GHz (Channel 11) .
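That extraction is easy to script. A rough sketch that grabs the first cell's frequency from the scan output (the interface name and the parsing are assumptions; adjust for your card and driver):
# /sbin/iwlist wlan0 scan | grep -o 'Frequency:[0-9.]\+' | head -n 1 | cut -d: -f2
2.462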
  2568. See also
  2569. f Comparisons and tests of Chapter 1, explains string comparisons.
  2570. Password-less auto-login with SSH
  2571. SSH is widely used with automation scripting. By using SSH, it is possible to remotely execute
  2572. commands at remote hosts and read their output. SSH is authenticated by using username
  2573. and password. Passwords are prompted during the execution of SSH commands. But in
  2574. automation scripts, SSH commands may be executed hundreds of times in a loop and hence
  2575. providing passwords each time is impractical. Hence we need to automate logins. SSH has
  2576. a built-in feature by which SSH can auto-login using SSH keys. This recipe describes how to
  2577. create SSH keys and facilitate auto-login.
  2581. How to do it...
SSH uses public-key cryptography for automatic authentication. An authentication key has
two elements: a public key and a private key, which form a pair. We can create an
authentication key using the ssh-keygen command. To automate authentication, the
public key must be placed on the server (by appending it to the ~/.ssh/authorized_keys
file), and the private key of the pair must be present in the ~/.ssh directory of the user
on the client machine, which is the computer you are logging in from. Several SSH settings
(for example, the path and name of the authorized_keys file) can be configured by
altering the configuration file /etc/ssh/sshd_config.
  2590. There are two steps towards the setup of automatic authentication with SSH. They are:
  2591. 1. Creating the SSH key from the machine, which requires a login to a remote machine.
  2592. 2. Transferring the public key generated to the remote host and appending it to
  2593. ~/.ssh/authorized_keys file.
  2594. In order to create an SSH key, enter the ssh-keygen command with the encryption algorithm
  2595. type specified as RSA as follows:
  2596. $ ssh-keygen -t rsa
  2597. Generating public/private rsa key pair.
  2598. Enter file in which to save the key (/home/slynux/.ssh/id_rsa):
  2599. Created directory '/home/slynux/.ssh'.
  2600. Enter passphrase (empty for no passphrase):
  2601. Enter same passphrase again:
  2602. Your identification has been saved in /home/slynux/.ssh/id_rsa.
  2603. Your public key has been saved in /home/slynux/.ssh/id_rsa.pub.
  2604. The key fingerprint is:
f7:17:c6:4d:c9:ee:17:00:af:0f:b3:27:a6:9c:0a:05 slynux@slynux-laptop
  2606. The key's randomart image is:
  2607. +--[ RSA 2048]----+
  2608. | . |
  2609. | o . .|
  2610. | E o o.|
  2611. | ...oo |
  2612. | .S .+ +o.|
  2613. | . . .=....|
  2614. | .+.o...|
  2615. | . . + o. .|
  2616. | ..+ |
  2617. +-----------------+
  2621. You need to enter a passphrase for generating the public-private key pair. It is also possible
  2622. to generate the key pair without entering a passphrase, but it is insecure. We can write
  2623. monitoring scripts that use automated login from the script to several machines. In such
  2624. cases, you should leave the passphrase empty while running the ssh-keygen command to
  2625. prevent the script from asking for a passphrase while running.
Now ~/.ssh/id_rsa.pub and ~/.ssh/id_rsa have been generated. id_rsa.pub is the
generated public key and id_rsa is the private key. The public key has to be appended to the
~/.ssh/authorized_keys file on the remote servers where we need to auto-login from the
current host.
  2630. In order to append a key file, use:
$ ssh USER@REMOTE_HOST "cat >> ~/.ssh/authorized_keys" < ~/.ssh/id_rsa.pub
  2633. Password:
  2634. Provide the login password in the previous command.
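Many OpenSSH installations also ship a helper, ssh-copy-id, which performs the same append in a single step:
$ ssh-copy-id USER@REMOTE_HOST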
  2635. The auto-login has been set up. From now on, SSH will not prompt for passwords during
  2636. execution. You can test this with the following command:
  2637. $ ssh USER@REMOTE_HOST uname
  2638. Linux
  2639. You will not be prompted for a password.
  2640. Running commands on remote host
  2641. with SSH
SSH is an interesting system administration tool that lets you control remote hosts by logging
in to a shell. SSH stands for Secure Shell. Commands can be executed on the shell obtained
by logging in to the remote host, just as if we were running them on the localhost, and the
network data transfer runs over an encrypted tunnel. This recipe introduces different ways in
which commands can be executed on a remote host.
  2647. Getting ready
  2648. SSH doesn't come by default with all GNU/Linux distributions. Therefore, you may have to
  2649. install the openssh-server and openssh-client packages using a package manager.
  2650. SSH service runs by default on port number 22.
  2654. How to do it...
  2655. To connect to a remote host with the SSH server running, use:
  2656. $ ssh username@remote_host
  2657. In this command:
f username is the user that exists at the remote host.
f remote_host can be a domain name or an IP address.
  2660. For example:
  2661. $ ssh mec@192.168.0.1
The authenticity of host '192.168.0.1 (192.168.0.1)' can't be established.
RSA key fingerprint is 2b:b4:90:79:49:0a:f1:b3:8a:db:9f:73:2d:75:d6:f9.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.0.1' (RSA) to the list of known hosts.
  2668. Password:
  2669. Last login: Fri Sep 3 05:15:21 2010 from 192.168.0.82
  2670. mec@proxy-1:~$
  2671. It will interactively ask for a user password and upon successful authentication it will return
  2672. the shell for the user.
  2673. By default, the SSH server runs at Port 22. But certain servers run the SSH service at different
  2674. ports. In that case use -p port_no with the ssh command to specify the port.
  2675. In order to connect to an SSH server running at port 422, use:
$ ssh user@localhost -p 422
You can execute commands in the shell that corresponds to the remote host. The shell is an
interactive tool in which a user types and runs commands. However, in shell scripting contexts,
we do not need an interactive shell; we need to automate tasks by executing several
commands at the remote shell and displaying or storing their output at the localhost. Issuing a
password every time is not practical for an automated script, hence auto-login for SSH should
be configured.
The recipe, Password-less auto-login with SSH, explains how to set it up.
Make sure that auto-login is configured before running automated scripts that use SSH.
  2688. To run a command on the remote host and display its output on the localhost shell, use the
  2689. following syntax:
  2690. $ ssh user@host 'COMMANDS'
  2691. For example:
  2692. $ ssh mec@192.168.0.1 'whoami'
  2693. Password:
  2694. mec
Multiple commands can be given by using a semicolon delimiter between the commands, as in:
  2696. $ ssh user@host 'command1 ; command2 ; command3'
  2697. Commands can be sent through stdin and the output of the commands will be available to
  2698. stdout .
  2699. The syntax will be as follows:
  2700. $ ssh user@remote_host "COMMANDS" > stdout.txt 2> errors.txt
The COMMANDS string should be quoted in order to prevent a semicolon character from acting
as a delimiter in the localhost shell. We can also pass any command sequence that involves
piped statements to the SSH command through stdin as follows:
$ echo "COMMANDS" | ssh user@remote_host > stdout.txt 2> errors.txt
  2705. For example:
  2706. $ ssh mec@192.168.0.1 "echo user: $(whoami);echo OS: $(uname)"
  2707. Password:
  2708. user: slynux
  2709. OS: Linux
Note that in this example the command substitutions $(whoami) and $(uname) are placed
within double quotes, so they are expanded by the local shell before ssh runs; that is why the
output shows user: slynux (the local user) rather than mec. To have whoami and uname
executed on the remote host instead, enclose the command string in single quotes.
  2713. It can be generalized as:
  2714. COMMANDS="command1; command2; command3"
  2715. $ ssh user@hostname "$COMMANDS"
  2716. We can also pass a more complex subshell in the command sequence by using the ( )
  2717. subshell operator.
Let's write an SSH-based shell script that collects the uptime of a list of remote hosts. Uptime
is the time for which the system has been powered on, and the uptime command displays it.
It is assumed that all systems in IP_LIST have a common user test.
#!/bin/bash
#Filename: uptime.sh
#Description: Uptime monitor
IP_LIST="192.168.0.1 192.168.0.5 192.168.0.9"
USER="test"
for IP in $IP_LIST;
do
  utime=$(ssh $USER@$IP uptime | awk '{ print $3 }')
  echo $IP uptime: $utime
done
  2735. The expected output is:
  2736. $ ./uptime.sh
  2737. 192.168.0.1 uptime: 1:50,
  2738. 192.168.0.5 uptime: 2:15,
  2739. 192.168.0.9 uptime: 10:15,
  2740. There's more...
  2741. The ssh command can be executed with several additional options. Let's go through them.
  2742. SSH with compression
  2743. The SSH protocol also supports data transfer with compression, which comes in handy when
  2744. bandwidth is an issue. Use the -C option with the ssh command to enable compression as
  2745. follows:
  2746. $ ssh -C user@hostname COMMANDS
  2747. Redirecting data into stdin of remote host shell commands
  2748. Sometimes we need to redirect some data into stdin of remote shell commands. Let's see
  2749. how to do it. An example is as follows:
  2750. $ echo "text" | ssh user@remote_host 'cat >> list'
  2754. Or:
  2755. # Redirect data from file as:
  2756. $ ssh user@remote_host 'cat >> list' < file
  2757. cat >> list appends the data received through stdin to the file list. Here this command
  2758. is executed at the remote host. But the data is passed to stdin from localhost.
  2759. See also
  2760. f Password-less auto-login with SSH, explains how to configure auto-login to execute
  2761. commands without prompting for password.
  2762. Mounting a remote drive at a local mount
  2763. point
Having a local mount point for accessing the remote host's filesystem is really helpful when
carrying out both read and write data transfer operations. SSH is the most common transfer
protocol available on a network, and hence we can make use of it with sshfs. sshfs enables
you to mount a remote filesystem at a local mount point. Let's see how to do it.
  2768. Getting ready
  2769. sshfs doesn't come by default with GNU/Linux distributions. Install sshfs by using a
  2770. package manager. sshfs is an extension to the fuse file system package that allows
  2771. supported OSes to mount a wide variety of data as if it were a local file system.
  2772. How to do it...
In order to mount a filesystem location at a remote host to a local mount point, use:
  2774. # sshfs user@remotehost:/home/path /mnt/mountpoint
  2775. Password:
  2776. Issue the user password when prompted.
  2777. Now data at /home/path on the remote host can be accessed via a local mount point /mnt/
  2778. mountpoint .
  2779. In order to unmount after completing the work, use:
  2780. # umount /mnt/mountpoint
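If the SSH server listens on a non-standard port, sshfs accepts -p just like ssh does. A sketch for a server on port 422:
# sshfs -p 422 user@remotehost:/home/path /mnt/mountpoint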
  2784. See also
  2785. f Running commands on remote host with SSH, explains the ssh command.
  2786. Multi-casting window messages on
  2787. a network
The administrator of a network may often need to send messages to the nodes on the
network. Displaying pop-up windows on users' desktops is a helpful way to alert them
with a piece of information. Using a GUI toolkit with shell scripting can achieve this task. This
recipe discusses how to send a pop-up window with custom messages to remote hosts.
  2792. Getting ready
For implementing GUI pop-up windows, zenity can be used. Zenity is a scriptable GUI toolkit for
creating windows consisting of a textbox, input box, and so on. SSH can be used for connecting
to the remote shell on a remote host. Zenity doesn't come installed by default with GNU/Linux
distributions; use a package manager to install it.
  2797. How to do it...
Zenity is one of several scriptable dialog-creation toolkits; others include gdialog,
kdialog, xdialog, and so on. Zenity is a flexible toolkit that adheres to the
GNOME Desktop Environment.
  2801. In order to create an info box with zenity, use:
  2802. $ zenity --info --text "This is a message"
  2803. # It will display a window with "This is a message" as text.
Zenity can be used to create windows with an input box, combo input, radio buttons, pushbuttons,
and more; they are beyond the scope of this recipe. Check the man page of zenity for more.
  2806. Now, we can use SSH to run these zenity statements on a remote machine. In order to run this
  2807. statement on the remote host through SSH, run:
  2808. $ ssh user@remotehost 'zenity --info --text "This is a message"'
  2809. But this will return an error like:
  2810. (zenity:3641): Gtk-WARNING **: cannot open display:
This is because zenity depends on Xserver. Xserver is a daemon responsible for
plotting the graphical elements that make up the GUI on the screen. A bare GNU/Linux system
consists of only a text terminal or shell prompt.
X clients use a special environment variable, DISPLAY, to locate the Xserver instance that is
running on the system.
We can manually set DISPLAY=:0 to point a command at the first Xserver instance.
  2820. The previous SSH command can be rewritten as:
$ ssh username@remotehost 'export DISPLAY=:0 ; zenity --info --text "This is a message"'
This statement will display a pop-up at remotehost if the user with username is
logged in to any of the window managers.
  2825. In order to multicast the popup window to multiple remote hosts, write a shell script as follows:
#!/bin/bash
#Filename: multi_cast_window.sh
# Description: Multi-cast window popups
IP_LIST="192.168.0.5 192.168.0.3 192.168.0.23"
USER="username"
COMMAND='export DISPLAY=:0 ; zenity --info --text "This is a message" '
for host in $IP_LIST;
do
  ssh $USER@$host "$COMMAND" &
done
  2836. How it works...
In the above script, we have a list of IP addresses to which the window should be popped up.
A loop iterates through the IP addresses and executes the SSH command.
In the SSH statement, we have appended & at the end. & sends the SSH statement to the
background, to facilitate running several SSH statements in parallel. If & were not used, the
loop would start an SSH session, execute the zenity dialog, and then wait for the user
to close the pop-up window; unless the user at the remote host closed the window, the next
SSH statement in the loop would not execute. The & trick keeps the loop from blocking on
each SSH session in this way.
  2845. See also
  2846. f Running commands on remote host with SSH, explains the ssh command.
  2850. Network traffic and port analysis
Network ports are essential parameters of network-based applications. Applications open
ports on the host and communicate with a remote host through ports opened at the remote
host. Awareness of which ports are open and closed is essential in a security context. Malware
and rootkits may run on the system using custom ports and custom services, allowing
attackers to gain unauthorized access to data and resources. A list of the open ports and the
services running on them lets us analyze and defend the system against being controlled by
rootkits, and helps remove them efficiently. The list of open ports is not only helpful for
malware detection; it is also useful for debugging network-based applications, for example to
check whether certain port connections and port-listening functionalities are working as
expected. This recipe discusses various utilities for port analysis.
  2862. Getting ready
Various commands are available for listing open ports and the services running on them (for
example, lsof and netstat). They are available by default on all GNU/Linux
distributions.
  2866. How to do it...
  2867. In order to list all opened ports on the system along with the details on each service attached
  2868. to it, use:
  2869. $ lsof -i
COMMAND   PID  USER   FD  TYPE DEVICE SIZE/OFF NODE NAME
firefox-b 2261 slynux 78u IPv4 63729  0t0      TCP  localhost:47797->localhost:42486 (ESTABLISHED)
firefox-b 2261 slynux 80u IPv4 68270  0t0      TCP  slynux-laptop.local:41204->192.168.0.2:3128 (CLOSE_WAIT)
firefox-b 2261 slynux 82u IPv4 68195  0t0      TCP  slynux-laptop.local:41197->192.168.0.2:3128 (ESTABLISHED)
ssh       3570 slynux 3u  IPv6 30025  0t0      TCP  localhost:39263->localhost:ssh (ESTABLISHED)
ssh       3836 slynux 3u  IPv4 43431  0t0      TCP  slynux-laptop.local:40414->boneym.mtveurope.org:422 (ESTABLISHED)
GoogleTal 4022 slynux 12u IPv4 55370  0t0      TCP  localhost:42486 (LISTEN)
GoogleTal 4022 slynux 13u IPv4 55379  0t0      TCP  localhost:42486->localhost:32955 (ESTABLISHED)
Each entry in the output of lsof corresponds to a service that has opened a port for
communication. The last column of the output consists of lines similar to:
  2890. slynux-laptop.local:34395->192.168.0.2:3128 (ESTABLISHED)
In this output, slynux-laptop.local:34395 corresponds to the localhost part and
192.168.0.2:3128 corresponds to the remote host.
34395 is the port opened on the current machine, and 3128 is the port to which the service
connects at the remote host.
In order to list the opened ports on the current machine, use:
  2896. $ lsof -i | grep ":[0-9]\+->" -o | grep "[0-9]\+" -o | sort | uniq
  2897. The :[0-9]\+-> regex for grep is used to extract the host port portion ( :34395-> ) from the
  2898. lsof output. The next grep is used to extract the port number (which is numeric). Multiple
  2899. connections may occur through the same port and hence multiple entries of the same port may
  2900. occur. In order to display each port once, they are sorted and the unique ones are printed.
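To restrict the output to services that are waiting for new connections, lsof can also filter on the TCP state:
$ lsof -iTCP -sTCP:LISTEN
# Shows only the listening TCP sockets, one service per line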
  2901. There's more...
  2902. Let's go through additional utilities that can be used for viewing the opened port and network
  2903. traffic related information.
  2904. Opened port and services using netstat
netstat is another command for network service analysis. Explaining all the features of
netstat is beyond the scope of this recipe; we will now look at how to list services and port
numbers.
  2908. Use netstat -tnp to list opened ports and services as follows:
$ netstat -tnp
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        State        PID/Program name
tcp        0      0 192.168.0.82:38163     192.168.0.2:3128       ESTABLISHED  2261/firefox-bin
tcp        0      0 192.168.0.82:38164     192.168.0.2:3128       TIME_WAIT    -
tcp        0      0 192.168.0.82:40414     193.107.206.24:422     ESTABLISHED  3836/ssh
tcp        0      0 127.0.0.1:42486        127.0.0.1:32955        ESTABLISHED  4022/GoogleTalkPlug
tcp        0      0 192.168.0.82:38152     192.168.0.2:3128       ESTABLISHED  2261/firefox-bin
tcp6       0      0 ::1:22                 ::1:39263              ESTABLISHED  -
tcp6       0      0 ::1:39263              ::1:22                 ESTABLISHED  3570/ssh
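Adding the -l flag narrows netstat to listening sockets only, which is often what a port audit needs:
$ netstat -tlnp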
8
Put on the Monitor's Cap
  2934. In this chapter, we will cover:
  2935. f Disk usage hacks
  2936. f Calculating the execution time for a command
f Information about logged users, boot logs, and boot failures
  2938. f Printing the 10 most frequently-used commands
f Listing the top 10 CPU consuming processes in 1 hour
  2940. f Monitoring command outputs with watch
  2941. f Logging access to files and directories
  2942. f Logfile management with logrotate
  2943. f Logging with syslog
  2944. f Monitoring user logins to find intruders
  2945. f Remote disk usage health monitoring
  2946. f Finding out active user hours on a system
  2950. Introduction
An operating system consists of a collection of system software designed for different
purposes and serving different task sets. Each of these programs needs to be monitored by the
operating system or the system administrator in order to know whether it is working properly
or not. We will also use a technique called logging, by which important information is written to
a file while the application is running. By reading this file, we can understand the timeline of
the operations taking place with a particular software or daemon. If an application
or a service crashes, this information helps to debug the issue and enables us to fix it.
Logging and monitoring also help to gather information from a pool of data. They are
important tasks for ensuring security in the operating system and for debugging purposes.
  2961. This chapter deals with different commands that can be used to monitor different activities. It
  2962. also goes through logging techniques and their usages.
  2963. Disk usage hacks
Disk space is a limited resource. We frequently perform disk usage calculations on hard
disks or other storage media to find the free space available. When free space
becomes scarce, we need to find the large files to be deleted or moved in
order to create free space. Disk usage manipulations are commonly used in shell scripting
contexts. This recipe illustrates various commands used for disk manipulations, and
problems where disk usage can be calculated with a variety of options.
  2970. Getting ready
  2971. df and du are the two significant commands that are used for calculating disk usage in Linux.
  2972. The command df stands for disk free and du stands for disk usage. Let's see how we can use
  2973. them to perform various tasks that involve disk usage calculation.
  2974. How to do it...
  2975. To find the disk space used by a file (or files), use:
  2976. $ du FILENAME1 FILENAME2 ..
  2977. For example:
  2978. $ du file.txt
  2979. 4
The result is shown, by default, in units of 1024-byte blocks rather than bytes.
In order to obtain the disk usage for all files inside a directory, with the individual disk
usage of each file shown on its own line, use:
$ du -a DIRECTORY
  2987. -a outputs results for all files in the specified directory or directories recursively.
Running du DIRECTORY will output a similar result, but it will show only the
size consumed by subdirectories. It does not show the disk usage
for each of the files; for printing the disk usage of files, -a is mandatory.
  2991. For example:
  2992. $ du -a test
  2993. 4 test/output.txt
  2994. 4 test/process_log.sh
  2995. 4 test/pcpu.sh
  2996. 16 test
  2997. An example of using du DIRECTORY is as follows:
  2998. $ du test
  2999. 16 test
  3000. There's more...
  3001. Let's go through additional usage practices for the du command.