- It has the format 's/substitution_pattern/replacement_string/g' .
- It replaces every occurrence of substitution_pattern with the replacement string.
- Here the substitution pattern is the regex for a sentence. Every sentence is delimited by "."
- and the first character is a space. Therefore, we need to match the text that is in the format
- "space" some text MATCH_STRING some text "dot". A sentence may contain any characters
- except a "dot", which is the delimiter. Hence we have used [^.] : [^.]* matches any
- combination of characters except the dot. In between, the match string "mobile phones" is
- placed. Every matched sentence is replaced with nothing (the replacement between the
- slashes is empty).
- See also
- Basic sed primer, explains the sed command
- Basic regular expression primer, explains how to use regular expressions
- Implementing head, tail, and tac with awk
- Mastering text-processing operations comes with practice. This recipe will help us practice
- incorporating some of the commands that we have just learned with some that we already
- know.
- Getting ready
- The commands head , tail , uniq , and tac operate line by line. Whenever we need line by
- line processing, we can always use awk . Let's emulate these commands with awk .
- How to do it...
- Let's see how different commands can be emulated with different basic text processing
- commands, such as head, tail, and tac.
- The head command reads the first ten lines of a file and prints them out:
- $ awk 'NR <=10' filename
- The tail command prints the last ten lines of a file:
- $ awk '{ buffer[NR % 10] = $0 } END { for(i = NR - 9; i <= NR; i++) { if (i > 0) print buffer[i % 10] } }' filename
- The tac command prints the lines of input file in reverse order:
- $ awk '{ buffer[NR] = $0 } END { for(i=NR; i>0; i--) { print buffer[i] } }' filename
- How it works...
- In the implementation of head using awk , we print the lines in the input stream having a line
- number less than or equal to 10 . The line number is available using the special variable NR .
- In the implementation of the tail command, a hashing technique is used. The buffer array
- index is determined by the hashing function NR % 10 , where NR is the special variable that
- contains the line number of the current record, and $0 contains the current line of text.
- Hence % maps all the lines having the same remainder to the same index of the array, so the
- buffer always holds the ten most recent lines. In the END{} block, we iterate from line
- number NR - 9 up to NR (skipping non-positive values for short inputs) and print the
- buffered lines in their original order.
- In the tac command emulation, it simply stores all the lines in an array. When the END{}
- block is reached, NR holds the line number of the last line. It is then decremented in a for
- loop until it reaches 1 , printing the stored line in each iteration.
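- We can quickly sanity-check all three emulations by piping in numbered lines from seq
- (assuming GNU coreutils is available); the expected output is noted in the comments:
- $ seq 12 | awk 'NR <= 10' # 1 to 10, like head
- $ seq 12 | awk '{ buffer[NR % 10] = $0 } END { for(i = NR - 9; i <= NR; i++) { if (i > 0) print buffer[i % 10] } }' # 3 to 12, like tail
- $ seq 3 | awk '{ buffer[NR] = $0 } END { for(i=NR; i>0; i--) { print buffer[i] } }' # 3 2 1, like tac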
- See also
- Basic awk primer, explains the awk command
- head and tail - printing the last or first 10 lines of Chapter 3, explains the commands
- head and tail
- Sorting, unique and duplicates of Chapter 2, explains the uniq command
- Printing lines in reverse order, explains the tac command
- Text slicing and parameter operations
- This recipe walks through some of the simple text replacement techniques and parameter
- expansion shorthands available in Bash. A few simple techniques can often help us avoid
- having to write multiple lines of code.
- How to do it...
- Let's get into the tasks.
- Replacing some text from a variable can be done as follows:
- $ var="This is a line of text"
- $ echo ${var/line/REPLACED}
- This is a REPLACED of text
- line is replaced with REPLACED .
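- Note that ${var/line/REPLACED} replaces only the first match. As a small extra example
- (not part of the original recipe), Bash also accepts a double slash to replace every
- occurrence:
- $ var="line one and line two"
- $ echo ${var//line/LINE}
- LINE one and LINE two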
- We can produce a sub-string by specifying the start position and string length, by using the
- following syntax:
- ${variable_name:start_position:length}
- To print from the fifth character onward use the following command:
- $ string=abcdefghijklmnopqrstuvwxyz
- $ echo ${string:4}
- efghijklmnopqrstuvwxyz
- To print eight characters starting from the fifth character, use:
- $ echo ${string:4:8}
- efghijkl
- The index is specified by counting the start letter as 0 . We can also count from the last
- letter by using negative values, but they must be used inside parentheses: (-1) is the index
- of the last letter.
- $ echo ${string:(-1)}
- z
- $ echo ${string:(-2):2}
- yz
- See also
- Iterating through lines, words, and characters in a file, explains slicing of a character
- from a word
- 5
- Tangled Web?
- Not At All!
- In this chapter, we will cover:
- Downloading from a web page
- Downloading a web page as formatted plain text
- A primer on cURL
- Accessing unread Gmail mails from the command line
- Parsing data from a website
- Creating an image crawler and downloader
- Creating a web photo album generator
- Building a Twitter command-line client
- Define utility with Web backend
- Finding broken links in a website
- Tracking changes to a website
- Posting to a web page and reading response
- Introduction
- The Web is becoming the face of technology. It is the central access point for data processing.
- Though shell scripting cannot do everything that languages like PHP can do on the Web, there
- are still many tasks to which shell scripts are ideally suited. In this chapter we will explore
- some recipes that can be used to parse website content, download and obtain data, send
- data to forms, and automate website usage tasks and similar activities. We can automate
- many activities that we perform interactively through a browser with a few lines of scripting.
- Access to the functionality provided by the HTTP protocol through command-line utilities
- enables us to write scripts that are suitable for solving most web-automation tasks.
- Have fun while going through the recipes of this chapter.
- Downloading from a web page
- Downloading a file or a web page from a given URL is simple. A few command-line download
- utilities are available to perform this task.
- Getting ready
- wget is a file download command-line utility. It is very flexible and can be configured with
- many options.
- How to do it...
- A web page or a remote file can be downloaded using wget as follows:
- $ wget URL
- For example:
- $ wget http://slynux.org
- --2010-08-01 07:51:20-- http://slynux.org/
- Resolving slynux.org... 174.37.207.60
- Connecting to slynux.org|174.37.207.60|:80... connected.
- HTTP request sent, awaiting response... 200 OK
- Length: 15280 (15K) [text/html]
- Saving to: "index.html"
- 100%[======================================>] 15,280 75.3K/s in 0.2s
- 2010-08-01 07:51:21 (75.3 KB/s) - "index.html" saved [15280/15280]
- It is also possible to specify multiple download URLs as follows:
- $ wget URL1 URL2 URL3 ..
- A file can be downloaded using wget using the URL as:
- $ wget ftp://example_domain.com/somefile.img
- Usually, files are downloaded with the same filename as in the URL and the download log
- information or progress is written to stdout .
- You can specify the output file name with the -O option. If the file with the specified filename
- already exists, it will be truncated first and the downloaded file will be written to the specified
- file.
- You can also specify a different logfile path rather than printing logs to stdout by using
- the -o option as follows:
- $ wget ftp://example_domain.com/somefile.img -O dloaded_file.img -o log
- By using the above command, nothing will be printed on screen. The log or progress will be
- written to log and the output file will be dloaded_file.img .
- There is a chance that downloads might break due to unstable Internet connections. In that
- case, we can supply the number of tries as an argument, so that once interrupted, the utility
- will retry the download that many times before giving up.
- In order to specify the number of tries, use the -t flag as follows:
- $ wget -t 5 URL
- There's more...
- The wget utility has several additional options that can be used under different problem
- domains. Let's go through a few of them.
- Restricting the download speed
- When we have limited Internet downlink bandwidth and many applications sharing the
- Internet connection, and a large file is given for download, it will suck up all the bandwidth
- and may cause other processes to starve for bandwidth. The wget command comes with a
- built-in option to specify the maximum bandwidth the download job can use, so that all the
- applications can run smoothly at the same time.
- We can restrict the speed of wget by using the --limit-rate argument as follows:
- $ wget --limit-rate 20k http://example.com/file.iso
- In this command, the suffixes k (kilobyte) and m (megabyte) specify the unit of the speed limit.
- We can also specify a maximum quota for the download; wget will stop when the quota is
- exceeded. This is useful when downloading multiple files that are limited by a total download
- size, and it prevents the download from accidentally using too much disk space.
- Use --quota or -Q as follows:
- $ wget -Q 100m http://example.com/file1 http://example.com/file2
- Resume downloading and continue
- If a download using wget gets interrupted before it is completed, we can resume the
- download where we left off by using the -c option as follows:
- $ wget -c URL
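- These options can also be combined. As a rough sketch (the URL and filenames here are
- only placeholders), a resumable download with retries, a rate limit, and a logfile might look
- like this:
- $ wget -c -t 5 --limit-rate 20k -o download.log -O file.iso http://example.com/file.iso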
- Using cURL for download
- cURL is another advanced command-line utility. It is much more powerful than wget .
- cURL can be used to download as follows:
- $ curl http://slynux.org > index.html
- Unlike wget , curl writes the downloaded data into standard output ( stdout ) rather than to a
- file. Therefore, we have to redirect the data from stdout to the file using a redirection operator.
- Copying a complete website (mirroring)
- wget has an option to download the complete website by recursively collecting all the URL
- links in the web pages and downloading all of them like a crawler. Hence we can completely
- download all the pages of a website.
- In order to download the pages, use the --mirror option as follows:
- $ wget --mirror exampledomain.com
- Or use:
- $ wget -r -N -l DEPTH URL
- -l specifies the DEPTH of web pages as levels; that means wget will traverse only that
- many levels of links. It is used along with -r (recursive). The -N argument is used to enable
- timestamping for the files. URL is the base URL of the website for which the download needs
- to be initiated.
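- For example, the following sketch mirrors a site two levels deep (the domain is only a
- placeholder):
- $ wget -r -N -l 2 http://example.com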
- Accessing pages with HTTP or FTP authentication
- Some web pages require authentication for HTTP or FTP URLs. This can be provided by using
- the --user and --password arguments:
- $ wget --user username --password pass URL
- It is also possible to ask for a password without specifying the password inline. In order to do
- that use --ask-password instead of the --password argument.
- Downloading a web page as formatted
- plain text
- Web pages are HTML pages containing a collection of HTML tags along with other elements,
- such as JavaScript, CSS, and so on. But the HTML tags define the base of a web page. We
- may need to parse the data in a web page while looking for specific content, and this is
- something Bash scripting can help us with. When we download a web page, we receive an
- HTML file. In order to view formatted data, it should be viewed in a web browser. However, in
- most of the circumstances, parsing a formatted text document will be easier than parsing
- HTML data. Therefore, if we can get a text file with formatted text similar to the web page seen
- on the web browser, it is more useful and it saves a lot of effort required to strip off HTML
- tags. Lynx is an interesting command-line web browser. We can actually get the web page as
- plain text formatted output from Lynx. Let's see how to do it.
- How to do it...
- Let's download the web page view, in ASCII character representation, into a text file using
- the -dump flag with the lynx command:
- $ lynx -dump URL > webpage_as_text.txt
- This command will also list all the hyperlinks ( <a href="link"> ) separately under a
- References heading in the footer of the text output. This helps us avoid parsing links
- separately using regular expressions.
- For example:
- $ lynx -dump http://google.com > plain_text_page.txt
- You can view the plain text version of the page by using the cat command as follows:
- $ cat plain_text_page.txt
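- Since the hyperlinks are gathered under the References heading at the end of the dump, the
- link list alone can be sliced out with sed (a small sketch; it assumes References does not
- also appear earlier in the page body):
- $ lynx -dump http://google.com | sed -n '/References/,$p'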
- A primer on cURL
- cURL is a powerful utility that supports many protocols including HTTP, HTTPS, FTP, and much
- more. It supports many features including POST, cookie, authentication, downloading partial
- files from a specified offset, referers, user agent strings, extra headers, limit speed, maximum
- file size, progress bars, and so on. cURL is useful when we want to automate a web page
- usage sequence and to retrieve data. This recipe is a list of the most important features
- of cURL.
- Getting ready
- cURL doesn't come with any of the main Linux distros by default, so you may have to install it
- using the package manager. By default, most distributions ship with wget .
- cURL usually dumps downloaded files to stdout and progress information to stderr . To
- keep the progress information from being shown, we always use the --silent option.
- How to do it…
- The curl command can be used to perform different activities such as downloading, sending
- different HTTP requests, specifying HTTP headers, and so on. Let's see how to perform
- different tasks with cURL.
- $ curl URL --silent
- The above command dumps the downloaded file into the terminal (the downloaded data is
- written to stdout ).
- The --silent option is used to prevent the curl command from displaying progress
- information. If progress information is required, remove --silent .
- $ curl URL --silent -O
- The -O option is used to write the downloaded data into a file with the filename parsed from
- the URL rather than writing into the standard output.
- For example:
- $ curl http://slynux.org/index.html --silent -O
- index.html will be created.
- It writes the web page or file to the filename as in the URL instead of writing to stdout . If
- there is no filename in the URL, it will produce an error. Hence, make sure that the URL
- points to a remote file. curl http://slynux.org -O --silent will display an error,
- since the filename cannot be parsed from the URL.
- $ curl URL --silent -o new_filename
- The -o option is used to download a file and write to a file with a specified file name.
- In order to show the # progress bar while downloading, use --progress-bar instead of
- --silent .
- $ curl http://slynux.org -o index.html --progress-bar
- ################################## 100.0%
- There's more...
- In the previous sections we have learned how to download files and dump HTML pages to the
- terminal. There are several advanced options that come along with cURL. Let's explore more
- of cURL.
- Continue/Resume downloading
- Unlike wget , cURL has advanced resume features that let a download continue at a given
- offset. This also helps to download portions of files by specifying an offset:
- $ curl URL/file -C offset
- The offset is an integer value in bytes.
- cURL doesn't require us to know the exact byte offset if we want to resume downloading a file.
- If you want cURL to figure out the correct resume point, use the -C - option, like this:
- $ curl -C - URL
- cURL will automatically figure out where to restart the download of the specified file.
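- For example, to resume an interrupted download into the current directory (the URL is only
- a placeholder):
- $ curl -C - -O http://example.com/file.iso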
- Set referer string with cURL
- Referer is a string in the HTTP header used to identify the page from which the user reaches
- the current web page. When a user clicks on a link from web page A and it reaches web page
- B, the referer header string in the page B will contain a URL of page A.
- Some dynamic pages check the referer string before returning HTML data. For example, a
- web page may show a page with a Google logo attached when a user reaches it by searching
- on Google, and show a different page when they navigate to the web page by manually typing
- the URL.
- The web page can write a condition to return a Google page if the referer is www.google.com
- or else return a different page.
- You can use --referer with the curl command to specify the referer string as follows:
- $ curl --referer Referer_URL target_URL
- For example:
- $ curl --referer http://google.com http://slynux.org
- Cookies with cURL
- Using curl we can specify as well as store cookies encountered during HTTP operations.
- In order to specify cookies, use the --cookie "COOKIES" option.
- Cookies should be provided as name=value . Multiple cookies should be delimited by a
- semicolon ";". For example:
- $ curl http://example.com --cookie "user=slynux;pass=hack"
- In order to specify a file to which the cookies encountered are to be stored, use the
- --cookie-jar option. For example:
- $ curl URL --cookie-jar cookie_file
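- As a sketch of a typical session (the site and form fields here are hypothetical), we can log
- in once, save the session cookies, and reuse them in a later request:
- $ curl --cookie-jar cookies.txt -d "user=slynux&pass=hack" http://example.com/login
- $ curl --cookie cookies.txt http://example.com/private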
- Setting a user agent string with cURL
- Some web pages that check the user-agent won't work if there is no user-agent specified. You
- may have noticed that certain websites work well only in Internet Explorer (IE). If a different
- browser is used, the website will show a message that it will work only on IE. This is because
- the website checks for a user agent. You can set the user agent as IE with curl and see that
- it returns a different web page in this case.
- Using cURL, it can be set using --user-agent or -A as follows:
- $ curl URL --user-agent "Mozilla/5.0"
- Additional headers can be passed with cURL. Use -H "Header" to pass multiple additional
- headers. For example:
- $ curl -H "Host: www.slynux.org" -H "Accept-language: en" URL
- Specifying bandwidth limit on cURL
- When the available bandwidth is limited and multiple users are sharing the Internet, in order
- to share the bandwidth smoothly, we can limit the download rate to a specified limit in curl
- by using the --limit-rate option as follows:
- $ curl URL --limit-rate 20k
- In this command, the suffixes k (kilobyte) and m (megabyte) specify the unit of the download rate limit.
- Specifying the maximum download size
- The maximum download file size for cURL can be specified using the --max-filesize
- option as follows:
- $ curl URL --max-filesize bytes
- It will return a non-zero exit code if the file size exceeds the limit, and zero if the download succeeds.
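- We can act on that exit code in a script. A minimal sketch (the URL is a placeholder):
- $ curl --silent -O --max-filesize 100000 http://example.com/file || echo "Download failed or file too large"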
- Authenticating with cURL
- HTTP authentication or FTP authentication can be done using cURL with the -u argument.
- The username and password can be specified using -u username:password . It is possible
- to omit the password, in which case cURL will prompt for the password while executing.
- To be prompted for the password, use only -u username .
- For example:
- $ curl -u user:pass http://test_auth.com
- In order to be prompted for the password use:
- $ curl -u user http://test_auth.com
- Printing response headers excluding data
- It is useful to print only the response headers for many kinds of checks and statistics. For
- example, to check whether a page is reachable or not, we don't need to download the entire
- page contents; just reading the HTTP response header is enough to identify whether a page
- is available or not.
- An example use case for checking the HTTP header is to check the file size before
- downloading. We can check the Content-Length parameter in the HTTP header to find out
- the length of a file before downloading. Also, several other useful parameters can be retrieved
- from the header. The Last-Modified parameter lets us know the last modification time of
- the remote file.
- Use the -I or --head option with curl to dump only the HTTP headers without downloading the
- remote file. For example:
- $ curl -I http://slynux.org
- HTTP/1.1 200 OK
- Date: Sun, 01 Aug 2010 05:08:09 GMT
- Server: Apache/1.3.42 (Unix) mod_gzip/1.3.26.1a mod_log_bytes/1.2
- mod_bwlimited/1.4 mod_auth_passthrough/1.8 FrontPage/5.0.2.2635 mod_
- ssl/2.8.31 OpenSSL/0.9.7a
- Last-Modified: Thu, 19 Jul 2007 09:00:58 GMT
- ETag: "17787f3-3bb0-469f284a"
- Accept-Ranges: bytes
- Content-Length: 15280
- Connection: close
- Content-Type: text/html
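- For example, to read just the file size before downloading, we can pick the Content-Length
- field out of the header shown above:
- $ curl --silent -I http://slynux.org | grep "Content-Length"
- Content-Length: 15280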
- See also
- Posting to a web page and reading response
- Accessing Gmail from the command line
- Gmail is a widely used free e-mail service from Google: http://mail.google.com/ .
- Gmail allows you to read your mail via authenticated RSS feeds. We can parse the RSS feed
- to obtain each unread mail's sender name and subject. This helps us have a look at the
- unread mails in the inbox without opening the web browser.
- How to do it...
- Let's go through the shell script to parse the RSS feeds for Gmail to display the unread mails:
- #!/bin/bash
- #Filename: fetch_gmail.sh
- #Description: Fetch gmail tool
- username="PUT_USERNAME_HERE"
- password="PUT_PASSWORD_HERE"
- SHOW_COUNT=5 # No of recent unread mails to be shown
- echo
- curl -u $username:$password --silent "https://mail.google.com/mail/feed/atom" | \
- tr -d '\n' | sed 's:</entry>:\n:g' | \
- sed 's/.*<title>\(.*\)<\/title.*<author><name>\([^<]*\)<\/name><email>\([^<]*\).*/Author: \2 [\3] \nSubject: \1\n/' | \
- head -n $(( $SHOW_COUNT * 3 ))
- The output will be as follows:
- $ ./fetch_gmail.sh
- Author: SLYNUX [ slynux@slynux.com ]
- Subject: Book release - 2
- Author: SLYNUX [ slynux@slynux.com ]
- Subject: Book release - 1
- .
- … 5 entries
- How it works...
- The script uses cURL to download the RSS feed by using user authentication. User authentication
- is provided by the -u username:password argument. You can use -u user without providing
- the password. Then while executing cURL it will interactively ask for the password.
- Here we can split the piped commands into different blocks to illustrate how they work.
- tr -d '\n' removes the newline character so that we restructure each mail entry with \n
- as the delimiter. sed 's:</entry>:\n:g' replaces every </entry> with a newline so that
- each mail entry is delimited by a newline and hence mails can be parsed one by one. Have a
- look at the source of https://mail.google.com/mail/feed/atom for XML tags used in
- the RSS feeds. <entry> TAGS </entry> corresponds to a single mail entry.
- The next block of script is as follows:
- sed 's/.*<title>\(.*\)<\/title.*<author><name>\([^<]*\)<\/name><email>\([^<]*\).*/Author: \2 [\3] \nSubject: \1\n/'
- This script matches the substring title using <title>\(.*\)<\/title , the sender name
- using <author><name>\([^<]*\)<\/name> , and e-mail using <email>\([^<]*\) . Then
- back referencing is used as follows:
- Author: \2 [\3] \nSubject: \1\n is used to replace an entry for a mail with
- the matched items in an easy-to-read format. \1 corresponds to the first substring
- match, \2 to the second substring match, and so on.
- The SHOW_COUNT=5 variable holds the number of unread mail entries to be
- printed on the terminal.
- head is used to display only SHOW_COUNT*3 lines from the first line. SHOW_COUNT is
- multiplied by three because each parsed entry produces three lines of output.
- See also
- A primer on cURL, explains the curl command
- Basic sed primer of Chapter 4, explains the sed command
- Parsing data from a website
- It is often useful to parse data from web pages by eliminating unnecessary details. sed and awk
- are the main tools that we will use for this task. You might have come across a list of access
- rankings in a grep recipe in the previous chapter Texting and driving; it was generated by parsing
- the website page http://www.johntorres.net/BoxOfficefemaleList.html .
- Let's see how to parse the same data using text-processing tools.
- How to do it...
- Let's go through the command sequence used to parse details of actresses from the website:
- $ lynx -dump http://www.johntorres.net/BoxOfficefemaleList.html | \
- grep -o "Rank-.*" | \
- sed 's/Rank-//; s/\[[0-9]\+\]//' | \
- sort -nk 1 |\
- awk '
- {
- for(i=3;i<=NF;i++){ $2=$2" "$i }
- printf "%-4s %s\n", $1,$2 ;
- }' > actresslist.txt
- The output will be as follows:
- # Only 3 entries shown. All others omitted due to space limits
- 1 Keira Knightley
- 2 Natalie Portman
- 3 Monica Bellucci
- How it works...
- Lynx is a command-line web browser; it can dump the text version of the website as we
- would see in a web browser rather than showing us the raw code. Hence it avoids the job of
- removing the HTML tags. We parse the lines starting with Rank, using sed as follows:
- sed 's/Rank-//; s/\[[0-9]\+\]//'
- These lines can then be sorted according to the ranks. awk is used here to keep the spacing
- between the rank and the name uniform by specifying the field width. %-4s specifies a four-
- character width. All the fields except the first are concatenated to form a single string as $2 .
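- To see what the awk block does in isolation, we can feed it a single sample line (the name
- is taken from the output above):
- $ echo "3 Monica Bellucci" | awk '{ for(i=3;i<=NF;i++){ $2=$2" "$i } printf "%-4s %s\n", $1,$2 }'
- 3    Monica Bellucci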
- See also
- Basic sed primer of Chapter 4, explains the sed command
- Basic awk primer of Chapter 4, explains the awk command
- Downloading a web page as formatted plain text, explains the lynx command
- Image crawler and downloader
- Image crawlers are very useful when we need to download all the images that appear in a web
- page. Instead of going through the HTML source and picking out all the images by hand, we can
- use a script to parse the image URLs and download them automatically. Let's see how to do it.
- How to do it...
- Let's write a Bash script to crawl and download the images from a web page as follows:
- #!/bin/bash
- #Description: Images downloader
- #Filename: img_downloader.sh
- if [ $# -ne 3 ];
- then
- echo "Usage: $0 URL -d DIRECTORY"
- exit -1
- fi
- for i in {1..4}
- do
- case $1 in
- -d) shift; directory=$1; shift ;;
- *) url=${url:-$1}; shift;;
- esac
- done
- mkdir -p $directory;
- baseurl=$(echo $url | egrep -o "https?://[a-z.]+")
- curl -s $url | egrep -o "<img src=[^>]*>" |
- sed 's/<img src=\"\([^"]*\).*/\1/g' > /tmp/$$.list
- sed -i "s|^/|$baseurl/|" /tmp/$$.list
- cd $directory;
- while read filename;
- do
- curl -O "$filename" --silent
- done < /tmp/$$.list
- An example usage is as follows:
- $ ./img_downloader.sh http://www.flickr.com/search/?q=linux -d images
- www.it-ebooks.info
- Tangled Web? Not At All!
- 192
- How it works...
- The above image downloader script parses an HTML page, extracts the <img> tags, then
- parses src="URL" from each <img> tag and downloads the images to the specified directory.
- This script accepts a web page URL and the destination directory path as command-line
- arguments. The first part of the script is a tricky way to parse command-line arguments.
- The [ $# -ne 3 ] statement checks whether the total number of arguments to the script
- is three; otherwise, it exits and prints a usage example.
- If there are three arguments, we parse the URL and the destination directory. In order to do
- that, a tricky hack is used:
- for i in {1..4}
- do
- case $1 in
- -d) shift; directory=$1; shift ;;
- *) url=${url:-$1}; shift;;
- esac
- done
- A for loop is iterated four times (there is no significance to the number four, it is just to iterate
- a couple of times to run the case statement).
- The case statement evaluates the first argument ( $1 ) and matches either -d or any
- other string. Hence we can place the -d argument anywhere on the command line, as
- follows:
- $ ./img_downloader.sh -d DIR URL
- Or:
- $ ./img_downloader.sh URL -d DIR
- shift is used to shift the arguments, such that when shift is called, $1 is assigned the
- value of $2 ; when called again, $1 takes the value of $3 , and so on. Hence we can
- evaluate all the arguments through $1 itself.
- When -d is matched ( -d) ), it is obvious that the next argument is the value for the
- destination directory. *) corresponds to the default match; it matches anything other than
- -d . During the iteration, $1 in the default match may be either the URL or an empty string,
- and we must keep the URL while preventing "" from overwriting it. Hence we use the
- url=${url:-$1} trick: it returns the existing value of url if it is already set, and assigns
- $1 otherwise.
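- The behavior is easy to verify interactively (with a placeholder URL):
- $ url=""
- $ url=${url:-http://example.com}
- $ url=${url:-""}
- $ echo $url
- http://example.com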
- egrep -o "<img src=[^>]*>" will print only the matching strings, which are the <img>
- tags including their attributes. [^>]* is used to match all the characters except the closing
- > , that is, <img src="image.jpg" … > .
- www.it-ebooks.info
- Chapter 5
- 193
- sed 's/<img src=\"\([^"]*\).*/\1/g' parses src="url" so that all image URLs
- can be parsed from the <img> tags already parsed.
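- We can test this pair of commands on a sample tag (a made-up line, for illustration):
- $ echo '<img src="/logo.png" alt="logo">' | egrep -o "<img src=[^>]*>" | sed 's/<img src=\"\([^"]*\).*/\1/g'
- /logo.png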
- There are two types of image source paths: relative and absolute. Absolute paths contain full
- URLs that start with http:// or https:// . Relative URLs starts with / or image_name itself.
- An example of an absolute URL is: http://example.com/image.jpg
- An example of a relative URL is: /image.jpg
- For relative URLs the starting / should be replaced with the base URL to transform it to
- http://example.com/image.jpg .
- For that transformation, we initially find out the baseurl by parsing the site URL with egrep.
- Then we replace every occurrence of a starting / with the baseurl , using sed -i
- "s|^/|$baseurl/|" /tmp/$$.list .
- Then a while loop is used to iterate the list line by line and download the URL using curl .
- The --silent argument is used with curl to avoid other progress messages from being
- printed on the screen.
- See also
- A primer on cURL, explains the curl command
- Basic sed primer of Chapter 4, explains the sed command
- Searching and mining "text" inside a file with grep of Chapter 4, explains the grep
- command
- Web photo album generator
- Web developers commonly design photo album pages for websites that consist of a number
- of image thumbnails on the page. When thumbnails are clicked, a large version of the
- picture will be displayed. But when many images are required, copying the <img> tag every
- time, resizing each image to create a thumbnail, placing the thumbnails in the thumbs
- directory, testing the links, and so on are real hurdles. It takes a lot of time and repeats the
- same task over and over. It can be automated easily by writing a simple Bash script: we can
- create the thumbnails, place them in the right directories, and generate the code fragment
- for the <img> tags automatically, all in a few seconds. This recipe will teach you how to do it.
- Getting ready
- We can perform this task with a for loop that iterates every image in the current directory.
- The usual Bash utilities such as cat and convert (from ImageMagick) are used. These will
- generate an HTML album, using all the images, in index.html . In order to use convert ,
- make sure you have ImageMagick installed.
- How to do it...
- Let's write a Bash script to generate a HTML album page:
- #!/bin/bash
- #Filename: generate_album.sh
- #Description: Create a photo album using images in current directory
- echo "Creating album.."
- mkdir -p thumbs
- cat <<EOF > index.html
- <html>
- <head>
- <style>
- body
- {
- width:470px;
- margin:auto;
- border: 1px dashed grey;
- padding:10px;
- }
- img
- {
- margin:5px;
- border: 1px solid black;
- }
- </style>
- </head>
- <body>
- <center><h1> #Album title </h1></center>
- <p>
- EOF
- for img in *.jpg;
- do
- convert "$img" -resize "100x" "thumbs/$img"
- echo "<a href=\"$img\" ><img src=\"thumbs/$img\" title=\"$img\" />
- </a>" >> index.html
- done
- cat <<EOF >> index.html
- </p>
- </body>
- </html>
- EOF
- echo Album generated to index.html
- Run the script as follows:
- $ ./generate_album.sh
- Creating album..
- Album generated to index.html
- How it works...
- The initial part of the script is to write the header part of the HTML page.
- The following script redirects all the contents up to EOF (excluding the EOF line itself) to index.html :
- cat <<EOF > index.html
- contents...
- EOF
- The header includes the HTML and stylesheets.
- for img in *.jpg; iterates through the name of each .jpg file and performs the actions that follow.
- convert "$img" -resize "100x" "thumbs/$img" will create images of 100px width
- as thumbnails.
- The following statement generates the required <img> tag and appends it to index.html :
- echo "<a href=\"$img\" ><img src=\"thumbs/$img\" title=\"$img\" /></a>" >> index.html
- Finally, the footer HTML tags are appended with cat again.
- See also
- Playing with file descriptors and redirection of Chapter 1, explains EOF and stdin
- redirection
- Twitter command-line client
- Twitter is the hottest micro-blogging platform, as well as the latest buzz in online social media.
- Tweeting and reading tweets is fun. What if we could do both from the command line? It is
- pretty simple to write a command-line Twitter client. Twitter has RSS feeds, and hence we can
- make use of them. Let's see how to do it.
- Getting ready
- We can use cURL to authenticate and send Twitter updates, as well as download the RSS feed
- pages to parse the tweets. Just four lines of code can do it. Let's do it.
- How to do it...
- Let's write a Bash script using the curl command to manipulate twitter APIs:
- #!/bin/bash
- #Filename: tweets.sh
- #Description: Basic twitter client
- USERNAME="PUT_USERNAME_HERE"
- PASSWORD="PUT_PASSWORD_HERE"
- COUNT="PUT_NO_OF_TWEETS"
- if [[ "$1" != "read" ]] && [[ "$1" != "tweet" ]];
- then
- echo -e "Usage: $0 send status_message\n OR\n $0 read\n"
- exit -1;
- fi
- if [[ "$1" = "read" ]];
- then
- curl --silent -u $USERNAME:$PASSWORD http://twitter.com/statuses/friends_timeline.rss | \
- grep title | \
- tail -n +2 | \
- head -n $COUNT | \
- sed 's:.*<title>\([^<]*\).*:\n\1:'
- elif [[ "$1" = "tweet" ]];
- then
- status=$( echo $@ | tr -d '"' | sed 's/.*tweet //')
- curl --silent -u $USERNAME:$PASSWORD -d status="$status" http://twitter.com/statuses/update.xml > /dev/null
- echo 'Tweeted :)'
- fi
- Run the script as follows:
- $ ./tweets.sh tweet Thinking of writing a X version of wall command
- "#bash"
- Tweeted :)
- $ ./tweets.sh read
- bot: A tweet line
- t3rm1n4l: Thinking of writing a X version of wall command #bash
- How it works...
- Let's see the working of the above script by splitting it into two parts. The first part is
- about reading tweets. To read tweets, the script downloads the RSS information from
- http://twitter.com/statuses/friends_timeline.rss and parses the lines
- containing the <title> tag. Then it strips off the <title> and </title> tags using sed
- to form the required tweet text. A COUNT variable is then used with the head command to
- keep only the requested number of recent tweets. tail -n +2 is used to remove an
- unnecessary header text "Twitter: Timeline of friends".
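- tail -n +2 prints its input starting from the second line, as a quick test shows:
- $ seq 5 | tail -n +2
- 2
- 3
- 4
- 5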
- In the sending tweet part, the -d status argument of curl is used to post data to Twitter
- using their API: http://twitter.com/statuses/update.xml .
- In the case of sending a tweet, $1 of the script is the keyword tweet itself. To obtain the
- status message, we take $@ (the list of all arguments of the script) and remove the word
- "tweet" from it.
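- The stripping step can be tried on its own (with sample words, for illustration):
- $ echo "tweet hello world" | sed 's/.*tweet //'
- hello world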
- See also
- A primer on cURL, explains the curl command
- head and tail - printing the last or first 10 lines of Chapter 3, explains the commands
- head and tail
- define utility with Web backend
- Google provides Web definitions for any word via the search query define:WORD . Ordinarily
- we would need a GUI web browser to fetch the definitions; however, we can automate this and
- parse out the required definitions by using a script. Let's see how to do it.
- Getting ready
- We can use lynx , sed , awk , and grep to write the define utility.
- How to do it...
- Let's go through the code for the define utility script to fetch definitions from Google search:
- #!/bin/bash
- #Filename: define.sh
- #Description: A Google define: frontend
- limit=0
- if [ ! $# -ge 1 ];
- then
- echo -e "Usage: $0 WORD [-n No_of_definitions]\n"
- exit -1;
- fi
- if [ "$2" = "-n" ];
- then
- limit=$3;
- let limit++
- fi
- word=$1
- lynx -dump http://www.google.co.in/search?q=define:$word | \
- awk '/Defini/,/Find defini/' | head -n -1 | sed 's:*:\n*:; s:^[ ]*::' | \
- grep -v "[[0-9]]" | \
- awk '{
- if ( substr($0,1,1) == "*" )
- { sub("*",++count".") } ;
- print
- } ' > /tmp/$$.txt
- echo
- if [ $limit -ge 1 ];
- then
- cat /tmp/$$.txt | sed -n "/^1\./, /${limit}/p" | head -n -1
- else
- cat /tmp/$$.txt;
- fi
- Run the script as follows:
- $ ./define.sh hack -n 2
- 1. chop: cut with a hacking tool
- 2. one who works hard at boring tasks
- How it works...
- We will look into the core part of the definition parser. Lynx is used to obtain the plain text
- version of the web page. http://www.google.co.in/search?q=define:$word is
- the URL of the web-definition page. Then we reduce the text to what lies between "Definitions
- on web" and "Find definitions", since all the definitions occur in between these lines of text
- ( awk '/Defini/,/Find defini/' ).
- 's:*:\n*:' is used to replace each * with a newline followed by * , in order to insert a
- newline in between each definition, and s:^[ ]*:: is used to remove extra spaces at the start
- of lines. Hyperlinks are marked as [number] in the lynx output; those lines are removed by
- grep -v , the invert-match option. Then awk is used to replace the * occurring at the start of
- each line with a number, so that each definition is assigned a serial number. If a -n count was
- read by the script, it has to output only as many definitions as the count. So sed is used to
- print the definitions numbered 1 to count (this is easy since we replaced * with the serial
- number).
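- The numbering step can be seen in isolation (two made-up definition lines; this mirrors the
- script's own sub() call, though some awk implementations may want the * escaped as \*):
- $ printf '* first definition\n* second definition\n' | awk '{ if ( substr($0,1,1) == "*" ) { sub("*",++count".") } ; print }'
- 1. first definition
- 2. second definition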
- See also
- Basic sed primer of Chapter 4, explains the sed command
- Basic awk primer of Chapter 4, explains the awk command
- Searching and mining "text" inside a file with grep of Chapter 4, explains the grep
- command
- Downloading a web page as formatted plain text, explains the lynx command
- Finding broken links in a website
- I have seen people manually checking each and every page on a site to search for broken links.
- That is possible only for websites having very few pages; when the number of pages becomes
- large, it becomes impossible. The task becomes really easy if we can automate finding broken
- links. We can find the broken links by using HTTP manipulation tools. Let's see how to do it.
- Getting ready
- In order to identify the links and find the broken ones among them, we can use lynx and
- curl . lynx has an option, -traversal , which will recursively visit pages in the website and
- build the list of all hyperlinks in the website. We can then use cURL to verify whether or not
- each of the links is broken.
- How to do it...
- Let's write a Bash script with the help of the curl command to find out the broken links on a
- web page:
- #!/bin/bash
- #Filename: find_broken.sh
- #Description: Find broken links in a website
- if [ $# -ne 1 ];
- then
- echo -e "Usage: $0 URL\n"
- exit -1;
- fi
- echo Broken links:
- mkdir /tmp/$$.lynx
- cd /tmp/$$.lynx
- lynx -traversal $1 > /dev/null
- count=0;
- sort -u reject.dat > links.txt
- while read link;
- do
- output=`curl -I $link -s | grep "HTTP/.*OK"`;
- if [[ -z $output ]];
- then
- echo $link;
- let count++
- fi
- done < links.txt
- [ $count -eq 0 ] && echo No broken links found.
- How it works...
- lynx -traversal URL will produce a number of files in the working directory. It includes
- a file reject.dat which will contain all the links in the website. sort -u is used to build a
- list by avoiding duplicates. Then we iterate through each link and check the header response
- by using curl -I . If the header contains a status line such as HTTP/1.1 200 OK as the
- response, it means that the target is not broken. All other responses correspond to broken
- links and are printed out to stdout .
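- The per-link check can also be tried by hand; a live page returns a status line, while a broken
- one prints nothing (slynux.org is the example site used earlier):
- $ curl -I http://slynux.org -s | grep "HTTP/.*OK"
- HTTP/1.1 200 OK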
- See also
- Downloading a web page as formatted plain text, explains the lynx command
- A primer on cURL, explains the curl command
- Tracking changes to a website
- Tracking changes to a website is helpful to web developers and users. Checking a website
- manually in intervals is really hard and impractical. Hence we can write a change tracker
- running at repeated intervals. When a change occurs, it can play a sound or send a
- notification. Let's see how to write a basic tracker for the website changes.
- Getting ready
- Tracking changes in terms of Bash scripting means fetching websites at different times and
- taking the difference using the diff command. We can use curl and diff to do this.
- How to do it...
- Let's write a Bash script by combining different commands to track changes in a web page:
- #!/bin/bash
- #Filename: change_track.sh
- #Desc: Script to track changes to webpage
- if [ $# -ne 1 ];
- then
- echo -e "Usage: $0 URL\n"
- exit -1;
- fi
- first_time=0
- # Not first time
- if [ ! -e "last.html" ];
- then
- first_time=1
- # Set it as the first-time run
- fi
- curl --silent $1 -o recent.html
- if [ $first_time -ne 1 ];
- then
- changes=$(diff -u last.html recent.html)
- if [ -n "$changes" ];
- then
- echo -e "Changes:\n"
- echo "$changes"
- else
- echo -e "\nWebsite has no changes"
- fi
- else
- echo "[First run] Archiving.."
- fi
- cp recent.html last.html
- Let's look at the output of the change_track.sh script when changes are made to the web
- page and when they are not:
- First run:
- $ ./change_track.sh http://web.sarathlakshman.info/test.html
- [First run] Archiving..
- Second run:
- $ ./change_track.sh http://web.sarathlakshman.info/test.html
- Website has no changes
- Third run, after making changes to the web page:
- $ ./change_track.sh http://web.sarathlakshman.info/test_change/test.html
- Changes:
- --- last.html 2010-08-01 07:29:15.000000000 +0200
- +++ recent.html 2010-08-01 07:29:43.000000000 +0200
- @@ -1,3 +1,4 @@
- <html>
- +added line :)
- <p>data</p>
- </html>
- How it works...
- The script checks whether it is running for the first time using [ ! -e "last.html" ]; .
- If last.html doesn't exist, that means it is the first time, and hence the web page must
- be downloaded and copied as last.html .
- If it is not the first time, it downloads the new copy ( recent.html ) and checks the
- difference using the diff utility. If there are any changes, it prints them, and finally it
- copies recent.html to last.html .
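- The core comparison is just diff -u on the two snapshots, which we can try on any pair of
- files (made-up one-line files here; timestamps are trimmed from the output):
- $ echo "hello" > last.html ; echo "hullo" > recent.html
- $ diff -u last.html recent.html
- --- last.html
- +++ recent.html
- @@ -1 +1 @@
- -hello
- +hullo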
- See also
- A primer on cURL, explains the curl command
- Posting to a web page and reading response
- POST and GET are two types of requests in HTTP to send information to or retrieve information
- from a website. In a GET request, we send parameters (name-value pairs) through the web
- page URL itself. In the case of POST, they are not attached to the URL. POST is used when a
- form needs to be submitted; for example, when a username and password are to be submitted
- and the login page is to be retrieved.
- POSTing to pages is a frequent need while writing scripts based on web page retrievals.
- Let's see how to work with POST. Automating HTTP GET and POST requests by sending
- POST data and retrieving the output is a very important task that we practice while writing
- shell scripts that parse data from websites.
- Getting ready
- Both cURL and wget can handle POST requests by arguments. They are to be passed as
- name-value pairs.
- How to do it...
- Let's see how to POST and read HTML response from a real website using curl :
- $ curl URL -d "postvar=postdata1&postvar2=postdata2"
- We have a website ( http://book.sarathlakshman.com/lsc/mlogs/ ) and it is used
- to submit the current user information, such as hostname and username. Assume that, on
- the home page of the website, there are two fields, HOSTNAME and USER, and a SUBMIT
- button. When the user enters a hostname and a username and clicks on the SUBMIT button,
- the details will be stored in the website. This process can be automated using a single line of
- curl command by automating the POST request. If you look at the website source (use the
- view source option from the web browser), you can see an HTML form defined similar to the
- following code:
- <form action="http://book.sarathlakshman.com/lsc/mlogs/submit.php"
- method="post" >
- <input type="text" name="host" value="HOSTNAME" >
- <input type="text" name="user" value="USER" >
- <input type="submit" >
- </form>
- Here, http://book.sarathlakshman.com/lsc/mlogs/submit.php is the target
- URL. When the user enters the details and clicks on the Submit button, the host and user
- inputs are sent to submit.php as a POST request, and the response page is returned to the
- browser.
- We can automate the POST request as follows:
- $ curl http://book.sarathlakshman.com/lsc/mlogs/submit.php -d "host=test-host&user=slynux"
- <html>
- You have entered :
- <p>HOST : test-host</p>
- <p>USER : slynux</p>
- <html>
- Now curl returns the response page.
- -d is the argument used for posting. The string argument for -d is similar to the GET request
- semantics. var=value pairs are to be delimited by & .
- The -d argument should always be given in quotes. If quotes are not used, &
- is interpreted by the shell to indicate this should be a background process.
- There's more...
- Let's see how to perform POST using cURL and wget .
- POST in curl
- You can POST data in curl by using -d or --data as follows:
- $ curl --data "name=value" URL -o output.html
- If multiple variables are to be sent, delimit them with & . Note that when & is used the
- name-value pairs should be enclosed in quotes, else the shell will consider & as a special
- character for background process. For example:
- $ curl -d "name1=val1&name2=val2" URL -o output.html
- POST data using wget
- You can POST data using wget by using --post-data "string" . For example:
- $ wget URL --post-data "name=value" -O output.html
- Use the same format as cURL for name-value pairs.
- See also
- A primer on cURL, explains the curl command
- Downloading from a web page, explains the wget command
- 6
- The Backup Plan
- In this chapter, we will cover:
- Archiving with tar
- Archiving with cpio
- Compressing with gunzip (gzip)
- Compressing with bunzip (bzip)
- Compressing with lzma
- Archiving and compressing with zip
- Heavy compression squashfs filesystem
- Encrypting files and folders (with standard algorithms)
- Backup snapshots with rsync
- Version controlled backups with git
- Cloning disks with dd
- Introduction
- Taking snapshots and backups of data is a regular task we come across. When it comes
- to a server or large data storage systems, regular backups are important, and it is possible
- to automate backups via shell scripting. Archiving and compression find usage
- in the everyday life of a system admin or a regular user. There are various compression
- formats that can be used in various ways to obtain the best results. Encryption is
- another frequently needed task for the protection of data. In order to reduce the
- size of encrypted data, files are usually archived and compressed before encrypting. Many
- standard encryption algorithms are available, and they can be handled with shell utilities. This
- chapter walks through different recipes for creating and maintaining files or folder archives,
- compression formats, and encrypting techniques with shell. Let's go through the recipes.
- Archiving with tar
- The tar command can be used to archive files. It was originally designed for storing data on
- tape archives (tar). It allows you to store multiple files and directories as a single file. It can
- retain all the file attributes, such as owner, permissions, and so on. The file created by the tar
- command is often referred to as a tarball.
- Getting ready
- The tar command comes by default with all UNIX like operating systems. It has a simple
- syntax and is a portable file format. Let's see how to do it.
- tar has a list of arguments: A , c , d , r , t , u , x , f , and v . Each of these letters can be used
- independently, for the different purposes corresponding to it.
- How to do it...
- To archive files with tar, use the following syntax:
- $ tar -cf output.tar [SOURCES]
- For example:
- $ tar -cf output.tar file1 file2 file3 folder1 ..
- In this command, -c stands for "create file" and -f stands for "specify filename".
- We can specify folders and filenames as SOURCES . We can use a list of file names or
- wildcards such as *.txt to specify the sources.
- It will archive the source files into a file called output.tar .
- The filename must appear immediately after the -f and should be the last option in the
- argument group (for example, -cvvf filename.tar and -tvvf filename.tar ).
- We cannot pass hundreds of files or folders as command-line arguments because there is a
- limit. So it is safer to use the append option if many files are to be archived.
- There's more...
- Let's go through additional features that are available with the tar command.
- Appending files to an archive
- Sometimes we may need to add files to an archive that already exists (an example usage is
- when thousands of files are to be archived and when they cannot be specified in one line as
- command-line arguments).
- Append option: -r
- In order to append a file into an already existing archive use:
- $ tar -rvf original.tar new_file
- List the files in an archive as follows:
- $ tar -tf archive.tar
- yy/lib64/
- yy/lib64/libfakeroot/
- yy/sbin/
- In order to print more details while archiving or listing, use the -v or the -vv flag. These flags
- are called verbose ( v ), and they enable printing more details on the terminal. For example,
- by using verbose you can print more details, such as the file permissions, owner group,
- modification date, and so on.
- For example:
- $ tar -tvvf archive.tar
- drwxr-xr-x slynux/slynux 0 2010-08-06 09:31 yy/
- drwxr-xr-x slynux/slynux 0 2010-08-06 09:39 yy/usr/
- drwxr-xr-x slynux/slynux 0 2010-08-06 09:31 yy/usr/lib64/
- Extracting files and folders from an archive
- The following command extracts the contents of the archive to the current directory:
- $ tar -xf archive.tar
- The -x option stands for extract.
- When -x is used, the tar command extracts the contents of the archive to the current
- directory. We can also specify the directory where the files need to be extracted by using the
- -C flag, as follows:
- $ tar -xf archive.tar -C /path/to/extraction_directory
- The command extracts the contents of the archive to a specified directory. It
- extracts the entire contents of the archive. We can also extract only a few files by specifying
- them as command arguments:
- $ tar -xvf file.tar file1 file4
- The command above extracts only file1 and file4 , and ignores other files in the archive.
- stdin and stdout with tar
- While archiving, we can specify stdout as the output file, so that another command at the
- other end of a pipe can read it as stdin and then process it or extract the archive.
- This is helpful in order to transfer data through a Secure Shell (SSH) connection (while on a
- network). For example:
- $ mkdir ~/destination
- $ tar -cf - file1 file2 file3 | tar -xvf - -C ~/destination
- In the example above, file1 , file2 , and file3 are combined into a tarball and then
- extracted to ~/destination . In this command:
- -f specifies stdout as the file for archiving (when the -c option is used)
- -f specifies stdin as the file for extracting (when the -x option is used)
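- The same trick works across an SSH connection. As a sketch (the host and paths are
- placeholders), this copies a directory tree to a remote machine without creating an
- intermediate tarball on disk:
- $ tar -cf - sourcedir | ssh user@remotehost "tar -xvf - -C /home/user/destination"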
- Concatenating two archives
- We can easily merge multiple tar files with the -A option.
- Let's pretend we have two tarballs: file1.tar and file2.tar . We can merge the contents
- of file2.tar to file1.tar as follows:
- $ tar -Af file1.tar file2.tar
- Verify it by listing the contents:
- $ tar -tvf file1.tar
- Updating files in an archive with timestamp check
- The append option appends any given file to the archive. If a file that is already inside the
- archive is given to append, it will be appended again and the archive will contain duplicates.
- We can use the update option -u to specify that a file should be appended only if it is newer
- than the file with the same name inside the archive.
- $ tar -tf archive.tar
- filea
- fileb
- filec
- This command lists the files in the archive.
- In order to append filea only if filea has newer modification time than filea inside
- archive.tar , use:
- $ tar -uvvf archive.tar filea
- Nothing happens if the version of filea outside the archive and the filea inside
- archive.tar have the same timestamp.
- Use the touch command to modify the file timestamp and then try the tar command again:
- $ tar -uvvf archive.tar filea
- -rw-r--r-- slynux/slynux 0 2010-08-14 17:53 filea
- The file is appended since its timestamp is newer than the one inside the archive.
- Comparing files in archive and file system
- Sometimes it is useful to know whether a file in the archive and a file with the same filename
- in the filesystem are the same or contain any differences. The -d flag can be used to print the
- differences:
- $ tar -df archive.tar filename1 filename2 ...
- For example:
- $ tar -df archive.tar afile bfile
- afile: Mod time differs
- afile: Size differs
- Deleting files from archive
- We can remove files from a given archive using the --delete option. For example:
- $ tar -f archive.tar --delete file1 file2 ..
- Let's see another example:
- $ tar -tf archive.tar
- filea
- fileb
- filec
- Or, we can also use the following syntax:
- $ tar --delete --file archive.tar [FILE LIST]
- For example:
- $ tar --delete --file archive.tar filea
- $ tar -tf archive.tar
- fileb
- filec
- Compression with tar archive
- The tar command only archives files; it does not compress them. For this reason, most people
- usually add some form of compression when working with tarballs. This significantly decreases
- the size of the files. Tarballs are often compressed into one of the following formats:
- file.tar.gz
- file.tar.bz2
- file.tar.lzma
- file.tar.lzo
- Different tar flags are used to specify different compression formats.
- -j for bunzip2
- -z for gzip
- --lzma for lzma
- They are explained in the following compression-specific recipes.
- It is possible to use compression formats without explicitly specifying special options as
- above. tar can compress by looking at the given extension of the output or input file names.
- In order for tar to support compression automatically by looking at the extensions, use -a or
- --auto-compress with tar .
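- For example, the following (a small sketch) produces a gzip-compressed archive purely
- because of the .tar.gz extension:
- $ tar -acvf archive.tar.gz file1 file2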
- Excluding a set of files from archiving
- It is possible to exclude a set of files from archiving by specifying patterns. Use
- --exclude [PATTERN] for excluding files matched by wildcard patterns.
- For example, to exclude all .txt files from archiving use:
- $ tar -cf arch.tar * --exclude "*.txt"
- Note that the pattern should be enclosed in double quotes.
- It is also possible to exclude a list of files provided in a list file with the -X flag as follows:
- $ cat list
- filea
- fileb
- $ tar -cf arch.tar * -X list
- Now it excludes filea and fileb from archiving.
- Excluding version control directories
- We usually use tarballs for distributing source code. Most of the source code is maintained
- using version control systems such as subversion, Git, mercurial, cvs, and so on. Code
- directories under version control will contain special directories used to manage versions like
- .svn or .git . However, these directories aren't needed by the code itself and so should be
- eliminated from the tarball of the source code.
- In order to exclude version control related files and directories while archiving use the
- --exclude-vcs option along with tar . For example:
- $ tar --exclude-vcs -czvvf source_code.tar.gz eye_of_gnome_svn
- Printing total bytes
- It is sometimes useful to print the total bytes copied to the archive. Print the total bytes
- copied after archiving by using the --totals option as follows:
- $ tar -cf arc.tar * --exclude "*.txt" --totals
- Total bytes written: 20480 (20KiB, 12MiB/s)
- See also
- Compressing with gunzip (gzip), explains the gzip command
- Compressing with bunzip (bzip2), explains the bzip2 command
- Compressing with lzma, explains the lzma command
- Archiving with cpio
- cpio is another archiving format similar to tar . It is used to store files and directories in a file
- with attributes such as permissions, ownership, and so on. However, it is not as commonly
- used as tar . That said, cpio is used in RPM package archives, initramfs files for
- the Linux kernel, and so on. This recipe will give minimal usage examples of cpio .
- How to do it...
- cpio takes input filenames through stdin and it writes the archive into stdout . We have to
- redirect stdout to a file to receive the output cpio file as follows:
- Create test files:
- $ touch file1 file2 file3
- We can archive the test files as follows:
- $ echo file1 file2 file3 | cpio -ov > archive.cpio
- In this command:
- f -o specifies the output
- f -v is used for printing a list of files archived
- By using cpio, we can also archive files given as absolute paths. /usr/
- somedir is an absolute path as it contains the full path starting from root (/).
- A relative path does not start with / ; it starts from the current
- directory. For example, test/file means that there is a directory test and
- the file is inside the test directory.
- While extracting, cpio extracts to the absolute path itself. But in the case of tar , it
- removes the / from the absolute path and converts it into a relative path.
- In order to list files in a cpio archive use the following command:
- $ cpio -it < archive.cpio
- This command will list all the files in the given cpio archive. It reads the files from stdin .
- In this command:
- f -i is for specifying the input
- f -t is for listing
- In order to extract files from the cpio archive use:
- $ cpio -id < archive.cpio
- Here, -d is used for extracting.
- It overwrites files without prompting. If files with absolute paths are present in the archive,
- it will replace the files at those paths; unlike tar , it will not extract them into the current directory.
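- A common cpio idiom worth knowing is to feed filenames from find . As a minimal sketch
- (assuming a hypothetical directory named testdir ), an entire tree can be archived as:
- $ find testdir | cpio -ov > tree.cpio
- find prints one path per line, and cpio reads those names from stdin .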
- Compressing with gunzip (gzip)
- gzip is a commonly used compression format in GNU/Linux platforms. Utilities such as gzip ,
- gunzip , and zcat are available to handle gzip compression file types. gzip can be applied
- on a file only. It cannot archive directories and multiple files. Hence we use a tar archive
- and compress it with gzip . When multiple files are given as input it will produce several
- individually compressed ( .gz ) files. Let's see how to operate with gzip .
- How to do it...
- In order to compress a file with gzip use the following command:
- $ gzip filename
- $ ls
- filename.gz
- It will remove the original file and produce a compressed file called filename.gz .
- Extract a gzip compressed file as follows:
- $ gunzip filename.gz
- It will remove filename.gz and produce an uncompressed version called filename .
- In order to list out the properties of a compressed file use:
- $ gzip -l test.txt.gz
- compressed uncompressed ratio uncompressed_name
- 35 6 -33.3% test.txt
- The gzip command can read a file from stdin and also write a compressed file into
- stdout .
- Read from stdin and write to stdout as follows:
- $ cat file | gzip -c > file.gz
- The -c option is used to specify output to stdout .
- We can specify the compression level for gzip . Use --fast or the --best option to provide
- low and high compression ratios, respectively.
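- For example, a brief sketch (assuming a hypothetical file data.log ):
- $ gzip --best data.log # maximum compression, slowest
- $ gunzip data.log.gz
- $ gzip --fast data.log # least compression, fastest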
- There's more...
- The gzip command is often used with other commands. It also has advanced options to
- specify the compression ratio. Let's see how to work with these features.
- Gzip with tarball
- We usually use gzip with tarballs. A tarball can be compressed by using the -z option passed
- to the tar command while archiving and extracting.
- You can create gzipped tarballs using the following methods:
- f Method - 1
- $ tar -czvvf archive.tar.gz [FILES]
- Or:
- $ tar -cavvf archive.tar.gz [FILES]
- The -a option specifies that the compression format should automatically be
- detected from the extension.
- f Method - 2
- First, create a tarball:
- $ tar -cvvf archive.tar [FILES]
- Compress it after tarballing as follows:
- $ gzip archive.tar
- If many files (a few hundred) are to be archived in a tarball and need to be compressed, we
- use Method - 2 with a few changes. The issue with giving many files as command arguments
- to tar is that the command line can accept only a limited number of arguments. In order
- to solve this issue, we can create a tar file by adding files one by one using a loop with the
- append option ( -r ) as follows:
- FILE_LIST="file1 file2 file3 file4 file5"
- for f in $FILE_LIST;
- do
-   tar -rvf archive.tar $f
- done
- gzip archive.tar
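- Alternatively, a sketch assuming GNU tar : a file containing the list of names can be passed
- with the -T ( --files-from ) option, which avoids the command-line length limit entirely:
- $ cat list.txt
- file1
- file2
- $ tar -czvf archive.tar.gz -T list.txt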
- In order to extract a gzipped tarball, use the following:
- $ tar -xzvvf archive.tar.gz -C extract_directory
- In this command:
- f -x is used for extraction
- f -z is for gzip specification
- f -C is for specifying the directory to which the files are to be extracted
- Or:
- $ tar -xavvf archive.tar.gz -C extract_directory
- In the above command, the -a option is used to detect the compression format automatically.
- zcat – reading gzipped files without extracting
- zcat is a command that dumps the decompressed contents of a .gz file to stdout
- without manually extracting it. The .gz file remains intact; only the extracted
- contents are dumped to stdout , as follows:
- $ ls
- test.gz
- $ zcat test.gz
- A test file
- # file test contains a line "A test file"
- $ ls
- test.gz
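- Because zcat writes to stdout , it fits naturally into pipelines. As a sketch (assuming a
- hypothetical compressed log access.log.gz ):
- $ zcat access.log.gz | grep "error" | wc -l
- This counts the matching lines without ever writing a decompressed file to disk.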
- Compression ratio
- We can specify the compression level, which is available in the range 1 to 9, where:
- f 1 is the lowest, but fastest
- f 9 is the best, but slowest
- You can specify any level in that range as follows:
- $ gzip -9 test.img
- This will compress the file to the maximum.
- See also
- f Archiving with tar, explains the tar command
- Compressing with bunzip (bzip2)
- bzip2 is another compression technique, very similar to gzip . bzip2 typically
- produces smaller (more compressed) files than gzip . It comes with all Linux distributions.
- Let's see how to use bzip2 .
- How to do it...
- In order to compress with bzip2 use:
- $ bzip2 filename
- $ ls
- filename.bz2
- It will remove the original file and produce a compressed file called filename.bz2 .
- Extract a bzipped file as follows:
- $ bunzip2 filename.bz2
- It will remove filename.bz2 and produce an uncompressed version of filename .
- bzip2 can read a file from stdin and also write a compressed file into stdout .
- In order to read from stdin and write to stdout use:
- $ cat file | bzip2 -c > file.bz2
- -c is used to specify output to stdout .
- We usually use bzip2 with tarballs. A tarball can be compressed by using the -j option
- passed to the tar command while archiving and extracting.
- Creating a bzipped tarball can be done by using the following methods:
- f Method - 1
- $ tar -cjvvf archive.tar.bz2 [FILES]
- Or:
- $ tar -cavvf archive.tar.bz2 [FILES]
- The -a option specifies to automatically detect compression format from the extension.
- f Method - 2
- First create the tarball:
- $ tar -cvvf archive.tar [FILES]
- Compress it after tarballing:
- $ bzip2 archive.tar
- If we need to add hundreds of files to the archive, the above commands may fail. To fix that
- issue, use a loop to append files to the archive one by one using the -r option. See the similar
- section from the recipe, Compressing with gunzip (gzip).
- Extract a bzipped tarball as follows:
- $ tar -xjvvf archive.tar.bz2 -C extract_directory
- In this command:
- f -x is used for extraction
- f -j is for bzip2 specification
- f -C is for specifying the directory to which the files are to be extracted
- Or, you can use the following command:
- $ tar -xavvf archive.tar.bz2 -C extract_directory
- -a will automatically detect the compression format.
- There's more...
- bzip2 has several additional options to carry out different functions. Let's go through a few
- of them.
- Keeping input files without removing them
- While using bzip2 or bunzip2 , the input file is removed and an output file is produced.
- We can prevent it from removing input files by using the -k option.
- For example:
- $ bunzip2 test.bz2 -k
- $ ls
- test test.bz2
- Compression ratio
- We can specify the compression level, which is available in the range of 1 to 9 (where 1 is the
- least compression, but fast, and 9 is the highest possible compression, but much slower).
- For example:
- $ bzip2 -9 test.img
- This command provides maximum compression.
- See also
- f Archiving with tar, explains the tar command
- Compressing with lzma
- lzma is relatively new compared to gzip or bzip2 . lzma offers better
- compression rates than gzip or bzip2 . As lzma is not preinstalled on most Linux distros,
- you may need to install it using the package manager.
- How to do it...
- In order to compress with lzma use the following command:
- $ lzma filename
- $ ls
- filename.lzma
- This will remove the file and produce a compressed file called filename.lzma .
- To extract an lzma file use:
- $ unlzma filename.lzma
- This will remove filename.lzma and produce an uncompressed version of the file.
- The lzma command can also read a file from stdin and write the compressed file to stdout .
- In order to read from stdin and write to stdout use:
- $ cat file | lzma -c > file.lzma
- -c is used to specify output to stdout .
- We usually use lzma with tarballs. A tarball can be compressed by using the --lzma option
- passed to the tar command while archiving and extracting.
- There are two methods to create a lzma tarball:
- f Method - 1
- $ tar --lzma -cvvf archive.tar.lzma [FILES]
- Or:
- $ tar -cavvf archive.tar.lzma [FILES]
- The -a option specifies to automatically detect the compression format from the
- extension.
- f Method - 2
- First, create the tarball:
- $ tar -cvvf archive.tar [FILES]
- Compress it after tarballing:
- $ lzma archive.tar
- If we need to add hundreds of files to the archive, the above commands may fail. To fix that
- issue, use a loop to append files to the archive one by one using the -r option. See the
- similar section from the recipe, Compressing with gunzip (gzip).
- There's more...
- Let's go through the additional options associated with the lzma utilities.
- Extracting an lzma tarball
- In order to extract a tarball compressed with lzma compression to a specified directory, use:
- $ tar --lzma -xvvf archive.tar.lzma -C extract_directory
- In this command, -x is used for extraction. --lzma specifies the use of lzma to
- decompress the archive.
- Or, we could also use:
- $ tar -xavvf archive.tar.lzma -C extract_directory
- The -a option specifies to automatically detect the compression format from the extension.
- Keeping input files without removing them
- While using lzma or unlzma , the input file is removed and an output file is produced. We
- can prevent this and keep the input files by using the -k option. For example:
- $ lzma test.bz2 -k
- $ ls
- test.bz2.lzma
- Compression ratio
- We can specify the compression level, which is available in the range of 1 to 9 (where 1 is the
- least compression, but fast, and 9 is the highest possible compression, but much slower).
- You can specify any level in that range as follows:
- $ lzma -9 test.img
- This command compresses the file to the maximum.
- See also
- f Archiving with tar, explains the tar command
- Archiving and compressing with zip
- ZIP is a popular compression format used on many platforms. It isn't as commonly used as
- gzip or bzip2 on Linux platforms, but files from the Internet are often saved in this format.
- How to do it...
- In order to archive with ZIP, the following syntax is used:
- $ zip archive_name.zip [SOURCE FILES/DIRS]
- For example:
- $ zip file.zip file
- Here, the file.zip file will be produced.
- Archive directories and files recursively as follows:
- $ zip -r archive.zip folder1 file2
- In this command, -r is used for specifying recursive.
- Unlike lzma , gzip , or bzip2 , zip won't remove the source file after archiving. In that
- respect zip is similar to tar , but unlike plain tar , zip also compresses the files
- it archives.
- In order to extract files and folders in a ZIP file, use:
- $ unzip file.zip
- It will extract the files without removing file.zip (unlike unlzma or gunzip ).
- In order to update files in the archive with newer files in the filesystem, use the -u flag:
- $ zip file.zip -u newfile
- Delete a file from a zipped archive by using -d as follows:
- $ zip -d arc.zip file.txt
- In order to list the files in an archive use:
- $ unzip -l archive.zip
- squashfs – the heavy compression filesystem
- squashfs is a heavy-compression based read-only filesystem that is capable of compressing
- 2 to 3GB of data onto a 700 MB file. Have you ever thought of how Linux Live CDs work?
- When a Live CD is booted it loads a complete Linux environment. Linux Live CDs make use
- of a read-only compressed filesystem called squashfs. It keeps the root filesystem on a
- compressed filesystem file. It can be loopback mounted and files can be accessed. Thus when
- some files are required by processes, they are decompressed and loaded onto the RAM and
- used. Knowledge of squashfs can be useful when building a custom live OS or when required
- to keep files heavily compressed and to access them without entirely extracting the files.
- Extracting a large compressed file normally takes a long time. With a loopback-mounted
- squashfs file, however, access is very fast, since only the required portions of the
- compressed data are decompressed when files are requested. In regular decompression,
- all the data is decompressed first. Let's see how we can use squashfs.
- Getting ready
- If you have an Ubuntu CD, just locate a .squashfs file at CD-ROM ROOT/casper/
- filesystem.squashfs . squashfs internally uses compression algorithms such as gzip
- and lzma . squashfs support is available in all of the latest Linux distros. However, in order
- to create squashfs files, an additional package, squashfs-tools , needs to be installed
- from the package manager.
- How to do it...
- In order to create a squashfs file by adding source directories and files, use:
- $ mksquashfs SOURCES compressedfs.squashfs
- Sources can be wildcards, file paths, or folder paths.
- For example:
- $ sudo mksquashfs /etc test.squashfs
- Parallel mksquashfs: Using 2 processors
- Creating 4.0 filesystem on test.squashfs, block size 131072.
- [=======================================] 1867/1867 100%
- More details will be printed on the terminal; the output here is truncated to save space.
- In order to mount the squashfs file to a mount point, use loopback mounting as follows:
- # mkdir /mnt/squash
- # mount -o loop compressedfs.squashfs /mnt/squash
- You can copy contents by accessing /mnt/squash .
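- When you are finished, unmount it as with any loopback mount:
- # umount /mnt/squash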
- There's more...
- The squashfs file system can be created by specifying additional parameters. Let's go
- through the additional options.
- Excluding files while creating a squashfs file
- While creating a squashfs file, we can exclude a list of files or a file pattern specified using
- wildcards.
- Exclude a list of files specified as command-line arguments by using the -e option. For
- example:
- $ sudo mksquashfs /etc test.squashfs -e /etc/passwd /etc/shadow
- The -e option is used to exclude the passwd and shadow files.
- It is also possible to specify a list of exclude files given in a file with -ef as follows:
- $ cat excludelist
- /etc/passwd
- /etc/shadow
- $ sudo mksquashfs /etc test.squashfs -ef excludelist
- If we want to support wildcards in exclude lists, use -wildcards as an argument.
- Cryptographic tools and hashes
- Encryption techniques are used mainly to protect data from unauthorized access. There are
- many algorithms available and we use a common set of standard algorithms. There are a few
- tools available in a Linux environment for performing encryption and decryption. Sometimes
- we use cryptographic hashes for verifying data integrity. This section will introduce a few
- commonly-used cryptographic tools and a general set of algorithms that these tools can handle.
- How to do it...
- Let's see how to use the tools such as crypt, gpg, base64, md5sum, sha1sum, and openssl:
- f crypt
- The crypt command is a simple cryptographic utility, which takes a file from stdin
- and a passphrase as input and outputs encrypted data into stdout .
- $ crypt < input_file > output_file
- Enter passphrase:
- It will interactively ask for a passphrase. We can also provide a passphrase through
- command-line arguments.
- $ crypt PASSPHRASE < input_file > encrypted_file
- In order to decrypt the file use:
- $ crypt PASSPHRASE -d < encrypted_file > output_file
- f gpg (GNU privacy guard)
- gpg (GNU privacy guard) is a widely-used encryption scheme that protects files
- with key signing techniques so that data can be accessed only by the intended recipient.
- gpg signatures are widely used. The details of gpg are outside the scope of this book.
- Here we can learn how to encrypt and decrypt a file.
- In order to encrypt a file with gpg use:
- $ gpg -c filename
- This command reads the passphrase interactively and generates filename.gpg .
- In order to decrypt a gpg file use:
- $ gpg filename.gpg
- This command reads a passphrase and decrypts the file.
- f Base64
- Base64 is a group of similar encoding schemes that represents binary data in an
- ASCII string format by translating it into a radix-64 representation. The base64
- command can be used to encode and decode the Base64 string.
- In order to encode a binary file into Base64 format, use:
- $ base64 filename > outputfile
- Or:
- $ cat file | base64 > outputfile
- It can read from stdin .
- Decode Base64 data as follows:
- $ base64 -d file > outputfile
- Or:
- $ cat base64_file | base64 -d > outputfile
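- As a quick round-trip sketch (assuming a hypothetical binary file payload.bin ):
- $ base64 payload.bin > payload.b64
- $ base64 -d payload.b64 > restored.bin
- $ cmp payload.bin restored.bin && echo identical
- cmp reports nothing and echoes identical when the round trip is lossless.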
- f md5sum and sha1sum
- md5sum and sha1sum are unidirectional hash algorithms, which cannot be reversed
- to form the original data. These are usually used to verify the integrity of data or for
- generating a practically unique key from given data. For every file, a key is generated by
- analyzing its content.
- $ md5sum file
- 8503063d5488c3080d4800ff50850dc9 file
- $ sha1sum file
- 1ba02b66e2e557fede8f61b7df282cd0a27b816b file
- These types of hashes are ideal for storing passwords. Passwords are stored as their
- hashes. When a user wants to authenticate, the password is read and converted to
- its hash. This hash is then compared to the one that is stored already. If they are the same,
- the password is authenticated and access is provided; else it is denied. Storing
- original password strings is risky, as it poses the security risk of exposing the passwords.
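- As an integrity-checking sketch (the filenames are hypothetical and the output shown is
- representative), checksums can be written to a file and verified later with the -c option:
- $ md5sum file1 file2 > files.md5
- $ md5sum -c files.md5
- file1: OK
- file2: OK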
- f Shadow-like hash (salted hash)
- Let's see how to generate a shadow-like salted hash for passwords.
- The user passwords in Linux are stored as its hashes in the /etc/shadow file. A
- typical line in /etc/shadow will look like this:
- test:$6$fG4eWdUi$ohTKOlEUzNk77.4S8MrYe07NTRV4M3LrJnZP9p.qc1bR5c.
- EcOruzPXfEu1uloBFUa18ENRH7F70zhodas3cR.:14790:0:99999:7:::
- In this line $6$fG4eWdUi$ohTKOlEUzNk77.4S8MrYe07NTRV4M3LrJnZP9p.
- qc1bR5c.EcOruzPXfEu1uloBFUa18ENRH7F70zhodas3cR is the shadow hash
- corresponding to its password.
- In some situations, we may need to write critical administration scripts that may need
- to edit passwords or add users manually using a shell script. In that case we have to
- generate a shadow password string and write a similar line as above to the shadow
- file. Let's see how to generate a shadow password using openssl .
- Shadow passwords are usually salted passwords. SALT is an extra string used to
- obfuscate and make the encryption stronger. The salt consists of random bits that are
- used as one of the inputs to a key derivation function that generates the salted hash
- for the password.
- For more details on salt, see the Wikipedia page http://en.wikipedia.org/
- wiki/Salt_(cryptography) .
- $ openssl passwd -1 -salt SALT_STRING PASSWORD
- $1$SALT_STRING$323VkWkSLHuhbt1zkSsUG.
- Replace SALT_STRING with a random string and PASSWORD with the password you
- want to use.
- Backup snapshots with rsync
- Backing up data is something that most sysadmins need to do regularly. We may need to
- back up data on a web server or from remote locations. rsync is a command that can be
- used to synchronize files and directories from one location to another while minimizing data
- transfer using file difference calculations and compression. The advantage of rsync over the
- cp command is that rsync uses strong difference algorithms. Also, it supports data transfer
- across networks. While making copies, it compares the files in the original and destination
- locations and will copy only the files that have changed. It also supports compression, encryption,
- and a lot more. Let's see how we can work with rsync .
- How to do it...
- In order to copy a source directory to a destination (to create a mirror) use:
- $ rsync -av source_path destination_path
- In this command:
- f -a stands for archiving
- f -v (verbose) prints the details or progress on stdout
- The above command will recursively copy all the files from the source path to the destination
- path. We can specify paths as remote or localhost paths.
- It can be in the format /home/slynux/data , slynux@192.168.0.6:/home/backups/
- data , and so on.
- /home/slynux/data specifies the absolute path in the machine in which the rsync
- command is executed. slynux@192.168.0.6:/home/backups/data specifies that the
- path is /home/backups/data in the machine with IP address 192.168.0.6 and is logged
- in as user slynux .
- In order to back up data to a remote server or host, use:
- $ rsync -av source_dir username@host:PATH
- To keep a mirror at the destination, run the same rsync command scheduled at regular
- intervals. It will copy only changed files to the destination.
- Restore the data from remote host to localhost as follows:
- $ rsync -av username@host:PATH destination
- The rsync command uses SSH to connect to another remote machine. Provide the remote
- machine address in the format user@host , where user is the username and host is the IP
- address or domain name attached to the remote machine. PATH is the absolute path address
- where the data needs to be copied. rsync will ask for the user password as usual for SSH
- login. This can be automated (avoiding the password prompt) by using SSH keys.
- Make sure that OpenSSH is installed and running on the remote machine.
- Compressing data while transferring through the network can significantly optimize the
- speed of the transfer. We can use the rsync option -z to compress data while
- transferring through a network. For example:
- $ rsync -avz source destination
- For the PATH format, if we use / at the end of the source, rsync will copy the
- contents of the end directory specified in source_path to the destination.
- If / is not at the end of the source, rsync will copy that end directory itself to the
- destination.
- For example, the following command copies the content of the test directory:
- $ rsync -av /home/test/ /home/backups
- The following command copies the test directory to the destination:
- $ rsync -av /home/test /home/backups
- If / is at the end of destination_path, rsync will copy the source to the
- destination directory.
- If / is not used at the end of the destination path, rsync will create a folder,
- named similar to the source directory, at the end of the destination path and
- copy the source into that directory.
- For example:
- $ rsync -av /home/test /home/backups/
- This command copies the source ( /home/test ) to an existing folder called backups .
- $ rsync -av /home/test /home/backups
- This command copies the source ( /home/test ) to a directory named backups by creating
- that directory.
- There's more...
- The rsync command has several additional functionalities that can be specified using its
- command-line options. Let's go through them.
- Excluding files while archiving with rsync
- Some files need not be updated while archiving to a remote location. It is possible to tell rsync
- to exclude certain files from the current operation. Files can be excluded by two options:
- --exclude PATTERN
- We can specify a wildcard pattern of files to be excluded. For example:
- $ rsync -avz /home/code/some_code /mnt/disk/backup/code --exclude "*.txt"
- This command excludes .txt files from backing up.
- Or, we can specify a list of files to be excluded by providing a list file.
- Use --exclude-from FILEPATH .
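- For example, a sketch using a hypothetical list file named exclude.list :
- $ cat exclude.list
- *.txt
- *.tmp
- $ rsync -avz /home/code /mnt/disk/backup/code --exclude-from exclude.list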
- Deleting non-existent files while updating rsync backup
- Suppose we delete some files at the source after the initial backup. By default, rsync does
- not remove files from the destination if they no longer exist at the source, so stale copies
- accumulate there. In order to remove the files from the destination that do not exist at the
- source, use the rsync --delete option:
- $ rsync -avz SOURCE DESTINATION --delete
- Scheduling backups at intervals
- You can create a cron job to schedule backups at regular intervals.
- A sample is as follows:
- $ crontab -e
- Add the following line:
- 0 */10 * * * rsync -avz /home/code user@IP_ADDRESS:/home/backups
- The above crontab entry schedules the rsync to be executed every 10 hours.
- */10 is in the hour position of the crontab syntax; /10 specifies that the backup be executed
- every 10 hours. If */10 were written in the minutes position, it would execute every 10 minutes.
- Have a look at the Scheduling with cron recipe in Chapter 9 to understand how to configure
- crontab .
- Version control based backup with Git
- People use different strategies for backing up data. Making full copies of the entire source
- directory into backup directories named with the date or time of day wastes space; it is more
- efficient to copy only the changes that have occurred to files since the last backup. This is
- called incremental backup. We can manually create incremental backups using tools like
- rsync , but restoring this sort of backup can be difficult. The best way to maintain and restore
- changes is to use version control systems. They are very much used in software development
- and maintenance of code, since code frequently undergoes changes. Git is a very
- famous and very efficient version control system. Let's use Git for backing up
- regular files in a non-programming context. Git can be installed with your distro's package
- manager. It was written by Linus Torvalds.
- Getting ready
- Here is the problem statement:
- We have a directory that contains several files and subdirectories. We need to keep track of
- changes occurring to the directory contents and back them up. If data becomes corrupted or
- goes missing, we must be able to restore a previous copy of that data. We need to back up the
- data at regular intervals to a remote machine. We also need to keep backups at different
- locations on the same machine (localhost). Let's see how to implement it using Git.
- How to do it...
- In the directory which is to be backed up use:
- $ cd /home/data/source
- Let it be the directory source to be tracked.
- Set up and initiate the remote backup directory. In the remote machine, create the backup
- destination directory:
- $ mkdir -p /home/backups/backup.git
- $ cd /home/backups/backup.git
- $ git init --bare
- The following steps are to be performed in the source host machine:
- 1. Add user details to Git in the source host machine:
- $ git config --global user.name "Sarath Lakshman"
- #Set user name to "Sarath Lakshman"
- $ git config --global user.email slynux@slynux.com
- # Set email to slynux@slynux.com
- Initialize the source directory for backup on the host machine. In the source directory in
- the host machine whose files are to be backed up, execute the following commands:
- $ git init
- Initialized empty Git repository in /home/data/source/.git/
- # Initialize git repository
- $ git commit --allow-empty -am "Init"
- [master (root-commit) b595488] Init
- 2. In the source directory, execute the following command to add the remote git
- directory and synchronize backup:
- $ git remote add origin user@remotehost:/home/backups/backup.git
- $ git push origin master
- Counting objects: 2, done.
- Writing objects: 100% (2/2), 153 bytes, done.
- Total 2 (delta 0), reused 0 (delta 0)
- To user@remotehost:/home/backups/backup.git
- * [new branch] master -> master
- 3. Add or remove files for Git tracking.
- The following command adds all files and folders in the current directory to the
- backup list:
- $ git add *
- We can conditionally add certain files only to the backup list as follows:
- $ git add *.txt
- $ git add *.py
- We can remove the files and folders not required to be tracked by using:
- $ git rm file
- It can be a folder or even a wildcard as follows:
- $ git rm *.txt
- 4. Check-pointing or marking backup points.
- We can mark checkpoints for the backup with a message using the following
- command:
- $ git commit -m "Commit Message"
- We need to update the backup at the remote location at regular intervals. Hence, set
- up a cron job (for example, backing up every five hours).
- Create a crontab entry with the following line:
- 0 */5 * * * /home/data/backup.sh
- Create a script /home/data/backup.sh as follows:
- #!/bin/bash
- cd /home/data/source
- git add .
- git commit -am "Commit - @ $(date)"
- git push
- Now we have set up the backup system.
- 5. Restoring data with Git.
- In order to view all backup versions use:
- $ git log
- Update the current directory to the last backup, discarding any recent changes.
- To revert to any previous state or version, look up the commit ID,
- which is a 40-character hex string. Use the commit ID with git checkout .
- For commit ID 3131f9661ec1739f72c213ec5769bc0abefa85a9 it will be:
- $ git checkout 3131f9661ec1739f72c213ec5769bc0abefa85a9
- $ git commit -am "Restore @ $(date) commit ID:
- 3131f9661ec1739f72c213ec5769bc0abefa85a9"
- $ git push
- In order to view the details about versions again, use:
- $ git log
- If the working directory is broken due to some issues, we need to fix the directory with
- the backup at the remote location.
- Then we can recreate the contents from the backup at the remote location as follows:
- $ git clone user@remotehost:/home/backups/backup.git
- This will create a directory backup with all contents.
- Cloning hard drive and disks with dd
- While working with hard drives and partitions, we may need to make exact copies or backups
- of full partitions rather than copying their contents file by file (and not only hard disk partitions;
- we can also copy an entire hard disk without missing any information, such as the boot record,
- partition table, and so on). In this situation we can use the dd command. It can be used to
- clone any type of disk, such as hard disks, flash drives, CDs, DVDs, floppy disks, and so on.
- Getting ready
- The dd command expands to Data Definition. Since improper usage can lead to loss of data,
- it is nicknamed "Data Destroyer". Be careful with the order of arguments; wrong
- arguments can destroy the data of entire disks. dd is basically a bitstream
- duplicator that writes an entire bit stream from a disk to a file or from a file to a disk. Let's
- see how to use dd .
- How to do it...
- The syntax for dd is as follows:
- $ dd if=SOURCE of=TARGET bs=BLOCK_SIZE count=COUNT
- In this command:
- f if stands for input file or input device path
- f of stands for target file or target device path
- f bs stands for block size (usually, it is given as a power of 2, for example, 512, 1024,
- 2048, and so on)
- f count is the number of blocks to be copied (an integer)
- Total bytes copied = BLOCK_SIZE * COUNT
- bs and count are optional.
- By specifying COUNT we can limit the number of bytes to be copied from input file to target. If
- COUNT is not specified, dd will copy from input file until it reaches the end of file (EOF) marker.
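- As a harmless sketch of the arithmetic (writing zeros to a hypothetical image file instead of
- touching a real disk):
- $ dd if=/dev/zero of=zeros.img bs=1024 count=1024
- # Total bytes copied = 1024 * 1024 = 1048576 bytes (1 MB)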
- In order to copy a partition into a file use:
- # dd if=/dev/sda1 of=sda1_partition.img
- Here /dev/sda1 is the device path for the partition.
- Restore the partition using the backup as follows:
- # dd if=sda1_partition.img of=/dev/sda1
- You should be careful about the order of the if and of arguments; improper usage may lead to data loss.
- By changing the device path /dev/sda1 to the appropriate device path, any disk can be
- copied or restored.
- In order to permanently delete all of the data in a partition, we can make dd write zeros into
- the partition by using the following command:
- # dd if=/dev/zero of=/dev/sda1
- /dev/zero is a character device. It always returns infinite zero '\0' characters.
- Clone one hard disk to another hard disk of the same size as follows:
- # dd if=/dev/sda of=/dev/sdb
- Here /dev/sdb is the second hard disk.
- In order to take the image of a CD ROM (ISO file) use:
- # dd if=/dev/cdrom of=cdrom.iso
- There's more...
- When a file system is created in a file which is generated using dd , we can mount it to a
- mount point. Let's see how to work with it.
- Mounting image files
- Any file image created using dd can be mounted using the loopback method. Use the -o
- loop with the mount command.
- # mkdir /mnt/mount_point
- # mount -o loop file.img /mnt/mount_point
- Now we can access the contents of the image files through the location /mnt/mount_point .
- See also
- f Creating ISO files, Hybrid ISO of Chapter 3, explains how to use dd to create an ISO
- file from a CD
- Chapter 7: The Old-boy Network
- In this chapter, we will cover:
- f Basic networking primer
- f Let's ping!
- f Listing all the machines alive on a network
- f Transferring files through network
- f Setting up an Ethernet and wireless LAN with script
- f Password-less auto-login with SSH
- f Running commands on remote host with SSH
- f Mounting remote drive at local mount point
- f Multi-casting window messages on a network
- f Network traffic and port analysis
- Introduction
- Networking is the act of interconnecting machines through a network and configuring the
- nodes in the network with different specifications. We use TCP/IP as our networking stack
- and all operations are based on it. Networks are an important part of every computer system.
- Each node connected in the network is assigned a unique IP address for identification. There
- are many parameters in networking, such as subnet mask, route, ports, DNS, and so on,
- which require a basic understanding to follow.
- Several applications that make use of a network operate by opening and connecting to
- network ports. Every application may offer services such as data transfer, remote shell login,
- and so on. Several interesting management tasks can be performed on a network consisting
- of many machines. Shell scripts can be used to configure the nodes in a network, test the
- availability of machines, automate execution of commands at remote hosts, and so on. This
- chapter focuses on different recipes that introduce interesting tools or commands related to
- networking and also how they can be used for solving different problems.
- Basic networking primer
- Before digging through recipes based on networking, it is essential for you to have a basic
- understanding of setting up a network, the terminology and commands for assigning an IP
- address, adding routes, and so on. This recipe will give an overview of different commands
- used in GNU/Linux for networking and their usages from the basics.
- Getting ready
- Every node in a network requires many parameters to be assigned to work successfully and
- interconnect with other machines. Some of the different parameters are the IP address,
- subnet mask, gateway, route, DNS, and so on.
- This recipe will introduce commands ifconfig , route , nslookup , and host .
- How to do it...
- Network interfaces are used to connect to a network. Usually, in the context of UNIX-like
- Operating Systems, network interfaces follow the eth0, eth1 naming convention. Also, other
- interfaces, such as usb0, wlan0, and so on, are available for USB network interfaces, wireless
- LAN, and other such networks.
- ifconfig is the command that is used to display details about network interfaces, subnet
- mask, and so on.
- ifconfig is available at /sbin/ifconfig . Some GNU/Linux distributions will display an
- error "command not found" when ifconfig is typed. This is because /sbin is not included
- in the user's PATH environment variable. When a command is typed, Bash looks in the
- directories specified in the PATH variable.
- By default, in Debian, ifconfig is not available since /sbin is not in PATH .
- /sbin/ifconfig is the absolute path, so try running ifconfig with its absolute path (that is,
- /sbin/ifconfig ). On every system, there is a default interface, lo , called loopback, that
- points to the current machine. For example:
- $ ifconfig
- lo Link encap:Local Loopback
- inet addr:127.0.0.1 Mask:255.0.0.0
- inet6 addr: ::1/128 Scope:Host
- UP LOOPBACK RUNNING MTU:16436 Metric:1
- RX packets:6078 errors:0 dropped:0 overruns:0 frame:0
- TX packets:6078 errors:0 dropped:0 overruns:0 carrier:0
- collisions:0 txqueuelen:0
- RX bytes:634520 (634.5 KB) TX bytes:634520 (634.5 KB)
- wlan0 Link encap:Ethernet HWaddr 00:1c:bf:87:25:d2
- inet addr:192.168.0.82 Bcast:192.168.3.255 Mask:255.255.252.0
- inet6 addr: fe80::21c:bfff:fe87:25d2/64 Scope:Link
- UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
- RX packets:420917 errors:0 dropped:0 overruns:0 frame:0
- TX packets:86820 errors:0 dropped:0 overruns:0 carrier:0
- collisions:0 txqueuelen:1000
- RX bytes:98027420 (98.0 MB) TX bytes:22602672 (22.6 MB)
- The left-most column in the ifconfig output lists the names of the network interfaces, and
- the right-hand columns show the details related to the corresponding network interface.
- There's more...
- There are several additional commands that frequently come under usage for querying and
- configuring the network. Let's go through the essential commands and usage.
- Printing the list of network interfaces
- Here is a one-liner command sequence to print the list of network interface available
- on a system.
- $ ifconfig | cut -c-10 | tr -d ' ' | tr -s '\n'
- lo
- wlan0
- The first 10 characters of each line in the ifconfig output are reserved for writing the
- name of the network interface. Hence we use cut to extract the first 10 characters of each
- line. tr -d ' ' deletes every space character in each line. Then the \n newline characters are
- squeezed using tr -s '\n' to produce the list of interface names.
- Assigning and displaying IP addresses
- The ifconfig command displays details of every network interface available on the system.
- However, we can restrict it to a specific interface by using:
- $ ifconfig iface_name
- For example:
- $ ifconfig wlan0
- wlan0 Link encap:Ethernet HWaddr 00:1c:bf:87:25:d2
- inet addr:192.168.0.82 Bcast:192.168.3.255
- Mask:255.255.252.0
- From the outputs of the previously mentioned command, our interests lie in the IP address,
- broadcast address, hardware address, and subnet mask. They are as follows:
- f HWaddr 00:1c:bf:87:25:d2 is the hardware address (MAC address)
- f inet addr:192.168.0.82 is the IP address
- f Bcast:192.168.3.255 is the broadcast address
- f Mask:255.255.252.0 is the subnet mask
- In several scripting contexts, we may need to extract any of these addresses from the script
- for further manipulations.
- Extracting the IP address is a common task. In order to extract the IP address from the
- ifconfig output use:
- $ ifconfig wlan0 | egrep -o "inet addr:[^ ]*" | grep -o "[0-9.]*"
- 192.168.0.82
- Here the first command egrep -o "inet addr:[^ ]*" will print inet
- addr:192.168.0.82 .
- The pattern starts with inet addr: and ends with some non-space character sequence
- (specified by [^ ]* ). In the next pipe, grep -o prints only the character combination of digits and '.'.
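- In a script, the extracted address can be captured in a variable; a minimal sketch, assuming
- the interface is wlan0 :
- ip=$(ifconfig wlan0 | egrep -o "inet addr:[^ ]*" | grep -o "[0-9.]*")
- echo "wlan0 has IP $ip"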
- In order to set the IP address for a network interface, use:
- # ifconfig wlan0 192.168.0.80
- You will need to run the above command as root. 192.168.0.80 is the address to be set.
- Set the subnet mask along with IP address as follows:
- # ifconfig wlan0 192.168.0.80 netmask 255.255.252.0
- Spoofing Hardware Address (MAC Address)
- In certain circumstances where authentication or filtering of computers on a network is
- provided by using the hardware address, we can use hardware address spoofing. The
- hardware address appears in ifconfig output as HWaddr 00:1c:bf:87:25:d2 .
- We can spoof the hardware address at the software level as follows:
- # ifconfig eth0 hw ether 00:1c:bf:87:25:d5
- In the above command, 00:1c:bf:87:25:d5 is the new MAC address to be assigned.
- This can be useful when we need to access the Internet through MAC authenticated service
- providers that provide access to the Internet for a single machine.
- Name server and DNS (Domain Name Service)
- The elementary addressing scheme for the Internet is IP addresses (dotted decimal form, for
- example, 202.11.32.75 ). However, the resources on the Internet (for example, websites)
- are accessed through a combination of ASCII characters called URLs or domain names. For
- example, google.com is a domain name, and it corresponds to an IP address. Typing that
- IP address in the browser will also take you to www.google.com .
- This technique of abstracting IP addresses with symbolic names is called Domain Name Service
- (DNS). When we enter google.com , the DNS servers configured for our network resolve the
- domain name into the corresponding IP address. On a local network, we set up a local
- DNS to name the local machines on the network symbolically using their hostnames.
- Name servers assigned to the current system can be viewed by reading /etc/resolv.conf .
- For example:
- $ cat /etc/resolv.conf
- nameserver 8.8.8.8
- We can add name servers manually as follows:
- # echo nameserver IP_ADDRESS >> /etc/resolv.conf
- How can we obtain the IP address for a corresponding domain name?
- The easiest method to obtain an IP address is by trying to ping the given domain name and
- looking at the echo reply. For example:
- $ ping google.com
- PING google.com (64.233.181.106) 56(84) bytes of data.
- Here 64.233.181.106 is the corresponding IP address.
- A domain name can have multiple IP addresses assigned. In that case, the DNS server will
- return one address among the list of IP addresses. To obtain all the addresses assigned to
- the domain name, we should use a DNS lookup utility.
- DNS lookup
- There are different DNS lookup utilities available from the command line. These will request a
- DNS server for an IP address resolution. host and nslookup are two DNS lookup utilities.
- When host is executed, it will list out all of the IP addresses attached to the domain name.
- nslookup is another command, similar to host , which can be used to query details
- related to DNS and name resolution. For example:
- $ host google.com
- google.com has address 64.233.181.105
- google.com has address 64.233.181.99
- google.com has address 64.233.181.147
- google.com has address 64.233.181.106
- google.com has address 64.233.181.103
- google.com has address 64.233.181.104
- It may also list out DNS resource records like MX (Mail Exchanger) as follows:
- $ nslookup google.com
- Server: 8.8.8.8
- Address: 8.8.8.8#53
- Non-authoritative answer:
- Name: google.com
- Address: 64.233.181.105
- Name: google.com
- Address: 64.233.181.99
- Name: google.com
- Address: 64.233.181.147
- Name: google.com
- Address: 64.233.181.106
- Name: google.com
- Address: 64.233.181.103
- Name: google.com
- Address: 64.233.181.104
- Server: 8.8.8.8
- The last line above corresponds to the default nameserver used for DNS resolution.
- Without using the DNS server, it is possible to add a symbolic name to IP address resolution
- just by adding entries into file /etc/hosts .
- In order to add an entry, use the following syntax:
- # echo IP_ADDRESS symbolic_name >> /etc/hosts
- For example:
- # echo 192.168.0.9 backupserver.com >> /etc/hosts
- After adding this entry, whenever a resolution to backupserver.com occurs, it will resolve
- to 192.168.0.9 .
- Setting default gateway, showing routing table information
- When a local network is connected to another network, it needs to assign some machine
- or network node through which an interconnection takes place. Hence the IP packets with
- a destination exterior to the local network should be forwarded to the node machine, which
- is interconnected to the external network. This special node machine, which is capable of
- forwarding packets to the external network, is called a gateway. We set the gateway for every
- node to make it possible to connect to an external network.
- The operating system maintains a table called the routing table, which contains information
- on how packets are to be forwarded and through which machine node in the network. The
- routing table can be displayed as follows:
- $ route
- Kernel IP routing table
- Destination Gateway Genmask Flags Metric Ref Use Iface
- 192.168.0.0 * 255.255.252.0 U 2 0 0 wlan0
- link-local * 255.255.0.0 U 1000 0 0 wlan0
- default p4.local 0.0.0.0 UG 0 0 0 wlan0
- Or, you can also use:
- $ route -n
- Kernel IP routing table
- Destination Gateway Genmask Flags Metric Ref Use Iface
- 192.168.0.0 0.0.0.0 255.255.252.0 U 2 0 0 wlan0
- 169.254.0.0 0.0.0.0 255.255.0.0 U 1000 0 0 wlan0
- 0.0.0.0 192.168.0.4 0.0.0.0 UG 0 0 0 wlan0
- Using -n specifies that numerical addresses be displayed. When -n is used, every
- entry is displayed with a numerical IP address; otherwise, symbolic hostnames are shown
- for those IP addresses that have DNS entries available.
- A default gateway is set as follows:
- # route add default gw IP_ADDRESS INTERFACE_NAME
- For example:
- # route add default gw 192.168.0.1 wlan0
- Traceroute
- When an application requests a service through the Internet, the server may be at a distant
- location and connected through any number of gateways or device nodes. The packets
- travel through several gateways and reach the destination. There is an interesting command
- traceroute that displays the address of all intermediate gateways through which the
- packet travelled to reach the destination. traceroute information helps us to understand
- how many hops each packet takes in order to reach the destination. The number of
- intermediate gateways or routers gives a metric to measure the distance between two nodes
- connected in a large network. An example of the output from traceroute is as follows:
- $ traceroute google.com
- traceroute to google.com (74.125.77.104), 30 hops max, 60 byte packets
- 1 gw-c6509.lxb.as5577.net (195.26.4.1) 0.313 ms 0.371 ms 0.457 ms
- 2 40g.lxb-fra.as5577.net (83.243.12.2) 4.684 ms 4.754 ms 4.823 ms
- 3 de-cix10.net.google.com (80.81.192.108) 5.312 ms 5.348 ms 5.327 ms
- 4 209.85.255.170 (209.85.255.170) 5.816 ms 5.791 ms 209.85.255.172
- (209.85.255.172) 5.678 ms
- 5 209.85.250.140 (209.85.250.140) 10.126 ms 9.867 ms 10.754 ms
- 6 64.233.175.246 (64.233.175.246) 12.940 ms 72.14.233.114
- (72.14.233.114) 13.736 ms 13.803 ms
- 7 72.14.239.199 (72.14.239.199) 14.618 ms 209.85.255.166
- (209.85.255.166) 12.755 ms 209.85.255.143 (209.85.255.143) 13.803 ms
- 8 209.85.255.98 (209.85.255.98) 22.625 ms 209.85.255.110
- (209.85.255.110) 14.122 ms
- *
- 9 ew-in-f104.1e100.net (74.125.77.104) 13.061 ms 13.256 ms 13.484 ms
- See also
- f Playing with variables and environment variables of Chapter 1, explains the PATH
- variable
- f Searching and mining "text" inside a file with grep of Chapter 4, explains the grep
- command
- Let's ping!
- ping is the most basic network command, and one that every user should first know. It is a
- universal command that is available on major Operating Systems. It is also a diagnostic tool
- used for verifying the connectivity between two hosts on a network. It can be used to find out
- which machines are alive on a network. Let us see how to use ping.
- How to do it...
- In order to check the connectivity of two hosts on a network, the ping command uses
- Internet Control Message Protocol (ICMP) echo packets. When these echo packets are sent
- towards a host, the host responds back with a reply if it is reachable or alive.
- Check whether a host is reachable as follows:
- $ ping ADDRESS
- The ADDRESS can be a hostname, domain name, or an IP address itself.
- ping will continuously send packets and the reply information is printed on the terminal. Stop
- the pinging by pressing Ctrl + C .
- For example:
- f When the host is reachable the output will be similar to the following:
- $ ping 192.168.0.1
- PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
- 64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=1.44 ms
- ^C
- --- 192.168.0.1 ping statistics ---
- 1 packets transmitted, 1 received, 0% packet loss, time 0ms
- rtt min/avg/max/mdev = 1.440/1.440/1.440/0.000 ms
- $ ping google.com
- PING google.com (209.85.153.104) 56(84) bytes of data.
- 64 bytes from bom01s01-in-f104.1e100.net (209.85.153.104): icmp_
- seq=1 ttl=53 time=123 ms
- ^C
- --- google.com ping statistics ---
- 1 packets transmitted, 1 received, 0% packet loss, time 0ms
- rtt min/avg/max/mdev = 123.388/123.388/123.388/0.000 ms
- f When a host is unreachable the output will be similar to:
- $ ping 192.168.0.99
- PING 192.168.0.99 (192.168.0.99) 56(84) bytes of data.
- From 192.168.0.82 icmp_seq=1 Destination Host Unreachable
- From 192.168.0.82 icmp_seq=2 Destination Host Unreachable
- When the host is unreachable, ping returns a Destination Host Unreachable
- error message.
- There's more...
- In addition to checking the connectivity between two points in a network, the ping command
- can be used with additional options to get useful information. Let's go through the additional
- options of ping .
- Round trip time
- The ping command can be used to find out the Round Trip Time (RTT) between two hosts on a
- network. RTT is the time required for the packet to reach the destination host and come back to
- the source host. The RTT in milliseconds can be obtained from ping. An example is as follows:
- --- google.com ping statistics ---
- 5 packets transmitted, 5 received, 0% packet loss, time 4000ms
- rtt min/avg/max/mdev = 118.012/206.630/347.186/77.713 ms
- Here the minimum RTT is 118.012ms, the average RTT is 206.630ms, and the maximum RTT is
- 347.186ms. The mdev (77.713ms) parameter in the ping output stands for mean deviation.
- Limiting number of packets to be sent
- The ping command sends echo packets and waits for echo replies indefinitely until it is
- stopped by pressing Ctrl + C . However, we can limit the count of echo packets to be sent by
- using the -c flag.
- The usage is as follows:
- -c COUNT
- For example:
- $ ping 192.168.0.1 -c 2
- PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
- 64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=4.02 ms
- 64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=1.03 ms
- --- 192.168.0.1 ping statistics ---
- 2 packets transmitted, 2 received, 0% packet loss, time 1001ms
- rtt min/avg/max/mdev = 1.039/2.533/4.028/1.495 ms
- In the previous example, the ping command sends two echo packets and stops.
- This is useful when we need to ping multiple machines from a list of IP addresses through a
- script and check their statuses.
- Return status of ping command
- The ping command returns exit status 0 when it succeeds and returns non-zero when it
- fails. Success means the destination host is reachable; failure means the destination host
- is unreachable.
- The return status can be easily obtained as follows:
- $ ping ADDRESS -c2
- if [ $? -eq 0 ];
- then
-   echo Successful ;
- else
-   echo Failure
- fi
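- The same check can be condensed into a one-liner sketch using the shell's && and ||
- operators:
- $ ping -c2 ADDRESS &> /dev/null && echo Successful || echo Failure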
- Listing all the machines alive on a network
- When we deal with a large local area network, we may need to check the availability of other
- machines in the network. A machine may not be alive for two reasons: either it is not
- powered on, or there is a problem in the network. By using shell scripting, we can
- easily find out and report which machines are alive on the network. Let's see how to do it.
- Getting ready
- In this recipe, we use two methods. The first method uses ping and the second method uses
- fping . fping doesn't come with a Linux distribution by default. You may have to manually
- install fping using a package manager.
- How to do it...
- Let's go through the script to find out all the live machines on the network and alternate
- methods to find out the same.
- f Method 1:
- We can write our own script using the ping command to query list of IP addresses
- and check whether they are alive or not as follows:
- #!/bin/bash
- #Filename: ping.sh
- # Change base address 192.168.0 according to your network.
- for ip in 192.168.0.{1..255} ;
- do
-   ping $ip -c 2 &> /dev/null ;
-   if [ $? -eq 0 ];
-   then
-     echo $ip is alive
-   fi
- done
- The output is as follows:
- $ ./ping.sh
- 192.168.0.1 is alive
- 192.168.0.90 is alive
- f Method 2:
- We can use an existing command-line utility to query the status of machines on a
- network as follows:
- $ fping -a 192.168.0.0/24 -g 2> /dev/null
- 192.168.0.1
- 192.168.0.90
- Or, use:
- $ fping -a 192.168.0.1 192.168.0.255 -g
- How it works...
- In Method 1, we used the ping command to find out the alive machines on the network.
- We used a for loop for iterating through the list of IP addresses. The list is generated as
- 192.168.0.{1..255} . The {start..end} notation will expand and will generate a list of
- IP addresses, such as 192.168.0.1 , 192.168.0.2 , 192.168.0.3 till 192.168.0.255 .
- ping $ip -c 2 &> /dev/null will run a ping to the corresponding IP address in each
- execution of loop. -c 2 is used to restrict the number of echo packets to be sent to two
- packets. &> /dev/null is used to redirect both stderr and stdout to /dev/null so that
- it won't be printed on the terminal. Using $? we evaluate the exit status. If it is successful, the
- exit status is 0 else non-zero. Hence the successful IP addresses are printed. We can also
- print the list of unsuccessful IP addresses to give the list of unreachable IP addresses.
- Here is an exercise for you. Instead of using a range of IP
- addresses hard-coded in the script, modify the script to
- read a list of IP addresses from a file or stdin.
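- One possible sketch of that exercise, reading addresses from a hypothetical file ip.list
- (one address per line):
- while read ip;
- do
-   ping $ip -c 2 &> /dev/null && echo $ip is alive
- done < ip.list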
- In this script, each ping is executed one after the other. Even though the IP addresses
- are independent of each other, the ping commands run sequentially; each one waits the time
- needed to send two echo packets and receive the replies (or the timeout for a reply) before
- the next ping command can execute.
- When it comes to 255 addresses, the delay is large. Let's run all the ping commands in
- parallel to make it much faster. The core part of the script is the loop body. To run the ping
- commands in parallel, enclose the loop body in ( )& . ( ) encloses a block of commands
- to run as the sub-shell and & sends it to the background by leaving the current thread. For
- example:
- (
- ping $ip -c2 &> /dev/null ;
- if [ $? -eq 0 ];
- then
- echo $ip is alive
- fi
- )&
- wait
- The for loop body spawns many background processes, exits the loop, and the script then
- terminates. In order to prevent the script from terminating until all its child processes end,
- we have a command called wait . Place a wait at the end of the script so that it waits
- until all the child ( ) subshell processes complete.
- The wait command enables a script to terminate only after all its child
- processes or background processes have completed.
- Have a look at fast_ping.sh from the code provided with the book.
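- Putting the pieces together, a minimal sketch of such a parallel version is shown below
- (the fast_ping.sh shipped with the book may differ in its details):
- #!/bin/bash
- #Filename: fast_ping.sh (sketch)
- # Change base address 192.168.0 according to your network.
- for ip in 192.168.0.{1..255} ;
- do
-     (
-         ping $ip -c 2 &> /dev/null ;
-         if [ $? -eq 0 ];
-         then
-             echo $ip is alive
-         fi
-     )&
- done
- wait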
- Method 2 uses a different command called fping . It can ping a list of IP addresses
- simultaneously and responds very quickly. The options available with fping are as follows:
- f The -a option with fping prints the IP addresses of all alive machines
- f The -u option with fping prints all unreachable machines
- f The -g option generates a range of IP addresses from the slash-subnet-mask
- notation specified as IP/mask, or from start and end IP addresses as:
- $ fping -a 192.168.0.0/24 -g
- Or
- $ fping -a 192.168.0.1 192.168.0.255 -g
- f 2>/dev/null is used to dump the error messages printed for unreachable hosts to
- the null device
- It is also possible to manually specify a list of IP addresses as command-line arguments or as
- a list through stdin . For example:
- $ fping -a 192.168.0.1 192.168.0.5 192.168.0.6
- # Passes IP address as arguments
- $ fping -a <ip.list
- # Passes a list of IP addresses from a file
- There's more...
- The fping command can be used for querying DNS data from a network. Let's see how to do it.
- DNS lookup with fping
- fping has an option -d that returns host names by using DNS lookup for each echo reply. It
- will print out host names rather than IP addresses on ping replies.
- $ cat ip.list
- 192.168.0.86
- 192.168.0.9
- 192.168.0.6
- $ fping -a -d 2>/dev/null <ip.list
- www.local
- dnss.local
- See also
- f Playing with file descriptors and redirection of Chapter 1, explains the data
- redirection
- f Comparisons and tests of Chapter 1, explains numeric comparisons
- Transferring files
- The major purpose of networking computers is resource sharing, and among shared
- resources, file sharing is the most prominent use. There are different methods by which we
- can transfer files between nodes on a network. This recipe discusses how to transfer files
- using the commonly used protocols FTP, SFTP, RSYNC, and SCP.
- Getting ready
- The commands for performing file transfer over the network are mostly available by default
- with Linux installations. Files can be transferred via FTP using the lftp command, via an
- SSH connection using sftp , via RSYNC over SSH using the rsync command, and through
- SSH using scp .
- How to do it...
- File Transfer Protocol (FTP) is an old file transfer protocol for transferring files between
- machines on a network. We can use the command lftp for accessing FTP-enabled servers
- for file transfer. FTP uses port 21 and can only be used if an FTP server is installed on the
- remote machine. FTP is used by many public websites to share files.
- To connect to an FTP server and transfer files in between, use:
- $ lftp username@ftphost
- Now it will prompt for a password and then display a logged in prompt as follows:
- lftp username@ftphost:~>
- You can type commands in this prompt. For example:
- f To change to a directory, use cd directory
- f To change the directory of the local machine, use lcd directory
- f To create a directory, use mkdir directory
- f To download a file, use get filename as follows:
- lftp username@ftphost:~> get filename
- f To upload a file from the current directory, use put filename as follows:
- lftp username@ftphost:~> put filename
- f An lftp session can be exited by using the quit command
- Auto completion is supported in the lftp prompt.
- There's more...
- Let's go through some additional techniques and commands used for file transfer through a
- network.
- Automated FTP transfer
- ftp is another command used for FTP-based file transfer; lftp is the more flexible of
- the two. Both lftp and ftp open an interactive session with the user (they prompt for
- user input by displaying messages). What if we want to automate a file transfer instead of
- using the interactive mode? We can automate FTP file transfers by writing a shell script as
- follows:
- #!/bin/bash
- #Filename: ftp.sh
- #Automated FTP transfer
- HOST='domain.com'
- USER='foo'
- PASSWD='password'
- ftp -i -n $HOST <<EOF
- user ${USER} ${PASSWD}
- binary
- cd /home/slynux
- put testfile.jpg
- get serverfile.jpg
- quit
- EOF
- The above script has the following structure:
- <<EOF
- DATA
- EOF
- This is used to send data through stdin to the FTP command. The recipe, Playing with file
- descriptors and redirection in Chapter 1, explains various methods for redirection into stdin .
- The -i option of ftp turns off the interactive session with the user. user ${USER}
- ${PASSWD} sets the username and password, and binary sets the file transfer mode to
- binary.
- SFTP (Secure FTP)
- SFTP is an FTP-like file transfer system that runs on top of an SSH connection. It makes use of
- an SSH connection to emulate an FTP interface. It doesn't require an FTP server at the remote
- end to perform file transfer but it requires an OpenSSH server to be installed and running. It is
- an interactive command, which offers an sftp prompt.
- The following commands are used to perform the file transfer. All the other parts remain
- the same as in the automated FTP session with its specific HOST, USER, and PASSWD:
- cd /home/slynux
- put testfile.jpg
- get serverfile.jpg
- In order to run sftp , use:
- $ sftp user@domainname
- Similar to lftp , an sftp session can be exited by typing the quit command.
- The SSH server sometimes will not be running at the default Port 22. If it is running at a
- different port, we can specify the port along with sftp as -oPort=PORTNO .
- For example:
- $ sftp -oPort=422 user@slynux.org
- -oPort should be the first argument of the sftp command.
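- sftp also supports a non-interactive batch mode through its -b option, which reads the
- session commands from a file. A sketch follows; batch.sftp is a hypothetical filename,
- and batch mode assumes password-less (key-based) authentication, since sftp -b does
- not prompt for a password:
- $ cat batch.sftp
- cd /home/slynux
- put testfile.jpg
- get serverfile.jpg
- quit
- $ sftp -b batch.sftp user@domainname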
- RSYNC
- rsync is an important command-line utility that is widely used for copying files over
- networks and for taking backup snapshots. Its usage is better explained in the separate
- recipe, Backup snapshots with rsync.
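- For reference, a typical rsync invocation over SSH looks like the following; the paths are
- illustrative, -a enables archive mode (preserving permissions and timestamps), and -v
- makes the output verbose:
- $ rsync -av /home/slynux user@remotehost:/home/backups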
- SCP (Secure Copy)
- SCP is a file copy technique that is more secure than the traditional remote copy tool
- called rcp . The files are transferred through an encrypted channel; SSH is used as the
- encryption channel. We can easily transfer files to a remote machine as follows:
- $ scp filename user@remotehost:/home/path
- This will prompt for a password. It can be made password-less by using the SSH auto-login
- technique; the recipe, Password-less auto-login with SSH, explains SSH auto-login.
- Therefore, file transfer using scp doesn't require specific scripting. Once SSH login is
- automated, the scp command can be executed without an interactive password prompt.
- Here remotehost can be an IP address or a domain name. The format of the scp command is:
- $ scp SOURCE DESTINATION
- SOURCE or DESTINATION can be in the format username@host:/path , for example:
- $ scp user@remotehost:/home/path/filename filename
- The above command copies a file from the remote host to the current directory with the given
- filename.
- If SSH is running at a different port than 22, use -oPort with the same syntax as sftp .
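- For example, assuming the server listens on port 422:
- $ scp -oPort=422 filename user@remotehost:/home/path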
- Recursive copying with SCP
- By using scp we can recursively copy a directory between two machines on a network as
- follows with the -r parameter:
- $ scp -r /home/slynux user@remotehost:/home/backups
- # Copies the directory /home/slynux recursively to remote location
- scp can also preserve file permissions and modes while copying, by using the -p
- parameter.
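- For example, the following copies a file while preserving its modification times and mode
- (the paths are illustrative):
- $ scp -p filename user@remotehost:/home/path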
- See also
- f Playing with file descriptors and redirection of Chapter 1, explains the standard input
- using EOF
- Setting up an Ethernet and wireless LAN
- with script
- An Ethernet connection is simple to configure. Since it uses physical cables, there are
- no special requirements such as authentication. However, a wireless LAN requires
- authentication parameters, such as a WEP key and the ESSID of the wireless network to
- connect to. Let's see how to connect to a wireless as well as a wired network by writing a
- shell script.
- Getting ready
- To connect to a wired network, we need to assign an IP address and subnet mask by using
- the ifconfig utility. A wireless network connection additionally requires utilities such as
- iwconfig and iwlist to configure more parameters.
- How to do it...
- In order to connect to a network from a wired interface, execute the following script:
- #!/bin/bash
- #Filename: etherconnect.sh
- #Description: Connect Ethernet
- #Modify the parameters below according to your settings
- ######### PARAMETERS ###########
- IFACE=eth0
- IP_ADDR=192.168.0.5
- SUBNET_MASK=255.255.255.0
- GW=192.168.0.1
- HW_ADDR='00:1c:bf:87:25:d2'
- # HW_ADDR is optional
- #################################
- if [ $UID -ne 0 ];
- then
-     echo "Run as root"
-     exit 1
- fi
- # Turn the interface down before setting new config
- /sbin/ifconfig $IFACE down
- if [[ -n $HW_ADDR ]];
- then
-     /sbin/ifconfig $IFACE hw ether $HW_ADDR
-     echo Spoofed MAC ADDRESS to $HW_ADDR
- fi
- /sbin/ifconfig $IFACE $IP_ADDR netmask $SUBNET_MASK
- route add default gw $GW $IFACE
- echo Successfully configured $IFACE
- The script for connecting to a wireless LAN with WEP is as follows:
- #!/bin/bash
- #Filename: wlan_connect.sh
- #Description: Connect to Wireless LAN
- #Modify the parameters below according to your settings
- ######### PARAMETERS ###########
- IFACE=wlan0
- IP_ADDR=192.168.1.5
- SUBNET_MASK=255.255.255.0
- GW=192.168.1.1
- HW_ADDR='00:1c:bf:87:25:d2'
- #Comment above line if you don't want to spoof mac address
- ESSID="homenet"
- WEP_KEY=8b140b20e7
- FREQ=2.462G
- #################################
- KEY_PART=""
- if [[ -n $WEP_KEY ]];
- then
-     KEY_PART="key $WEP_KEY"
- fi
- if [ $UID -ne 0 ];
- then
-     echo "Run as root"
-     exit 1;
- fi
- # Turn the interface down before setting new config
- /sbin/ifconfig $IFACE down
- if [[ -n $HW_ADDR ]];
- then
-     /sbin/ifconfig $IFACE hw ether $HW_ADDR
-     echo Spoofed MAC ADDRESS to $HW_ADDR
- fi
- /sbin/iwconfig $IFACE essid $ESSID $KEY_PART freq $FREQ
- /sbin/ifconfig $IFACE $IP_ADDR netmask $SUBNET_MASK
- route add default gw $GW $IFACE
- echo Successfully configured $IFACE
- How it works...
- The commands ifconfig , iwconfig , and route are to be run as root. Hence a check for
- the root user is performed at the beginning of the scripts.
- The Ethernet connection script is pretty straightforward and it uses the concepts explained in
- the recipe, Basic networking primer. Let's go through the commands used for connecting to
- the wireless LAN.
- A wireless LAN requires some parameters such as the essid , key , and frequency to connect
- to the network. The essid is the name of the wireless network to which we need to connect.
- Some networks use a Wired Equivalent Privacy (WEP) key for authentication, whereas some
- networks don't. The WEP key is usually a 10-digit hexadecimal passphrase. Next comes the
- frequency assigned to the network. iwconfig is the command used to attach the wireless
- card to the proper wireless network, WEP key, and frequency.
- We can scan and list the available wireless networks by using the utility iwlist . To scan,
- use the following command:
- # iwlist scan
- wlan0 Scan completed :
- Cell 01 - Address: 00:12:17:7B:1C:65
- Channel:11
- Frequency:2.462 GHz (Channel 11)
- Quality=33/70 Signal level=-77 dBm
- Encryption key:on
- ESSID:"model-2"
- The Frequency parameter can be extracted from the scan result, from the line
- Frequency:2.462 GHz (Channel 11) .
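- If a script needs to pick that value out of the scan output, one possible extraction is shown
- below; wlan0 and the exact "Frequency:" line format are assumptions based on the output
- above, and may vary between wireless drivers:
- # iwlist wlan0 scan | grep 'Frequency:' | head -n 1 | \
- sed 's/.*Frequency:\([0-9.]*\) GHz.*/\1/'
- 2.462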
- See also
- f Comparisons and tests of Chapter 1, explains string comparisons.
- Password-less auto-login with SSH
- SSH is widely used with automation scripting. By using SSH, it is possible to remotely
- execute commands on remote hosts and read their output. SSH authenticates by using a
- username and password, and passwords are prompted for during the execution of SSH
- commands. In automation scripts, however, SSH commands may be executed hundreds of
- times in a loop, so providing passwords each time is impractical; hence we need to
- automate logins. SSH has a built-in feature by which it can auto-login using SSH keys. This
- recipe describes how to create SSH keys and facilitate auto-login.
- How to do it...
- SSH uses public key and private key based encryption techniques for automatic
- authentication. An authentication key has two elements: a public and a private key pair.
- We can create an authentication key using the ssh-keygen command. For automating
- authentication, the public key must be placed at the server (by appending it to the
- ~/.ssh/authorized_keys file) and the private key of the pair should be present in the
- ~/.ssh directory of the user at the client machine, which is the computer you are logging
- in from.
- Several configurations (for example, path and name of the authorized_keys file) regarding
- the SSH can be configured by altering the configuration file /etc/ssh/sshd_config .
- There are two steps towards the setup of automatic authentication with SSH. They are:
- 1. Creating the SSH key from the machine, which requires a login to a remote machine.
- 2. Transferring the public key generated to the remote host and appending it to
- ~/.ssh/authorized_keys file.
- In order to create an SSH key, enter the ssh-keygen command with the encryption algorithm
- type specified as RSA as follows:
- $ ssh-keygen -t rsa
- Generating public/private rsa key pair.
- Enter file in which to save the key (/home/slynux/.ssh/id_rsa):
- Created directory '/home/slynux/.ssh'.
- Enter passphrase (empty for no passphrase):
- Enter same passphrase again:
- Your identification has been saved in /home/slynux/.ssh/id_rsa.
- Your public key has been saved in /home/slynux/.ssh/id_rsa.pub.
- The key fingerprint is:
- f7:17:c6:4d:c9:ee:17:00:af:0f:b3:27:a6:9c:0a:05 slynux@slynux-laptop
- The key's randomart image is:
- +--[ RSA 2048]----+
- | . |
- | o . .|
- | E o o.|
- | ...oo |
- | .S .+ +o.|
- | . . .=....|
- | .+.o...|
- | . . + o. .|
- | ..+ |
- +-----------------+
- You need to enter a passphrase for generating the public-private key pair. It is also possible
- to generate the key pair without entering a passphrase, but it is insecure. We can write
- monitoring scripts that use automated login from the script to several machines. In such
- cases, you should leave the passphrase empty while running the ssh-keygen command to
- prevent the script from asking for a passphrase while running.
- Now ~/.ssh/id_rsa.pub and ~/.ssh/id_rsa have been generated. id_rsa.pub is the
- generated public key and id_rsa is the private key. The public key has to be appended to
- the ~/.ssh/authorized_keys file on remote servers where we need to auto-login from
- the current host.
- In order to append a key file, use:
- $ ssh USER@REMOTE_HOST "cat >> ~/.ssh/authorized_keys" < ~/.ssh/id_rsa.pub
- Password:
- Provide the login password in the previous command.
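- Alternatively, most distributions ship the ssh-copy-id helper, which appends the default
- public key to the remote authorized_keys file in a single step:
- $ ssh-copy-id USER@REMOTE_HOST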
- The auto-login has been set up. From now on, SSH will not prompt for passwords during
- execution. You can test this with the following command:
- $ ssh USER@REMOTE_HOST uname
- Linux
- You will not be prompted for a password.
- Running commands on remote host
- with SSH
- SSH is an interesting system administration tool that enables you to control remote hosts
- by logging in with a shell. SSH stands for Secure Shell. Commands can be executed on the
- shell obtained by logging in to the remote host, just as if we ran them on the localhost. SSH
- runs the network data transfer over an encrypted tunnel. This recipe will introduce different
- ways in which commands can be executed on a remote host.
- Getting ready
- SSH doesn't come by default with all GNU/Linux distributions. Therefore, you may have to
- install the openssh-server and openssh-client packages using a package manager.
- SSH service runs by default on port number 22.
- How to do it...
- To connect to a remote host with the SSH server running, use:
- $ ssh username@remote_host
- In this command:
- f username is a user that exists at the remote host.
- f remote_host can be a domain name or an IP address.
- For example:
- $ ssh mec@192.168.0.1
- The authenticity of host '192.168.0.1 (192.168.0.1)' can't be
- established.
- RSA key fingerprint is 2b:b4:90:79:49:0a:f1:b3:8a:db:9f:73:2d:75:d6:f9.
- Are you sure you want to continue connecting (yes/no)? yes
- Warning: Permanently added '192.168.0.1' (RSA) to the list of known
- hosts.
- Password:
- Last login: Fri Sep 3 05:15:21 2010 from 192.168.0.82
- mec@proxy-1:~$
- It will interactively ask for a user password and upon successful authentication it will return
- the shell for the user.
- By default, the SSH server runs at Port 22. But certain servers run the SSH service at different
- ports. In that case use -p port_no with the ssh command to specify the port.
- In order to connect to an SSH server running at port 422, use:
- $ ssh user@localhost -p 422
- You can execute commands in the shell that corresponds to the remote host. The shell is
- an interactive tool in which a user types and runs commands. However, in shell scripting
- contexts, we do not need an interactive shell; we need to automate several tasks. We
- require executing several commands at the remote shell and displaying or storing their
- output at localhost. Issuing a password every time is not practical for an automated script,
- hence auto-login for SSH should be configured.
- The recipe, Password-less auto-login with SSH, explains how to configure SSH auto-login.
- Make sure that auto-login is configured before running automated scripts that use SSH.
- To run a command on the remote host and display its output on the localhost shell, use the
- following syntax:
- $ ssh user@host 'COMMANDS'
- For example:
- $ ssh mec@192.168.0.1 'whoami'
- Password:
- mec
- Multiple commands can be given by using a semicolon delimiter between the commands as:
- $ ssh user@host 'command1 ; command2 ; command3'
- Commands can be sent through stdin and the output of the commands will be available to
- stdout .
- The syntax will be as follows:
- $ ssh user@remote_host "COMMANDS" > stdout.txt 2> errors.txt
- The COMMANDS string should be quoted in order to prevent a semicolon character from
- acting as a delimiter in the localhost shell. We can also pass any command sequence that
- involves piped statements to the ssh command through stdin as follows:
- $ echo "COMMANDS" | ssh user@remote_host > stdout.txt 2> errors.txt
- For example:
- $ ssh mec@192.168.0.1 "echo user: $(whoami);echo OS: $(uname)"
- Password:
- user: slynux
- OS: Linux
- In this example, since the command string is enclosed in double quotes, $(whoami) and
- $(uname) are expanded by the local shell before ssh runs; that is why the output shows
- the local username slynux rather than the remote user mec . To have the command
- substitutions performed on the remote host instead, enclose the command string in single
- quotes:
- $ ssh mec@192.168.0.1 'echo user: $(whoami); echo OS: $(uname)'
- It can be generalized as:
- COMMANDS="command1; command2; command3"
- $ ssh user@hostname "$COMMANDS"
- We can also pass a more complex subshell in the command sequence by using the ( )
- subshell operator.
- Let's write an SSH-based shell script that collects the uptime of a list of remote hosts.
- Uptime is the time for which the system has been powered on; the uptime command is
- used to display it.
- It is assumed that all systems in the IP_LIST have a common user test .
- #!/bin/bash
- #Filename: uptime.sh
- #Description: Uptime monitor
- IP_LIST="192.168.0.1 192.168.0.5 192.168.0.9"
- USER="test"
- for IP in $IP_LIST;
- do
-     utime=$(ssh $USER@$IP uptime | awk '{ print $3 }')
-     echo $IP uptime: $utime
- done
- The expected output is:
- $ ./uptime.sh
- 192.168.0.1 uptime: 1:50,
- 192.168.0.5 uptime: 2:15,
- 192.168.0.9 uptime: 10:15,
- There's more...
- The ssh command can be executed with several additional options. Let's go through them.
- SSH with compression
- The SSH protocol also supports data transfer with compression, which comes in handy when
- bandwidth is an issue. Use the -C option with the ssh command to enable compression as
- follows:
- $ ssh -C user@hostname COMMANDS
- Redirecting data into stdin of remote host shell commands
- Sometimes we need to redirect some data into stdin of remote shell commands. Let's see
- how to do it. An example is as follows:
- $ echo "text" | ssh user@remote_host 'cat >> list'
- Or:
- # Redirect data from file as:
- $ ssh user@remote_host 'cat >> list' < file
- cat >> list appends the data received through stdin to the file list. Here this command
- is executed at the remote host. But the data is passed to stdin from localhost.
- See also
- f Password-less auto-login with SSH, explains how to configure auto-login to execute
- commands without prompting for password.
- Mounting a remote drive at a local mount
- point
- Having a local mount point to access a remote host's filesystem is really helpful while
- carrying out both read and write data transfer operations. SSH is the most common
- transfer protocol available on a network, and we can make use of it with sshfs , which
- enables you to mount a remote filesystem to a local mount point. Let's see how to do it.
- Getting ready
- sshfs doesn't come by default with GNU/Linux distributions. Install sshfs by using a
- package manager. sshfs is an extension to the FUSE filesystem package that allows
- supported OSes to mount a wide variety of data as if it were a local filesystem.
- How to do it...
- In order to mount a filesystem location at a remote host to a local mount point, use:
- # sshfs user@remotehost:/home/path /mnt/mountpoint
- Password:
- Issue the user password when prompted.
- Now data at /home/path on the remote host can be accessed via the local mount point
- /mnt/mountpoint .
- In order to unmount after completing the work, use:
- # umount /mnt/mountpoint
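- If the filesystem was mounted as a regular user, unmounting may require the FUSE helper
- fusermount instead of umount :
- $ fusermount -u /mnt/mountpoint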
- See also
- f Running commands on remote host with SSH, explains the ssh command.
- Multi-casting window messages on
- a network
- The administrator of a network may often need to send messages to the nodes on the
- network. Displaying pop-up windows on users' desktops is a helpful way to alert users with
- a piece of information. Using a GUI toolkit with shell scripting can achieve this task. This
- recipe discusses how to send a pop-up window with custom messages to remote hosts.
- Getting ready
- For implementing GUI pop-up windows, zenity can be used. Zenity is a scriptable GUI
- toolkit for creating windows consisting of text boxes, input boxes, and so on. SSH can be
- used for connecting to the remote shell on a remote host. Zenity doesn't come installed by
- default with GNU/Linux distributions; use a package manager to install it.
- How to do it...
- Zenity is one of several scriptable dialog-creation toolkits. There are other toolkits, such as
- gdialog, kdialog, xdialog, and so on. Zenity is a flexible toolkit that adheres to the GNOME
- Desktop Environment.
- In order to create an info box with zenity, use:
- $ zenity --info --text "This is a message"
- # It will display a window with "This is a message" as text.
- Zenity can be used to create windows with input boxes, combo inputs, radio buttons, push
- buttons, and more. These are beyond the scope of this recipe; check the man page of
- zenity for more.
- Now, we can use SSH to run these zenity statements on a remote machine. In order to run this
- statement on the remote host through SSH, run:
- $ ssh user@remotehost 'zenity --info --text "This is a message"'
- But this will return an error like:
- (zenity:3641): Gtk-WARNING **: cannot open display:
- This is because zenity depends on Xserver. Xserver is a daemon which is responsible for
- plotting graphical elements on the screen that make up the GUI. A bare GNU/Linux system
- consists of only a text terminal or shell prompts.
- Xserver uses a special environment variable, DISPLAY , to identify the Xserver instance
- that is running on the system. We can manually set DISPLAY=:0 so that commands
- address the first Xserver instance.
- The previous SSH command can be rewritten as:
- $ ssh username@remotehost 'export DISPLAY=:0 ; zenity --info --text "This
- is a message"'
- This statement will display a pop-up window at remotehost if the user with username is
- logged in to any of the window managers.
- In order to multicast the popup window to multiple remote hosts, write a shell script as follows:
- #!/bin/bash
- #Filename: multi_cast_window.sh
- # Description: Multi-cast window popups
- IP_LIST="192.168.0.5 192.168.0.3 192.168.0.23"
- USER="username"
- COMMAND='export DISPLAY=:0 ;zenity --info --text "This is a message" '
- for host in $IP_LIST;
- do
-     ssh $USER@$host "$COMMAND" &
- done
- How it works...
- In the above script, we have a list of IP addresses on which the window should be popped
- up. A loop is used to iterate through the IP addresses and execute the SSH command.
- In the SSH statement, we have postfixed & , which sends the SSH statement to the
- background in order to facilitate parallel execution of the several SSH statements. If &
- were not used, the script would start an SSH session, execute the zenity dialog, and wait
- for the user at the remote host to close the pop-up window; until then, the next SSH
- statement in the loop would not be executed. The & trick avoids this blocking of the loop by
- not waiting for each SSH session to terminate.
- See also
- f Running commands on remote host with SSH, explains the ssh command.
- Network traffic and port analysis
- Network ports are essential parameters of network-based applications. Applications open
- ports on the host and communicate with a remote host through ports opened at the
- remote host. Awareness of opened and closed ports is essential in a security context.
- Malware and rootkits may be running on the system with custom ports and custom
- services that allow attackers to gain unauthorized access to data and resources. By getting
- the list of opened ports and the services running on them, we can analyze and defend the
- system from being controlled by rootkits, and the list helps to remove them efficiently. The
- list of opened ports is not only helpful for malware detection; it also helps in debugging
- network-based applications, for example, to analyze whether certain port connections and
- port-listening functionalities are working fine. This recipe discusses various utilities for port
- analysis.
- Getting ready
- Various commands are available for listing opened ports and the services running on each
- port (for example, lsof and netstat ). These commands are, by default, available on all
- GNU/Linux distributions.
- How to do it...
- In order to list all opened ports on the system along with the details on each service attached
- to it, use:
- $ lsof -i
- COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
- firefox-b 2261 slynux 78u IPv4 63729 0t0 TCP localhost:47797-
- >localhost:42486 (ESTABLISHED)
- firefox-b 2261 slynux 80u IPv4 68270 0t0 TCP slynux-laptop.
- local:41204->192.168.0.2:3128 (CLOSE_WAIT)
- firefox-b 2261 slynux 82u IPv4 68195 0t0 TCP slynux-laptop.
- local:41197->192.168.0.2:3128 (ESTABLISHED)
- ssh 3570 slynux 3u IPv6 30025 0t0 TCP localhost:39263-
- >localhost:ssh (ESTABLISHED)
- ssh 3836 slynux 3u IPv4 43431 0t0 TCP slynux-laptop.
- local:40414->boneym.mtveurope.org:422 (ESTABLISHED)
- GoogleTal 4022 slynux 12u IPv4 55370 0t0 TCP localhost:42486
- (LISTEN)
- GoogleTal 4022 slynux 13u IPv4 55379 0t0 TCP localhost:42486-
- >localhost:32955 (ESTABLISHED)
- Each entry in the output of lsof corresponds to a service that opens a port for
- communication. The last column of the output consists of lines similar to:
- slynux-laptop.local:34395->192.168.0.2:3128 (ESTABLISHED)
- In this output, slynux-laptop.local:34395 corresponds to the localhost part and
- 192.168.0.2:3128 corresponds to the remote host.
- 34395 is the port opened on the current machine, and 3128 is the port to which the
- service connects at the remote host.
- In order to list the opened ports on the current machine, use:
- $ lsof -i | grep ":[0-9]\+->" -o | grep "[0-9]\+" -o | sort | uniq
- The :[0-9]\+-> regex for grep is used to extract the host port portion ( :34395-> ) from the
- lsof output. The next grep is used to extract the port number (which is numeric). Multiple
- connections may occur through the same port and hence multiple entries of the same port may
- occur. In order to display each port once, they are sorted and the unique ones are printed.
- There's more...
- Let's go through additional utilities that can be used for viewing the opened port and network
- traffic related information.
- Opened port and services using netstat
- netstat is another command for network service analysis. Explaining all the features of
- netstat is beyond the scope of this recipe. We will now look at how to list services and
- port numbers.
- Use netstat -tnp to list opened ports and services as follows:
- $ netstat -tnp
- (Not all processes could be identified, non-owned process info
- will not be shown, you would have to be root to see it all.)
- Active Internet connections (w/o servers)
- Proto Recv-Q Send-Q Local Address        Foreign Address      State        PID/Program name
- tcp   0      0      192.168.0.82:38163   192.168.0.2:3128     ESTABLISHED  2261/firefox-bin
- tcp   0      0      192.168.0.82:38164   192.168.0.2:3128     TIME_WAIT    -
- tcp   0      0      192.168.0.82:40414   193.107.206.24:422   ESTABLISHED  3836/ssh
- tcp   0      0      127.0.0.1:42486      127.0.0.1:32955      ESTABLISHED  4022/GoogleTalkPlug
- tcp   0      0      192.168.0.82:38152   192.168.0.2:3128     ESTABLISHED  2261/firefox-bin
- tcp6  0      0      ::1:22               ::1:39263            ESTABLISHED  -
- tcp6  0      0      ::1:39263            ::1:22               ESTABLISHED  3570/ssh
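- To see only listening server sockets rather than established connections, the -l option
- can be added; the output, of course, varies with the services running:
- $ netstat -tlnp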
- 8
- Put on the Monitor's
- Cap
- In this chapter, we will cover:
- f Disk usage hacks
- f Calculating the execution time for a command
- f Information about logged-in users, boot logs, and boot failures
- f Printing the 10 most frequently-used commands
- f Listing the top 10 CPU-consuming processes in 1 hour
- f Monitoring command outputs with watch
- f Logging access to files and directories
- f Logfile management with logrotate
- f Logging with syslog
- f Monitoring user logins to find intruders
- f Remote disk usage health monitoring
- f Finding out active user hours on a system
- Introduction
- An operating system consists of a collection of system software designed for different
- purposes and serving different task sets. Each of these programs needs to be monitored
- by the operating system or the system administrator in order to know whether it is working
- properly or not. We will also use a technique called logging, by which important information
- is written to a file while the application is running. By reading this file, we can understand
- the timeline of the operations taking place within a particular piece of software or a
- daemon. If an application or a service crashes, this information helps to debug the issue
- and enables us to fix it. Logging and monitoring also help to gather information from a pool
- of data. They are important tasks for ensuring security in the operating system and for
- debugging purposes.
- This chapter deals with different commands that can be used to monitor different activities. It
- also goes through logging techniques and their usages.
- Disk usage hacks
- Disk space is a limited resource. We frequently perform disk usage calculations on hard
- disks or other storage media to find out the free space available. When free space becomes
- scarce, we need to find out the large files that are to be deleted or moved in order to create
- free space. Disk usage manipulations are commonly used in shell scripting contexts. This
- recipe illustrates the various commands used for disk manipulations, and problems where
- disk usage can be calculated with a variety of options.
- Getting ready
- df and du are the two significant commands that are used for calculating disk usage in Linux.
- The command df stands for disk free and du stands for disk usage. Let's see how we can use
- them to perform various tasks that involve disk usage calculation.
- How to do it...
- To find the disk space used by a file (or files), use:
- $ du FILENAME1 FILENAME2 ..
- For example:
- $ du file.txt
- 4
- The result is, by default, shown in units of 1024-byte blocks, not bytes; the 4 above means
- the file occupies 4 KB of disk space.
- In order to obtain the disk usage for all files inside a directory, with the individual disk
- usage of each file shown on its own line, use:
- $ du -a DIRECTORY
- -a outputs results for all files in the specified directory or directories recursively.
- Running du DIRECTORY will output a similar result, but it will show only the
- size consumed by subdirectories. However, it does not show the disk usage
- for each file. For printing the disk usage of files, -a is mandatory.
- For example:
- $ du -a test
- 4 test/output.txt
- 4 test/process_log.sh
- 4 test/pcpu.sh
- 16 test
- An example of using du DIRECTORY is as follows:
- $ du test
- 16 test
- There's more...
- Let's go through additional usage practices for the du command.