- It has the format 's/substitution_pattern/replacement_string/g' .
- It replaces every occurrence of substitution_pattern with the replacement string.
- Here the substitution pattern is the regex for a sentence. Every sentence is delimited by "."
- and the first character is a space. Therefore, we need to match the text that is in the format
- "space" some text MATCH_STRING some text "dot". A sentence may contain any characters
- except a "dot", which is the delimiter. Hence we have used [^.] : [^.]* matches any
- combination of characters except the dot. In between, the match string "mobile phones" is
- placed. Every matched sentence is replaced with nothing (the replacement between the
- slashes is empty).
- See also
- Basic sed primer, explains the sed command
- Basic regular expression primer, explains how to use regular expressions
- Implementing head, tail, and tac with awk
- Mastering text-processing operations comes with practice. This recipe will help us practice
- incorporating some of the commands that we have just learned with some that we already
- know.
- Getting ready
- The commands head , tail , uniq , and tac operate line by line. Whenever we need line by
- line processing, we can always use awk . Let's emulate these commands with awk .
- How to do it...
- Let's see how different commands can be emulated with different basic text processing
- commands, such as head, tail, and tac.
- The head command reads the first ten lines of a file and prints them out:
- $ awk 'NR <=10' filename
- The tail command prints the last ten lines of a file:
- $ awk '{ buffer[NR % 10] = $0 } END { for(i = NR - 9; i <= NR; i++) { if (i > 0) print buffer[i % 10] } }' filename
- The tac command prints the lines of input file in reverse order:
- $ awk '{ buffer[NR] = $0 } END { for(i=NR; i>0; i--) { print buffer[i] } }' filename
- How it works...
- In the implementation of head using awk , we print the lines in the input stream having a line
- number less than or equal to 10 . The line number is available using the special variable NR .
- In the implementation of the tail command, a hashing technique is used. The buffer array
- index is determined by the hashing function NR % 10 , where NR is the special variable that
- contains the line number of the current record, and $0 contains the current line of text.
- Hence % maps all the lines having the same remainder to the same index of the array, so the
- buffer always holds the ten most recent lines. In the END{} block, we iterate from line
- number NR - 9 up to NR (skipping non-positive values for short inputs) and print the
- buffered lines in their original order.
- In the tac command emulation, it simply stores all the lines in an array. When the END{}
- block is reached, NR holds the line number of the last line. It is then decremented in a for
- loop until it reaches 1 , printing the stored line in each iteration.
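- We can quickly sanity-check all three emulations by piping in numbered lines from seq
- (assuming GNU coreutils is available); the expected output is noted in the comments:
- $ seq 12 | awk 'NR <= 10' # 1 to 10, like head
- $ seq 12 | awk '{ buffer[NR % 10] = $0 } END { for(i = NR - 9; i <= NR; i++) { if (i > 0) print buffer[i % 10] } }' # 3 to 12, like tail
- $ seq 3 | awk '{ buffer[NR] = $0 } END { for(i=NR; i>0; i--) { print buffer[i] } }' # 3 2 1, like tac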
- See also
- Basic awk primer, explains the awk command
- head and tail - printing the last or first 10 lines of Chapter 3, explains the commands
- head and tail
- Sorting, unique and duplicates of Chapter 2, explains the uniq command
- Printing lines in reverse order, explains the tac command
- Text slicing and parameter operations
- This recipe walks through some of the simple text replacement techniques and parameter
- expansion shorthands available in Bash. A few simple techniques can often help us avoid
- having to write multiple lines of code.
- How to do it...
- Let's get into the tasks.
- Replacing some text from a variable can be done as follows:
- $ var="This is a line of text"
- $ echo ${var/line/REPLACED}
- This is a REPLACED of text
- line is replaced with REPLACED .
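- Note that ${var/line/REPLACED} replaces only the first match. As a small extra example
- (not part of the original recipe), Bash also accepts a double slash to replace every
- occurrence:
- $ var="line one and line two"
- $ echo ${var//line/LINE}
- LINE one and LINE two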
- We can produce a sub-string by specifying the start position and string length, by using the
- following syntax:
- ${variable_name:start_position:length}
- To print from the fifth character onward use the following command:
- $ string=abcdefghijklmnopqrstuvwxyz
- $ echo ${string:4}
- efghijklmnopqrstuvwxyz
- To print eight characters starting from the fifth character, use:
- $ echo ${string:4:8}
- efghijkl
- The index is specified by counting the start letter as 0 . We can also count from the last
- letter by using negative values, but they must be used inside parentheses: (-1) is the index
- of the last letter.
- $ echo ${string:(-1)}
- z
- $ echo ${string:(-2):2}
- yz
- See also
- Iterating through lines, words, and characters in a file, explains slicing of a character
- from a word
- 5
- Tangled Web?
- Not At All!
- In this chapter, we will cover:
- Downloading from a web page
- Downloading a web page as formatted plain text
- A primer on cURL
- Accessing unread Gmail mails from the command line
- Parsing data from a website
- Creating an image crawler and downloader
- Creating a web photo album generator
- Building a Twitter command-line client
- Define utility with Web backend
- Finding broken links in a website
- Tracking changes to a website
- Posting to a web page and reading response
- Introduction
- The Web is becoming the face of technology. It is the central access point for data processing.
- Though shell scripting cannot do everything that languages like PHP can do on the Web, there
- are still many tasks to which shell scripts are ideally suited. In this chapter we will explore
- some recipes that can be used to parse website content, download and obtain data, send
- data to forms, and automate website usage tasks and similar activities. We can automate
- many activities that we perform interactively through a browser with a few lines of scripting.
- Access to the functionality provided by the HTTP protocol through command-line utilities
- enables us to write scripts that are suitable for solving most web-automation tasks.
- Have fun while going through the recipes of this chapter.
- Downloading from a web page
- Downloading a file or a web page from a given URL is simple. A few command-line download
- utilities are available to perform this task.
- Getting ready
- wget is a file download command-line utility. It is very flexible and can be configured with
- many options.
- How to do it...
- A web page or a remote file can be downloaded using wget as follows:
- $ wget URL
- For example:
- $ wget http://slynux.org
- --2010-08-01 07:51:20-- http://slynux.org/
- Resolving slynux.org... 174.37.207.60
- Connecting to slynux.org|174.37.207.60|:80... connected.
- HTTP request sent, awaiting response... 200 OK
- Length: 15280 (15K) [text/html]
- Saving to: "index.html"
- 100%[======================================>] 15,280 75.3K/s in 0.2s
- 2010-08-01 07:51:21 (75.3 KB/s) - "index.html" saved [15280/15280]
- It is also possible to specify multiple download URLs as follows:
- $ wget URL1 URL2 URL3 ..
- A file can be downloaded using wget using the URL as:
- $ wget ftp://example_domain.com/somefile.img
- Usually, files are downloaded with the same filename as in the URL and the download log
- information or progress is written to stdout .
- You can specify the output file name with the -O option. If the file with the specified filename
- already exists, it will be truncated first and the downloaded file will be written to the specified
- file.
- You can also specify a different logfile path rather than printing logs to stdout by using
- the -o option as follows:
- $ wget ftp://example_domain.com/somefile.img -O dloaded_file.img -o log
- By using the above command, nothing will be printed on screen. The log or progress will be
- written to log and the output file will be dloaded_file.img .
- There is a chance that downloads might break due to unstable Internet connections. In that
- case, we can supply the number of tries as an argument, so that once interrupted, the utility
- will retry the download that many times before giving up.
- In order to specify the number of tries, use the -t flag as follows:
- $ wget -t 5 URL
- There's more...
- The wget utility has several additional options that can be used under different problem
- domains. Let's go through a few of them.
- Restricting the download speed
- When we have limited Internet downlink bandwidth and many applications sharing the
- Internet connection, and a large file is given for download, it will suck up all the bandwidth
- and may cause other processes to starve for bandwidth. The wget command comes with a
- built-in option to specify the maximum bandwidth the download job can use, so that all the
- applications can run smoothly at the same time.
- We can restrict the speed of wget by using the --limit-rate argument as follows:
- $ wget --limit-rate 20k http://example.com/file.iso
- In this command, the suffixes k (kilobyte) and m (megabyte) specify the unit of the speed limit.
- We can also specify a maximum quota for the download; wget will stop when the quota is
- exceeded. This is useful when downloading multiple files that are limited by a total download
- size, and it prevents the download from accidentally using too much disk space.
- Use --quota or -Q as follows:
- $ wget -Q 100m http://example.com/file1 http://example.com/file2
- Resume downloading and continue
- If a download using wget gets interrupted before it is completed, we can resume the
- download where we left off by using the -c option as follows:
- $ wget -c URL
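- These options can also be combined. As a rough sketch (the URL and filenames here are
- only placeholders), a resumable download with retries, a rate limit, and a logfile might look
- like this:
- $ wget -c -t 5 --limit-rate 20k -o download.log -O file.iso http://example.com/file.iso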
- Using cURL for download
- cURL is another advanced command-line utility. It is much more powerful than wget .
- cURL can be used to download as follows:
- $ curl http://slynux.org > index.html
- Unlike wget , curl writes the downloaded data into standard output ( stdout ) rather than to a
- file. Therefore, we have to redirect the data from stdout to the file using a redirection operator.
- Copying a complete website (mirroring)
- wget has an option to download the complete website by recursively collecting all the URL
- links in the web pages and downloading all of them like a crawler. Hence we can completely
- download all the pages of a website.
- In order to download the pages, use the --mirror option as follows:
- $ wget --mirror exampledomain.com
- Or use:
- $ wget -r -N -l DEPTH URL
- -l specifies the DEPTH of web pages as levels; that means wget will traverse only that
- many levels of links. It is used along with -r (recursive). The -N argument is used to enable
- timestamping for the files. URL is the base URL of the website for which the download needs
- to be initiated.
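- For example, the following sketch mirrors a site two levels deep (the domain is only a
- placeholder):
- $ wget -r -N -l 2 http://example.com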
- Accessing pages with HTTP or FTP authentication
- Some web pages require authentication for HTTP or FTP URLs. This can be provided by using
- the --user and --password arguments:
- $ wget --user username --password pass URL
- It is also possible to ask for a password without specifying the password inline. In order to do
- that use --ask-password instead of the --password argument.
- Downloading a web page as formatted
- plain text
- Web pages are HTML pages containing a collection of HTML tags along with other elements,
- such as JavaScript, CSS, and so on. But the HTML tags define the base of a web page. We
- may need to parse the data in a web page while looking for specific content, and this is
- something Bash scripting can help us with. When we download a web page, we receive an
- HTML file. In order to view formatted data, it should be viewed in a web browser. However, in
- most of the circumstances, parsing a formatted text document will be easier than parsing
- HTML data. Therefore, if we can get a text file with formatted text similar to the web page seen
- on the web browser, it is more useful and it saves a lot of effort required to strip off HTML
- tags. Lynx is an interesting command-line web browser. We can actually get the web page as
- plain text formatted output from Lynx. Let's see how to do it.
- How to do it...
- Let's download the web page view, in ASCII character representation, into a text file using
- the -dump flag with the lynx command:
- $ lynx -dump URL > webpage_as_text.txt
- This command will also list all the hyperlinks ( <a href="link"> ) separately under a
- References heading in the footer of the text output. This helps us avoid parsing links
- separately using regular expressions.
- For example:
- $ lynx -dump http://google.com > plain_text_page.txt
- You can view the plain text version of the page by using the cat command as follows:
- $ cat plain_text_page.txt
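- Since the hyperlinks are gathered under the References heading at the end of the dump, the
- link list alone can be sliced out with sed (a small sketch; it assumes References does not
- also appear earlier in the page body):
- $ lynx -dump http://google.com | sed -n '/References/,$p'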
- A primer on cURL
- cURL is a powerful utility that supports many protocols including HTTP, HTTPS, FTP, and much
- more. It supports many features including POST, cookie, authentication, downloading partial
- files from a specified offset, referers, user agent strings, extra headers, limit speed, maximum
- file size, progress bars, and so on. cURL is useful when we want to automate a web page
- usage sequence and to retrieve data. This recipe is a list of the most important features
- of cURL.
- Getting ready
- cURL doesn't come with any of the main Linux distros by default, so you may have to install it
- using the package manager. By default, most distributions ship with wget .
- cURL usually dumps downloaded files to stdout and progress information to stderr . To
- keep the progress information from being shown, we always use the --silent option.
- How to do it…
- The curl command can be used to perform different activities such as downloading, sending
- different HTTP requests, specifying HTTP headers, and so on. Let's see how to perform
- different tasks with cURL.
- $ curl URL --silent
- The above command dumps the downloaded file into the terminal (the downloaded data is
- written to stdout ).
- The --silent option is used to prevent the curl command from displaying progress
- information. If progress information is required, remove --silent .
- $ curl URL --silent -O
- The -O option is used to write the downloaded data into a file with the filename parsed from
- the URL rather than writing into the standard output.
- For example:
- $ curl http://slynux.org/index.html --silent -O
- index.html will be created.
- It writes the web page or file to the filename as in the URL instead of writing to stdout . If
- there is no filename in the URL, it will produce an error. Hence, make sure that the URL
- points to a remote file. curl http://slynux.org -O --silent will display an error,
- since the filename cannot be parsed from the URL.
- $ curl URL --silent -o new_filename
- The -o option is used to download a file and write to a file with a specified file name.
- In order to show the # progress bar while downloading, use --progress-bar instead of
- --silent .
- $ curl http://slynux.org -o index.html --progress-bar
- ################################## 100.0%
- There's more...
- In the previous sections we have learned how to download files and dump HTML pages to the
- terminal. There are several advanced options that come along with cURL. Let's explore more
- of cURL.
- Continue/Resume downloading
- Unlike wget , cURL has advanced resume features that let a download continue at a given
- offset. This also helps to download portions of files by specifying an offset:
- $ curl URL/file -C offset
- The offset is an integer value in bytes.
- cURL doesn't require us to know the exact byte offset if we want to resume downloading a file.
- If you want cURL to figure out the correct resume point, use the -C - option, like this:
- $ curl -C - URL
- cURL will automatically figure out where to restart the download of the specified file.
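- For example, to resume an interrupted download into the current directory (the URL is only
- a placeholder):
- $ curl -C - -O http://example.com/file.iso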
- Set referer string with cURL
- Referer is a string in the HTTP header used to identify the page from which the user reaches
- the current web page. When a user clicks on a link from web page A and it reaches web page
- B, the referer header string in the page B will contain a URL of page A.
- Some dynamic pages check the referer string before returning HTML data. For example, a
- web page may show a page with a Google logo attached when a user reaches it by searching
- on Google, and show a different page when they navigate to the web page by manually typing
- the URL.
- The web page can write a condition to return a Google page if the referer is www.google.com
- or else return a different page.
- You can use --referer with the curl command to specify the referer string as follows:
- $ curl --referer Referer_URL target_URL
- For example:
- $ curl --referer http://google.com http://slynux.org
- Cookies with cURL
- Using curl we can specify as well as store cookies encountered during HTTP operations.
- In order to specify cookies, use the --cookie "COOKIES" option.
- Cookies should be provided as name=value . Multiple cookies should be delimited by a
- semicolon ";". For example:
- $ curl http://example.com --cookie "user=slynux;pass=hack"
- In order to specify a file to which the cookies encountered are to be stored, use the
- --cookie-jar option. For example:
- $ curl URL --cookie-jar cookie_file
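- As a sketch of a typical session (the site and form fields here are hypothetical), we can log
- in once, save the session cookies, and reuse them in a later request:
- $ curl --cookie-jar cookies.txt -d "user=slynux&pass=hack" http://example.com/login
- $ curl --cookie cookies.txt http://example.com/private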
- Setting a user agent string with cURL
- Some web pages that check the user-agent won't work if there is no user-agent specified. You
- may have noticed that certain websites work well only in Internet Explorer (IE). If a different
- browser is used, the website will show a message that it will work only on IE. This is because
- the website checks for a user agent. You can set the user agent as IE with curl and see that
- it returns a different web page in this case.
- Using cURL, it can be set using --user-agent or -A as follows:
- $ curl URL --user-agent "Mozilla/5.0"
- Additional headers can be passed with cURL. Use -H "Header" to pass multiple additional
- headers. For example:
- $ curl -H "Host: www.slynux.org" -H "Accept-language: en" URL
- Specifying bandwidth limit on cURL
- When the available bandwidth is limited and multiple users are sharing the Internet, in order
- to share the bandwidth smoothly, we can limit the download rate to a specified limit in curl
- by using the --limit-rate option as follows:
- $ curl URL --limit-rate 20k
- In this command, the suffixes k (kilobyte) and m (megabyte) specify the unit of the download rate limit.
- Specifying the maximum download size
- The maximum download file size for cURL can be specified using the --max-filesize
- option as follows:
- $ curl URL --max-filesize bytes
- It will return a non-zero exit code if the file size exceeds the limit, and zero if the download succeeds.
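- We can act on that exit code in a script. A minimal sketch (the URL is a placeholder):
- $ curl --silent -O --max-filesize 100000 http://example.com/file || echo "Download failed or file too large"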
- Authenticating with cURL
- HTTP authentication or FTP authentication can be done using cURL with the -u argument.
- The username and password can be specified using -u username:password . It is possible
- to omit the password, in which case cURL will prompt for the password while executing.
- To be prompted for the password, use only -u username .
- For example:
- $ curl -u user:pass http://test_auth.com
- In order to be prompted for the password use:
- $ curl -u user http://test_auth.com
- Printing response headers excluding data
- It is useful to print only the response headers for many kinds of checks and statistics. For
- example, to check whether a page is reachable or not, we don't need to download the entire
- page contents; just reading the HTTP response header is enough to identify whether a page
- is available or not.
- An example use case for checking the HTTP header is to check the file size before
- downloading. We can check the Content-Length parameter in the HTTP header to find out
- the length of a file before downloading. Also, several other useful parameters can be retrieved
- from the header. The Last-Modified parameter lets us know the last modification time of
- the remote file.
- Use the -I or --head option with curl to dump only the HTTP headers without downloading the
- remote file. For example:
- $ curl -I http://slynux.org
- HTTP/1.1 200 OK
- Date: Sun, 01 Aug 2010 05:08:09 GMT
- Server: Apache/1.3.42 (Unix) mod_gzip/1.3.26.1a mod_log_bytes/1.2
- mod_bwlimited/1.4 mod_auth_passthrough/1.8 FrontPage/5.0.2.2635 mod_
- ssl/2.8.31 OpenSSL/0.9.7a
- Last-Modified: Thu, 19 Jul 2007 09:00:58 GMT
- ETag: "17787f3-3bb0-469f284a"
- Accept-Ranges: bytes
- Content-Length: 15280
- Connection: close
- Content-Type: text/html
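- For example, to read just the file size before downloading, we can pick the Content-Length
- field out of the header shown above:
- $ curl --silent -I http://slynux.org | grep "Content-Length"
- Content-Length: 15280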
- See also
- Posting to a web page and reading response
- Accessing Gmail from the command line
- Gmail is a widely used free e-mail service from Google: http://mail.google.com/ .
- Gmail allows you to read your mail via authenticated RSS feeds. We can parse the RSS feed
- to obtain each unread mail's sender name and subject. This helps us have a look at the
- unread mails in the inbox without opening the web browser.
- How to do it...
- Let's go through the shell script to parse the RSS feeds for Gmail to display the unread mails:
- #!/bin/bash
- #Filename: fetch_gmail.sh
- #Description: Fetch gmail tool
- username="PUT_USERNAME_HERE"
- password="PUT_PASSWORD_HERE"
- SHOW_COUNT=5 # No of recent unread mails to be shown
- echo
- curl -u $username:$password --silent "https://mail.google.com/mail/feed/atom" | \
- tr -d '\n' | sed 's:</entry>:\n:g' | \
- sed 's/.*<title>\(.*\)<\/title.*<author><name>\([^<]*\)<\/name><email>\([^<]*\).*/Author: \2 [\3] \nSubject: \1\n/' | \
- head -n $(( $SHOW_COUNT * 3 ))
- The output will be as follows:
- $ ./fetch_gmail.sh
- Author: SLYNUX [ slynux@slynux.com ]
- Subject: Book release - 2
- Author: SLYNUX [ slynux@slynux.com ]
- Subject: Book release - 1
- .
- … 5 entries
- How it works...
- The script uses cURL to download the RSS feed by using user authentication. User authentication
- is provided by the -u username:password argument. You can use -u user without providing
- the password. Then while executing cURL it will interactively ask for the password.
- Here we can split the piped commands into different blocks to illustrate how they work.
- tr -d '\n' removes the newline character so that we restructure each mail entry with \n
- as the delimiter. sed 's:</entry>:\n:g' replaces every </entry> with a newline so that
- each mail entry is delimited by a newline and hence mails can be parsed one by one. Have a
- look at the source of https://mail.google.com/mail/feed/atom for XML tags used in
- the RSS feeds. <entry> TAGS </entry> corresponds to a single mail entry.
- The next block of script is as follows:
- sed 's/.*<title>\(.*\)<\/title.*<author><name>\([^<]*\)<\/name><email>\([^<]*\).*/Author: \2 [\3] \nSubject: \1\n/'
- This script matches the substring title using <title>\(.*\)<\/title , the sender name
- using <author><name>\([^<]*\)<\/name> , and e-mail using <email>\([^<]*\) . Then
- back referencing is used as follows:
- Author: \2 [\3] \nSubject: \1\n is used to replace an entry for a mail with
- the matched items in an easy-to-read format. \1 corresponds to the first substring
- match, \2 to the second substring match, and so on.
- The SHOW_COUNT=5 variable holds the number of unread mail entries to be
- printed on the terminal.
- head is used to display only SHOW_COUNT*3 lines from the first line. SHOW_COUNT is
- multiplied by three because each parsed entry produces three lines of output.
- See also
- A primer on cURL, explains the curl command
- Basic sed primer of Chapter 4, explains the sed command
- Parsing data from a website
- It is often useful to parse data from web pages by eliminating unnecessary details. sed and awk
- are the main tools that we will use for this task. You might have come across a list of access
- rankings in a grep recipe in the previous chapter Texting and driving; it was generated by parsing
- the website page http://www.johntorres.net/BoxOfficefemaleList.html .
- Let's see how to parse the same data using text-processing tools.
- How to do it...
- Let's go through the command sequence used to parse details of actresses from the website:
- $ lynx -dump http://www.johntorres.net/BoxOfficefemaleList.html | \
- grep -o "Rank-.*" | \
- sed 's/Rank-//; s/\[[0-9]\+\]//' | \
- sort -nk 1 |\
- awk '
- {
- for(i=3;i<=NF;i++){ $2=$2" "$i }
- printf "%-4s %s\n", $1,$2 ;
- }' > actresslist.txt
- The output will be as follows:
- # Only 3 entries shown. All others omitted due to space limits
- 1 Keira Knightley
- 2 Natalie Portman
- 3 Monica Bellucci
- How it works...
- Lynx is a command-line web browser; it can dump the text version of the website as we
- would see in a web browser rather than showing us the raw code. Hence it avoids the job of
- removing the HTML tags. We parse the lines starting with Rank, using sed as follows:
- sed 's/Rank-//; s/\[[0-9]\+\]//'
- These lines can then be sorted according to the ranks. awk is used here to keep the spacing
- between the rank and the name uniform by specifying the field width. %-4s specifies a four-
- character width. All the fields except the first are concatenated to form a single string as $2 .
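- To see what the awk block does in isolation, we can feed it a single sample line (the name
- is taken from the output above):
- $ echo "3 Monica Bellucci" | awk '{ for(i=3;i<=NF;i++){ $2=$2" "$i } printf "%-4s %s\n", $1,$2 }'
- 3    Monica Bellucci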
- See also
- Basic sed primer of Chapter 4, explains the sed command
- Basic awk primer of Chapter 4, explains the awk command
- Downloading a web page as formatted plain text, explains the lynx command
- Image crawler and downloader
- Image crawlers are very useful when we need to download all the images that appear in a web
- page. Instead of going through the HTML source and picking out all the images by hand, we can
- use a script to parse the image URLs and download them automatically. Let's see how to do it.
- How to do it...
- Let's write a Bash script to crawl and download the images from a web page as follows:
- #!/bin/bash
- #Description: Images downloader
- #Filename: img_downloader.sh
- if [ $# -ne 3 ];
- then
- echo "Usage: $0 URL -d DIRECTORY"
- exit -1
- fi
- for i in {1..4}
- do
- case $1 in
- -d) shift; directory=$1; shift ;;
- *) url=${url:-$1}; shift;;
- esac
- done
- mkdir -p $directory;
- baseurl=$(echo $url | egrep -o "https?://[a-z.]+")
- curl -s $url | egrep -o "<img src=[^>]*>" |
- sed 's/<img src=\"\([^"]*\).*/\1/g' > /tmp/$$.list
- sed -i "s|^/|$baseurl/|" /tmp/$$.list
- cd $directory;
- while read filename;
- do
- curl -O "$filename" --silent
- done < /tmp/$$.list
- An example usage is as follows:
- $ ./img_downloader.sh http://www.flickr.com/search/?q=linux -d images
- www.it-ebooks.info
- Tangled Web? Not At All!
- 192
- How it works...
- The above image downloader script parses an HTML page, extracts the <img> tags, then
- parses src="URL" from each <img> tag and downloads the images to the specified directory.
- This script accepts a web page URL and the destination directory path as command-line
- arguments. The first part of the script is a tricky way to parse command-line arguments.
- The [ $# -ne 3 ] statement checks whether the total number of arguments to the script
- is three; otherwise, it exits and prints a usage example.
- If there are three arguments, we parse the URL and the destination directory. In order to do
- that, a tricky hack is used:
- for i in {1..4}
- do
- case $1 in
- -d) shift; directory=$1; shift ;;
- *) url=${url:-$1}; shift;;
- esac
- done
- A for loop is iterated four times (there is no significance to the number four, it is just to iterate
- a couple of times to run the case statement).
- The case statement evaluates the first argument ( $1 ) and matches either -d or any
- other string. Hence we can place the -d argument anywhere on the command line, as
- follows:
- $ ./img_downloader.sh -d DIR URL
- Or:
- $ ./img_downloader.sh URL -d DIR
- shift is used to shift the arguments, such that when shift is called, $1 is assigned the
- value of $2 ; when called again, $1 takes the value of $3 , and so on. Hence we can
- evaluate all the arguments through $1 itself.
- When -d is matched ( -d) ), it is obvious that the next argument is the value for the
- destination directory. *) corresponds to the default match; it matches anything other than
- -d . During the iteration, $1 in the default match may be either the URL or an empty string,
- and we must keep the URL while preventing "" from overwriting it. Hence we use the
- url=${url:-$1} trick: it returns the existing value of url if it is already set, and assigns
- $1 otherwise.
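- The behavior is easy to verify interactively (with a placeholder URL):
- $ url=""
- $ url=${url:-http://example.com}
- $ url=${url:-""}
- $ echo $url
- http://example.com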
- egrep -o "<img src=[^>]*>" will print only the matching strings, which are the <img>
- tags including their attributes. [^>]* is used to match all the characters except the closing
- > , that is, <img src="image.jpg" … > .
- www.it-ebooks.info
- Chapter 5
- 193
- sed 's/<img src=\"\([^"]*\).*/\1/g' parses src="url" so that all image URLs
- can be parsed from the <img> tags already parsed.
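- We can test this pair of commands on a sample tag (a made-up line, for illustration):
- $ echo '<img src="/logo.png" alt="logo">' | egrep -o "<img src=[^>]*>" | sed 's/<img src=\"\([^"]*\).*/\1/g'
- /logo.png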
- There are two types of image source paths: relative and absolute. Absolute paths contain full
- URLs that start with http:// or https:// . Relative URLs starts with / or image_name itself.
- An example of an absolute URL is: http://example.com/image.jpg
- An example of a relative URL is: /image.jpg
- For relative URLs the starting / should be replaced with the base URL to transform it to
- http://example.com/image.jpg .
- For that transformation, we initially find out the baseurl by parsing the site URL with egrep.
- Then we replace every occurrence of a starting / with the baseurl , using sed -i
- "s|^/|$baseurl/|" /tmp/$$.list .
- Then a while loop is used to iterate the list line by line and download the URL using curl .
- The --silent argument is used with curl to avoid other progress messages from being
- printed on the screen.
- See also
- A primer on cURL, explains the curl command
- Basic sed primer of Chapter 4, explains the sed command
- Searching and mining "text" inside a file with grep of Chapter 4, explains the grep
- command
- Web photo album generator
- Web developers commonly design photo album pages for websites that consist of a number
- of image thumbnails on the page. When thumbnails are clicked, a large version of the
- picture will be displayed. But when many images are required, copying the <img> tag every
- time, resizing each image to create a thumbnail, placing the thumbnails in the thumbs
- directory, testing the links, and so on are real hurdles. It takes a lot of time and repeats the
- same task over and over. It can be automated easily by writing a simple Bash script: we can
- create the thumbnails, place them in the right directories, and generate the code fragment
- for the <img> tags automatically, all in a few seconds. This recipe will teach you how to do it.
- Getting ready
- We can perform this task with a for loop that iterates every image in the current directory.
- The usual Bash utilities such as cat and convert (from ImageMagick) are used. These will
- generate an HTML album, using all the images, in index.html . In order to use convert ,
- make sure you have ImageMagick installed.
- How to do it...
- Let's write a Bash script to generate a HTML album page:
- #!/bin/bash
- #Filename: generate_album.sh
- #Description: Create a photo album using images in current directory
- echo "Creating album.."
- mkdir -p thumbs
- cat <<EOF > index.html
- <html>
- <head>
- <style>
- body
- {
- width:470px;
- margin:auto;
- border: 1px dashed grey;
- padding:10px;
- }
- img
- {
- margin:5px;
- border: 1px solid black;
- }
- </style>
- </head>
- <body>
- <center><h1> #Album title </h1></center>
- <p>
- EOF
- for img in *.jpg;
- do
- convert "$img" -resize "100x" "thumbs/$img"
- echo "<a href=\"$img\" ><img src=\"thumbs/$img\" title=\"$img\" />
- </a>" >> index.html
- done
- cat <<EOF >> index.html
- </p>
- </body>
- </html>
- EOF
- echo Album generated to index.html
- Run the script as follows:
- $ ./generate_album.sh
- Creating album..
- Album generated to index.html
- How it works...
- The initial part of the script is to write the header part of the HTML page.
- The following script redirects all the contents up to EOF (excluding the EOF line itself) to index.html :
- cat <<EOF > index.html
- contents...
- EOF
- The header includes the HTML and stylesheets.
- for img in *.jpg; iterates through the name of each .jpg file and performs the actions that follow.
- convert "$img" -resize "100x" "thumbs/$img" will create images of 100px width
- as thumbnails.
- The following statement generates the required <img> tag and appends it to index.html :
- echo "<a href=\"$img\" ><img src=\"thumbs/$img\" title=\"$img\" /></a>" >> index.html
- Finally, the footer HTML tags are appended with cat again.
- See also
- Playing with file descriptors and redirection of Chapter 1, explains EOF and stdin
- redirection
- Twitter command-line client
- Twitter is the hottest micro-blogging platform, as well as the latest buzz in online social media.
- Tweeting and reading tweets is fun. What if we could do both from the command line? It is
- pretty simple to write a command-line Twitter client. Twitter has RSS feeds, and hence we can
- make use of them. Let's see how to do it.
- Getting ready
- We can use cURL to authenticate and send Twitter updates, as well as download the RSS feed
- pages to parse the tweets. Just four lines of code can do it. Let's do it.
- How to do it...
- Let's write a Bash script using the curl command to manipulate twitter APIs:
- #!/bin/bash
- #Filename: tweets.sh
- #Description: Basic twitter client
- USERNAME="PUT_USERNAME_HERE"
- PASSWORD="PUT_PASSWORD_HERE"
- COUNT="PUT_NO_OF_TWEETS"
- if [[ "$1" != "read" ]] && [[ "$1" != "tweet" ]];
- then
- echo -e "Usage: $0 send status_message\n OR\n $0 read\n"
- exit -1;
- fi
- if [[ "$1" = "read" ]];
- then
- curl --silent -u $USERNAME:$PASSWORD http://twitter.com/statuses/friends_timeline.rss | \
- grep title | \
- tail -n +2 | \
- head -n $COUNT | \
- sed 's:.*<title>\([^<]*\).*:\n\1:'
- elif [[ "$1" = "tweet" ]];
- then
- status=$( echo $@ | tr -d '"' | sed 's/.*tweet //')
- curl --silent -u $USERNAME:$PASSWORD -d status="$status" http://twitter.com/statuses/update.xml > /dev/null
- echo 'Tweeted :)'
- fi
- Run the script as follows:
- $ ./tweets.sh tweet Thinking of writing a X version of wall command
- "#bash"
- Tweeted :)
- $ ./tweets.sh read
- bot: A tweet line
- t3rm1n4l: Thinking of writing a X version of wall command #bash
- How it works...
- Let's see the working of the above script by splitting it into two parts. The first part is
- about reading tweets. To read tweets, the script downloads the RSS information from
- http://twitter.com/statuses/friends_timeline.rss and parses the lines
- containing the <title> tag. Then it strips off the <title> and </title> tags using sed
- to form the required tweet text. A COUNT variable is then used with the head command to
- keep only the requested number of recent tweets. tail -n +2 is used to remove an
- unnecessary header text "Twitter: Timeline of friends".
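- tail -n +2 prints its input starting from the second line, as a quick test shows:
- $ seq 5 | tail -n +2
- 2
- 3
- 4
- 5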
- In the sending tweet part, the -d status argument of curl is used to post data to Twitter
- using their API: http://twitter.com/statuses/update.xml .
- In the case of sending a tweet, $1 of the script is the keyword tweet itself. To obtain the
- status message, we take $@ (the list of all arguments of the script) and remove the word
- "tweet" from it.
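- The stripping step can be tried on its own (with sample words, for illustration):
- $ echo "tweet hello world" | sed 's/.*tweet //'
- hello world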
- See also
- A primer on cURL, explains the curl command
- head and tail - printing the last or first 10 lines of Chapter 3, explains the commands
- head and tail
- define utility with Web backend
- Google provides Web definitions for any word via the search query define:WORD . Ordinarily
- we would need a GUI web browser to fetch the definitions; however, we can automate this and
- parse out the required definitions by using a script. Let's see how to do it.
- Getting ready
- We can use lynx , sed , awk , and grep to write the define utility.
- How to do it...
- Let's go through the code for the define utility script to fetch definitions from Google search:
- #!/bin/bash
- #Filename: define.sh
- #Description: A Google define: frontend
- limit=0
- if [ ! $# -ge 1 ];
- then
- echo -e "Usage: $0 WORD [-n No_of_definitions]\n"
- exit -1;
- fi
- if [ "$2" = "-n" ];
- then
- limit=$3;
- let limit++
- fi
- word=$1
- lynx -dump http://www.google.co.in/search?q=define:$word | \
- awk '/Defini/,/Find defini/' | head -n -1 | sed 's:*:\n*:; s:^[ ]*::' | \
- grep -v "[[0-9]]" | \
- awk '{
- if ( substr($0,1,1) == "*" )
- { sub("*",++count".") } ;
- print
- } ' > /tmp/$$.txt
- echo
- if [ $limit -ge 1 ];
- then
- cat /tmp/$$.txt | sed -n "/^1\./, /${limit}/p" | head -n -1
- else
- cat /tmp/$$.txt;
- fi
- Run the script as follows:
- $ ./define.sh hack -n 2
- 1. chop: cut with a hacking tool
- 2. one who works hard at boring tasks
- How it works...
- We will look into the core part of the definition parser. Lynx is used to obtain the plain text
- version of the web page. http://www.google.co.in/search?q=define:$word is
- the URL of the web-definition page. Then we reduce the text to what lies between "Definitions
- on web" and "Find definitions", since all the definitions occur in between these lines of text
- ( awk '/Defini/,/Find defini/' ).
- 's:*:\n*:' is used to replace each * with a newline followed by * , in order to insert a
- newline in between each definition, and s:^[ ]*:: is used to remove extra spaces at the start
- of lines. Hyperlinks are marked as [number] in the lynx output; those lines are removed by
- grep -v , the invert-match option. Then awk is used to replace the * occurring at the start of
- each line with a number, so that each definition is assigned a serial number. If a -n count was
- read by the script, it has to output only as many definitions as the count. So sed is used to
- print the definitions numbered 1 to count (this is easy since we replaced * with the serial
- number).
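- The numbering step can be seen in isolation (two made-up definition lines; this mirrors the
- script's own sub() call, though some awk implementations may want the * escaped as \*):
- $ printf '* first definition\n* second definition\n' | awk '{ if ( substr($0,1,1) == "*" ) { sub("*",++count".") } ; print }'
- 1. first definition
- 2. second definition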
- See also
- Basic sed primer of Chapter 4, explains the sed command
- Basic awk primer of Chapter 4, explains the awk command
- Searching and mining "text" inside a file with grep of Chapter 4, explains the grep
- command
- Downloading a web page as formatted plain text, explains the lynx command
- Finding broken links in a website
- I have seen people manually checking each and every page on a site to search for broken links.
- That is possible only for websites having very few pages; when the number of pages becomes
- large, it becomes impossible. The task becomes really easy if we can automate finding broken
- links. We can find the broken links by using HTTP manipulation tools. Let's see how to do it.
- Getting ready
- In order to identify the links and find the broken ones among them, we can use lynx and
- curl . lynx has an option, -traversal , which will recursively visit pages in the website and
- build the list of all hyperlinks in the website. We can then use cURL to verify whether or not
- each of the links is broken.
- How to do it...
- Let's write a Bash script with the help of the curl command to find out the broken links on a
- web page:
- #!/bin/bash
- #Filename: find_broken.sh
- #Description: Find broken links in a website
- if [ $# -ne 1 ];
- then
- echo -e "Usage: $0 URL\n"
- exit -1;
- fi
- echo Broken links:
- mkdir /tmp/$$.lynx
- cd /tmp/$$.lynx
- lynx -traversal $1 > /dev/null
- count=0;
- sort -u reject.dat > links.txt
- while read link;
- do
- output=`curl -I $link -s | grep "HTTP/.*OK"`;
- if [[ -z $output ]];
- then
- echo $link;
- let count++
- fi
- done < links.txt
- [ $count -eq 0 ] && echo No broken links found.
- How it works...
- lynx -traversal URL will produce a number of files in the working directory. It includes
- a file reject.dat which will contain all the links in the website. sort -u is used to build a
- list by avoiding duplicates. Then we iterate through each link and check the header response
- by using curl -I . If the header contains a status line such as HTTP/1.1 200 OK as the
- response, it means that the target is not broken. All other responses correspond to broken
- links and are printed out to stdout .
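- The per-link check can also be tried by hand; a live page returns a status line, while a broken
- one prints nothing (slynux.org is the example site used earlier):
- $ curl -I http://slynux.org -s | grep "HTTP/.*OK"
- HTTP/1.1 200 OK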
- See also
- Downloading a web page as formatted plain text, explains the lynx command
- A primer on cURL, explains the curl command
- Tracking changes to a website
- Tracking changes to a website is helpful to web developers and users. Checking a website
- manually in intervals is really hard and impractical. Hence we can write a change tracker
- running at repeated intervals. When a change occurs, it can play a sound or send a
- notification. Let's see how to write a basic tracker for the website changes.
- Getting ready
- Tracking changes in terms of Bash scripting means fetching websites at different times and
- taking the difference using the diff command. We can use curl and diff to do this.
- How to do it...
- Let's write a Bash script by combining different commands to track changes in a web page:
- #!/bin/bash
- #Filename: change_track.sh
- #Desc: Script to track changes to webpage
- if [ $# -ne 1 ];
- then
- echo -e "Usage: $0 URL\n"
- exit -1;
- fi
- first_time=0
- # Not first time
- if [ ! -e "last.html" ];
- then
- first_time=1
- # Set it as the first-time run
- fi
- curl --silent $1 -o recent.html
- if [ $first_time -ne 1 ];
- then
- changes=$(diff -u last.html recent.html)
- if [ -n "$changes" ];
- then
- echo -e "Changes:\n"
- echo "$changes"
- else
- echo -e "\nWebsite has no changes"
- fi
- else
- echo "[First run] Archiving.."
- fi
- cp recent.html last.html
- Let's look at the output of the change_track.sh script when changes are made to the web
- page and when they are not:
- First run:
- $ ./change_track.sh http://web.sarathlakshman.info/test.html
- [First run] Archiving..
- Second run:
- $ ./change_track.sh http://web.sarathlakshman.info/test.html
- Website has no changes
- Third run, after making changes to the web page:
- $ ./change_track.sh http://web.sarathlakshman.info/test_change/test.html
- Changes:
- --- last.html 2010-08-01 07:29:15.000000000 +0200
- +++ recent.html 2010-08-01 07:29:43.000000000 +0200
- @@ -1,3 +1,4 @@
- <html>
- +added line :)
- <p>data</p>
- </html>
- How it works...
- The script checks whether it is running for the first time using [ ! -e "last.html" ]; .
- If last.html doesn't exist, that means it is the first time, and hence the web page must
- be downloaded and copied as last.html .
- If it is not the first time, it downloads the new copy ( recent.html ) and checks the
- difference using the diff utility. If there are any changes, it prints them, and finally it
- copies recent.html to last.html .
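- The core comparison is just diff -u on the two snapshots, which we can try on any pair of
- files (made-up one-line files here; timestamps are trimmed from the output):
- $ echo "hello" > last.html ; echo "hullo" > recent.html
- $ diff -u last.html recent.html
- --- last.html
- +++ recent.html
- @@ -1 +1 @@
- -hello
- +hullo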
- See also
- A primer on cURL, explains the curl command
- Posting to a web page and reading response
- POST and GET are two types of requests in HTTP to send information to or retrieve information
- from a website. In a GET request, we send parameters (name-value pairs) through the web
- page URL itself. In the case of POST, they are not attached to the URL. POST is used when a
- form needs to be submitted; for example, when a username and password are to be submitted
- and the login page is to be retrieved.
- POSTing to pages is a frequent need while writing scripts based on web page retrievals.
- Let's see how to work with POST. Automating HTTP GET and POST requests by sending
- POST data and retrieving the output is a very important task that we practice while writing
- shell scripts that parse data from websites.
- Getting ready
- Both cURL and wget can handle POST requests by arguments. They are to be passed as
- name-value pairs.
- How to do it...
- Let's see how to POST and read HTML response from a real website using curl :
- $ curl URL -d "postvar=postdata1&postvar2=postdata2"
- We have a website ( http://book.sarathlakshman.com/lsc/mlogs/ ) and it is used
- to submit the current user information, such as hostname and username. Assume that, on
- the home page of the website, there are two fields, HOSTNAME and USER, and a SUBMIT
- button. When the user enters a hostname and a username and clicks on the SUBMIT button,
- the details will be stored in the website. This process can be automated using a single line of
- curl command by automating the POST request. If you look at the website source (use the
- view source option from the web browser), you can see an HTML form defined similar to the
- following code:
- <form action="http://book.sarathlakshman.com/lsc/mlogs/submit.php"
- method="post" >
- <input type="text" name="host" value="HOSTNAME" >
- <input type="text" name="user" value="USER" >
- <input type="submit" >
- </form>
- Here, http://book.sarathlakshman.com/lsc/mlogs/submit.php is the target
- URL. When the user enters the details and clicks on the Submit button, the host and user
- inputs are sent to submit.php as a POST request, and the response page is returned to the
- browser.
- We can automate the POST request as follows:
- $ curl http://book.sarathlakshman.com/lsc/mlogs/submit.php -d "host=test-host&user=slynux"
- <html>
- You have entered :
- <p>HOST : test-host</p>
- <p>USER : slynux</p>
- <html>
- Now curl returns the response page.
- -d is the argument used for posting. The string argument for -d is similar to the GET request
- semantics. var=value pairs are to be delimited by & .
- The -d argument should always be given in quotes. If quotes are not used, &
- is interpreted by the shell to indicate this should be a background process.
- There's more...
- Let's see how to perform POST using cURL and wget .
- POST in curl
- You can POST data in curl by using -d or --data as follows:
- $ curl --data "name=value" URL -o output.html
- If multiple variables are to be sent, delimit them with & . Note that when & is used the
- name-value pairs should be enclosed in quotes, else the shell will consider & as a special
- character for background process. For example:
- $ curl -d "name1=val1&name2=val2" URL -o output.html
- POST data using wget
- You can POST data using wget by using --post-data "string" . For example:
- $ wget URL --post-data "name=value" -O output.html
- Use the same format as cURL for name-value pairs.
- See also
- A primer on cURL, explains the curl command
- Downloading from a web page, explains the wget command
- 6
- The Backup Plan
- In this chapter, we will cover:
- Archiving with tar
- Archiving with cpio
- Compressing with gunzip (gzip)
- Compressing with bunzip (bzip)
- Compressing with lzma
- Archiving and compressing with zip
- Heavy compression squashfs filesystem
- Encrypting files and folders (with standard algorithms)
- Backup snapshots with rsync
- Version controlled backups with git
- Cloning disks with dd
- Introduction
- Taking snapshots and backups of data is a regular task we come across. When it comes
- to a server or large data storage systems, regular backups are important, and it is possible
- to automate backups via shell scripting. Archiving and compression find usage
- in the everyday life of a system admin or a regular user. There are various compression
- formats that can be used in various ways to obtain the best results. Encryption is
- another frequently needed task for the protection of data. In order to reduce the
- size of encrypted data, files are usually archived and compressed before encrypting. Many
- standard encryption algorithms are available, and they can be handled with shell utilities. This
- chapter walks through different recipes for creating and maintaining files or folder archives,
- compression formats, and encrypting techniques with shell. Let's go through the recipes.
- Archiving with tar
- The tar command can be used to archive files. It was originally designed for storing data on
- tape archives (tar). It allows you to store multiple files and directories as a single file. It can
- retain all the file attributes, such as owner, permissions, and so on. The file created by the tar
- command is often referred to as a tarball.
- Getting ready
- The tar command comes by default with all UNIX like operating systems. It has a simple
- syntax and is a portable file format. Let's see how to do it.
- tar has a list of arguments: A , c , d , r , t , u , x , f , and v . Each of these letters can be used
- independently, for the different purposes corresponding to it.
- How to do it...
- To archive files with tar, use the following syntax:
- $ tar -cf output.tar [SOURCES]
- For example:
- $ tar -cf output.tar file1 file2 file3 folder1 ..
- In this command, -c stands for "create file" and -f stands for "specify filename".
- We can specify folders and filenames as SOURCES . We can use a list of file names or
- wildcards such as *.txt to specify the sources.
- It will archive the source files into a file called output.tar .
- The filename must appear immediately after the -f and should be the last option in the
- argument group (for example, -cvvf filename.tar and -tvvf filename.tar ).
- We cannot pass hundreds of files or folders as command-line arguments because there is a
- limit. So it is safer to use the append option if many files are to be archived.
- There's more...
- Let's go through additional features that are available with the tar command.
- Appending files to an archive
- Sometimes we may need to add files to an archive that already exists (an example usage is
- when thousands of files are to be archived and when they cannot be specified in one line as
- command-line arguments).
- Append option: -r
- In order to append a file into an already existing archive use:
- $ tar -rvf original.tar new_file
- List the files in an archive as follows:
- $ tar -tf archive.tar
- yy/lib64/
- yy/lib64/libfakeroot/
- yy/sbin/
- In order to print more details while archiving or listing, use the -v or the -vv flag. These flags
- are called verbose ( v ), and they enable printing more details on the terminal. For example,
- by using verbose you can print more details, such as the file permissions, owner group,
- modification date, and so on.
- For example:
- $ tar -tvvf archive.tar
- drwxr-xr-x slynux/slynux 0 2010-08-06 09:31 yy/
- drwxr-xr-x slynux/slynux 0 2010-08-06 09:39 yy/usr/
- drwxr-xr-x slynux/slynux 0 2010-08-06 09:31 yy/usr/lib64/
- Extracting files and folders from an archive
- The following command extracts the contents of the archive to the current directory:
- $ tar -xf archive.tar
- The -x option stands for extract.
- When -x is used, the tar command extracts the contents of the archive to the current
- directory. We can also specify the directory where the files need to be extracted by using the
- -C flag, as follows:
- $ tar -xf archive.tar -C /path/to/extraction_directory
- The command extracts the contents of the archive to a specified directory. It
- extracts the entire contents of the archive. We can also extract only a few files by specifying
- them as command arguments:
- $ tar -xvf file.tar file1 file4
- The command above extracts only file1 and file4 , and ignores other files in the archive.
- stdin and stdout with tar
- While archiving, we can specify stdout as the output file, so that another command at the
- other end of a pipe can read it as stdin and then process it or extract the archive.
- This is helpful in order to transfer data through a Secure Shell (SSH) connection (while on a
- network). For example:
- $ mkdir ~/destination
- $ tar -cf - file1 file2 file3 | tar -xvf - -C ~/destination
- In the example above, file1 , file2 , and file3 are combined into a tarball and then
- extracted to ~/destination . In this command:
- -f specifies stdout as the file for archiving (when the -c option is used)
- -f specifies stdin as the file for extracting (when the -x option is used)
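- The same trick works across an SSH connection. As a sketch (the host and paths are
- placeholders), this copies a directory tree to a remote machine without creating an
- intermediate tarball on disk:
- $ tar -cf - sourcedir | ssh user@remotehost "tar -xvf - -C /home/user/destination"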
- Concatenating two archives
- We can easily merge multiple tar files with the -A option.
- Let's pretend we have two tarballs: file1.tar and file2.tar . We can merge the contents
- of file2.tar to file1.tar as follows:
- $ tar -Af file1.tar file2.tar
- Verify it by listing the contents:
- $ tar -tvf file1.tar
- Updating files in an archive with timestamp check
- The append option appends any given file to the archive. If a file that is already inside the
- archive is given to append, it will be appended again and the archive will contain duplicates.
- We can use the update option -u to specify that a file should be appended only if it is newer
- than the file with the same name inside the archive.
- $ tar -tf archive.tar
- filea
- fileb
- filec
- This command lists the files in the archive.
- In order to append filea only if filea has newer modification time than filea inside
- archive.tar , use:
- $ tar -uvvf archive.tar filea
- Nothing happens if the version of filea outside the archive and the filea inside
- archive.tar have the same timestamp.
- Use the touch command to modify the file timestamp and then try the tar command again:
- $ tar -uvvf archive.tar filea
- -rw-r--r-- slynux/slynux 0 2010-08-14 17:53 filea
- The file is appended since its timestamp is newer than the one inside the archive.
- Comparing files in archive and file system
- Sometimes it is useful to know whether a file in the archive and a file with the same filename
- in the filesystem are the same or contain any differences. The -d flag can be used to print the
- differences:
- $ tar -df archive.tar filename1 filename2 ...
- For example:
- $ tar -df archive.tar afile bfile
- afile: Mod time differs
- afile: Size differs
- Deleting files from archive
- We can remove files from a given archive using the --delete option. For example:
- $ tar -f archive.tar --delete file1 file2 ..
- Let's see another example:
- $ tar -tf archive.tar
- filea
- fileb
- filec
- Or, we can also use the following syntax:
- $ tar --delete --file archive.tar [FILE LIST]
- For example:
- $ tar --delete --file archive.tar filea
- $ tar -tf archive.tar
- fileb
- filec
- Compression with tar archive
- The tar command only archives files; it does not compress them. For this reason, most people
- usually add some form of compression when working with tarballs. This significantly decreases
- the size of the files. Tarballs are often compressed into one of the following formats:
- file.tar.gz
- file.tar.bz2
- file.tar.lzma
- file.tar.lzo
- Different tar flags are used to specify different compression formats.
- -j for bunzip2
- -z for gzip
- --lzma for lzma
- They are explained in the following compression-specific recipes.
- It is possible to use compression formats without explicitly specifying special options as
- above. tar can compress by looking at the given extension of the output or input file names.
- In order for tar to support compression automatically by looking at the extensions, use -a or
- --auto-compress with tar .
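- For example, the following (a small sketch) produces a gzip-compressed archive purely
- because of the .tar.gz extension:
- $ tar -acvf archive.tar.gz file1 file2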
- Excluding a set of files from archiving
- It is possible to exclude a set of files from archiving by specifying patterns. Use
- --exclude [PATTERN] for excluding files matched by wildcard patterns.
- For example, to exclude all .txt files from archiving use:
- $ tar -cf arch.tar * --exclude "*.txt"
- Note that the pattern should be enclosed in double quotes.
- It is also possible to exclude a list of files provided in a list file with the -X flag as follows:
- $ cat list
- filea
- fileb
- $ tar -cf arch.tar * -X list
- Now it excludes filea and fileb from archiving.
- Excluding version control directories
- We usually use tarballs for distributing source code. Most of the source code is maintained
- using version control systems such as subversion, Git, mercurial, cvs, and so on. Code
- directories under version control will contain special directories used to manage versions like
- .svn or .git . However, these directories aren't needed by the code itself and so should be
- eliminated from the tarball of the source code.
- In order to exclude version control related files and directories while archiving use the
- --exclude-vcs option along with tar . For example:
- $ tar --exclude-vcs -czvvf source_code.tar.gz eye_of_gnome_svn
- Printing total bytes
- It is sometimes useful to print the total bytes copied to the archive. Print the total bytes
- copied after archiving by using the --totals option as follows:
- $ tar -cf arc.tar * --exclude "*.txt" --totals
- Total bytes written: 20480 (20KiB, 12MiB/s)
- See also
- Compressing with gunzip (gzip), explains the gzip command
- Compressing with bunzip (bzip2), explains the bzip2 command
- Compressing with lzma, explains the lzma command
- Archiving with cpio
- cpio is another archiving format similar to tar . It is used to store files and directories in a file
- with attributes such as permissions, ownership, and so on. However, it is not as commonly
- used as tar . That said, cpio is used in RPM package archives, initramfs files for
- the Linux kernel, and so on. This recipe will give minimal usage examples of cpio .
- How to do it...
- cpio takes input filenames through stdin and it writes the archive into stdout . We have to
- redirect stdout to a file to receive the output cpio file as follows:
- Create test files:
- $ touch file1 file2 file3
- We can archive the test files as follows:
- $ echo file1 file2 file3 | cpio -ov > archive.cpio
- In this command:
- f -o specifies the output
- f -v is used for printing a list of files archived
- By using cpio, we can also archive files given as absolute paths. /usr/
- somedir is an absolute path as it contains the full path starting from root (/).
- A relative path does not start with / ; it starts from the current
- directory. For example, test/file means that there is a directory test and
- the file is inside the test directory.
- While extracting, cpio extracts to the absolute path itself. But in the case of tar , it
- removes the / from the absolute path and converts it into a relative path.
- In order to list files in a cpio archive use the following command:
- $ cpio -it < archive.cpio
- This command will list all the files in the given cpio archive. It reads the files from stdin .
- In this command:
- f -i is for specifying the input
- f -t is for listing
- In order to extract files from the cpio archive use:
- $ cpio -id < archive.cpio
- Here, -d is used for extracting.
- It overwrites files without prompting. If files with absolute paths are present in the archive,
- it will replace the files at those paths; unlike tar , it will not extract them into the current directory.
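- A common cpio idiom worth knowing is to feed filenames from find . As a minimal sketch
- (assuming a hypothetical directory named testdir ), an entire tree can be archived as:
- $ find testdir | cpio -ov > tree.cpio
- find prints one path per line, and cpio reads those names from stdin .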
- Compressing with gunzip (gzip)
- gzip is a commonly used compression format in GNU/Linux platforms. Utilities such as gzip ,
- gunzip , and zcat are available to handle gzip compression file types. gzip can be applied
- on a file only. It cannot archive directories and multiple files. Hence we use a tar archive
- and compress it with gzip . When multiple files are given as input it will produce several
- individually compressed ( .gz ) files. Let's see how to operate with gzip .
- How to do it...
- In order to compress a file with gzip use the following command:
- $ gzip filename
- $ ls
- filename.gz
- It will remove the original file and produce a compressed file called filename.gz .
- Extract a gzip compressed file as follows:
- $ gunzip filename.gz
- It will remove filename.gz and produce an uncompressed version called filename .
- In order to list out the properties of a compressed file use:
- $ gzip -l test.txt.gz
- compressed uncompressed ratio uncompressed_name
- 35 6 -33.3% test.txt
- The gzip command can read a file from stdin and also write a compressed file into
- stdout .
- Read from stdin and write to stdout as follows:
- $ cat file | gzip -c > file.gz
- The -c option is used to specify output to stdout .
- We can specify the compression level for gzip . Use --fast or the --best option to provide
- low and high compression ratios, respectively.
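- For example, a brief sketch (assuming a hypothetical file data.log ):
- $ gzip --best data.log # maximum compression, slowest
- $ gunzip data.log.gz
- $ gzip --fast data.log # least compression, fastest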
- There's more...
- The gzip command is often used with other commands. It also has advanced options to
- specify the compression ratio. Let's see how to work with these features.
- Gzip with tarball
- We usually use gzip with tarballs. A tarball can be compressed by using the -z option passed
- to the tar command while archiving and extracting.
- You can create gzipped tarballs using the following methods:
- f Method - 1
- $ tar -czvvf archive.tar.gz [FILES]
- Or:
- $ tar -cavvf archive.tar.gz [FILES]
- The -a option specifies that the compression format should automatically be
- detected from the extension.
- f Method - 2
- First, create a tarball:
- $ tar -cvvf archive.tar [FILES]
- Compress it after tarballing as follows:
- $ gzip archive.tar
- If many files (a few hundred) are to be archived in a tarball and need to be compressed, we
- use Method - 2 with a few changes. The issue with giving many files as command arguments
- to tar is that the command line can accept only a limited number of arguments. In order
- to solve this issue, we can create a tar file by adding files one by one using a loop with the
- append option ( -r ) as follows:
- FILE_LIST="file1 file2 file3 file4 file5"
- for f in $FILE_LIST;
- do
-   tar -rvf archive.tar $f
- done
- gzip archive.tar
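- Alternatively, a sketch assuming GNU tar : a file containing the list of names can be passed
- with the -T ( --files-from ) option, which avoids the command-line length limit entirely:
- $ cat list.txt
- file1
- file2
- $ tar -czvf archive.tar.gz -T list.txt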
- In order to extract a gzipped tarball, use the following:
- $ tar -xzvvf archive.tar.gz -C extract_directory
- In this command:
- f -x is used for extraction
- f -z is for gzip specification
- f -C is for specifying the directory to which the files are to be extracted
- Or:
- $ tar -xavvf archive.tar.gz -C extract_directory
- In the above command, the -a option is used to detect the compression format automatically.
- zcat – reading gzipped files without extracting
- zcat is a command that dumps the decompressed contents of a .gz file to stdout
- without manually extracting it. The .gz file remains intact; only the extracted
- contents are dumped to stdout , as follows:
- $ ls
- test.gz
- $ zcat test.gz
- A test file
- # file test contains a line "A test file"
- $ ls
- test.gz
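- Because zcat writes to stdout , it fits naturally into pipelines. As a sketch (assuming a
- hypothetical compressed log access.log.gz ):
- $ zcat access.log.gz | grep "error" | wc -l
- This counts the matching lines without ever writing a decompressed file to disk.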
- Compression ratio
- We can specify the compression level, which is available in the range 1 to 9, where:
- f 1 is the lowest, but fastest
- f 9 is the best, but slowest
- You can specify any level in that range as follows:
- $ gzip -9 test.img
- This will compress the file to the maximum.
- See also
- f Archiving with tar, explains the tar command
- Compressing with bunzip (bzip2)
- bzip2 is another compression technique, very similar to gzip . bzip2 typically
- produces smaller (more compressed) files than gzip . It comes with all Linux distributions.
- Let's see how to use bzip2 .
- How to do it...
- In order to compress with bzip2 use:
- $ bzip2 filename
- $ ls
- filename.bz2
- It will remove the original file and produce a compressed file called filename.bz2 .
- Extract a bzipped file as follows:
- $ bunzip2 filename.bz2
- It will remove filename.bz2 and produce an uncompressed version of filename .
- bzip2 can read a file from stdin and also write a compressed file into stdout .
- In order to read from stdin and write to stdout use:
- $ cat file | bzip2 -c > file.bz2
- -c is used to specify output to stdout .
- We usually use bzip2 with tarballs. A tarball can be compressed by using the -j option
- passed to the tar command while archiving and extracting.
- Creating a bzipped tarball can be done by using the following methods:
- f Method - 1
- $ tar -cjvvf archive.tar.bz2 [FILES]
- Or:
- $ tar -cavvf archive.tar.bz2 [FILES]
- The -a option specifies to automatically detect compression format from the extension.
- f Method - 2
- First create the tarball:
- $ tar -cvvf archive.tar [FILES]
- Compress it after tarballing:
- $ bzip2 archive.tar
- If we need to add hundreds of files to the archive, the above commands may fail. To fix that
- issue, use a loop to append files to the archive one by one using the -r option. See the similar
- section from the recipe, Compressing with gunzip (gzip).
- Extract a bzipped tarball as follows:
- $ tar -xjvvf archive.tar.bz2 -C extract_directory
- In this command:
- f -x is used for extraction
- f -j is for bzip2 specification
- f -C is for specifying the directory to which the files are to be extracted
- Or, you can use the following command:
- $ tar -xavvf archive.tar.bz2 -C extract_directory
- -a will automatically detect the compression format.
- There's more...
- bzip2 has several additional options to carry out different functions. Let's go through a few
- of them.
- Keeping input files without removing them
- While using bzip2 or bunzip2 , the input file is removed and an output file is produced.
- We can prevent it from removing input files by using the -k option.
- For example:
- $ bunzip2 test.bz2 -k
- $ ls
- test test.bz2
- Compression ratio
- We can specify the compression level, which is available in the range of 1 to 9 (where 1 is the
- least compression, but fast, and 9 is the highest possible compression, but much slower).
- For example:
- $ bzip2 -9 test.img
- This command provides maximum compression.
- See also
- f Archiving with tar, explains the tar command
- Compressing with lzma
- lzma is relatively new compared to gzip or bzip2 . lzma offers better
- compression rates than gzip or bzip2 . As lzma is not preinstalled on most Linux distros,
- you may need to install it using the package manager.
- How to do it...
- In order to compress with lzma use the following command:
- $ lzma filename
- $ ls
- filename.lzma
- This will remove the file and produce a compressed file called filename.lzma .
- To extract an lzma file use:
- $ unlzma filename.lzma
- This will remove filename.lzma and produce an uncompressed version of the file.
- The lzma command can also read a file from stdin and write the compressed file to stdout .
- In order to read from stdin and write to stdout use:
- $ cat file | lzma -c > file.lzma
- -c is used to specify output to stdout .
- We usually use lzma with tarballs. A tarball can be compressed by using the --lzma option
- passed to the tar command while archiving and extracting.
- There are two methods to create a lzma tarball:
- f Method - 1
- $ tar --lzma -cvvf archive.tar.lzma [FILES]
- Or:
- $ tar -cavvf archive.tar.lzma [FILES]
- The -a option specifies to automatically detect the compression format from the
- extension.
- f Method - 2
- First, create the tarball:
- $ tar -cvvf archive.tar [FILES]
- Compress it after tarballing:
- $ lzma archive.tar
- If we need to add hundreds of files to the archive, the above commands may fail. To fix that
- issue, use a loop to append files to the archive one by one using the -r option. See the
- similar section from the recipe, Compressing with gunzip (gzip).
- There's more...
- Let's go through the additional options associated with the lzma utilities.
- Extracting an lzma tarball
- In order to extract a tarball compressed with lzma compression to a specified directory, use:
- $ tar --lzma -xvvf archive.tar.lzma -C extract_directory
- In this command, -x is used for extraction. --lzma specifies the use of lzma to
- decompress the archive.
- Or, we could also use:
- $ tar -xavvf archive.tar.lzma -C extract_directory
- The -a option specifies to automatically detect the compression format from the extension.
- Keeping input files without removing them
- While using lzma or unlzma , the input file is removed and an output file is produced. We
- can prevent this and keep the input files by using the -k option. For example:
- $ lzma test.bz2 -k
- $ ls
- test.bz2.lzma
- Compression ratio
- We can specify the compression level, which is available in the range of 1 to 9 (where 1 is the
- least compression, but fast, and 9 is the highest possible compression, but much slower).
- You can specify any level in that range as follows:
- $ lzma -9 test.img
- This command compresses the file to the maximum.
- See also
- f Archiving with tar, explains the tar command
- Archiving and compressing with zip
- ZIP is a popular compression format used on many platforms. It isn't as commonly used as
- gzip or bzip2 on Linux platforms, but files from the Internet are often saved in this format.
- How to do it...
- In order to archive with ZIP, the following syntax is used:
- $ zip archive_name.zip [SOURCE FILES/DIRS]
- For example:
- $ zip file.zip file
- Here, the file.zip file will be produced.
- Archive directories and files recursively as follows:
- $ zip -r archive.zip folder1 file2
- In this command, -r is used for specifying recursive.
- Unlike lzma , gzip , or bzip2 , zip won't remove the source file after archiving. In that
- respect zip is similar to tar , but unlike plain tar , zip also compresses the files
- it archives.
- In order to extract files and folders in a ZIP file, use:
- $ unzip file.zip
- It will extract the files without removing file.zip (unlike unlzma or gunzip ).
- In order to update files in the archive with newer files in the filesystem, use the -u flag:
- $ zip file.zip -u newfile
- Delete a file from a zipped archive by using -d as follows:
- $ zip -d arc.zip file.txt
- In order to list the files in an archive use:
- $ unzip -l archive.zip
- squashfs – the heavy compression filesystem
- squashfs is a heavy-compression based read-only filesystem that is capable of compressing
- 2 to 3GB of data onto a 700 MB file. Have you ever thought of how Linux Live CDs work?
- When a Live CD is booted it loads a complete Linux environment. Linux Live CDs make use
- of a read-only compressed filesystem called squashfs. It keeps the root filesystem on a
- compressed filesystem file. It can be loopback mounted and files can be accessed. Thus when
- some files are required by processes, they are decompressed and loaded onto the RAM and
- used. Knowledge of squashfs can be useful when building a custom live OS or when required
- to keep files heavily compressed and to access them without entirely extracting the files.
- Extracting a large compressed file normally takes a long time. With a loopback-mounted
- squashfs file, however, access is very fast, since only the required portions of the
- compressed data are decompressed when files are requested. In regular decompression,
- all the data is decompressed first. Let's see how we can use squashfs.
- Getting ready
- If you have an Ubuntu CD, just locate a .squashfs file at CD-ROM ROOT/casper/
- filesystem.squashfs . squashfs internally uses compression algorithms such as gzip
- and lzma . squashfs support is available in all of the latest Linux distros. However, in order
- to create squashfs files, an additional package, squashfs-tools , needs to be installed
- from the package manager.
- How to do it...
- In order to create a squashfs file by adding source directories and files, use:
- $ mksquashfs SOURCES compressedfs.squashfs
- Sources can be wildcards, file paths, or folder paths.
- For example:
- $ sudo mksquashfs /etc test.squashfs
- Parallel mksquashfs: Using 2 processors
- Creating 4.0 filesystem on test.squashfs, block size 131072.
- [=======================================] 1867/1867 100%
- More details will be printed on the terminal; the output here is truncated to save space.
- In order to mount the squashfs file to a mount point, use loopback mounting as follows:
- # mkdir /mnt/squash
- # mount -o loop compressedfs.squashfs /mnt/squash
- You can copy contents by accessing /mnt/squash .
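- When you are finished, unmount it as with any loopback mount:
- # umount /mnt/squash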
- There's more...
- The squashfs file system can be created by specifying additional parameters. Let's go
- through the additional options.
- Excluding files while creating a squashfs file
- While creating a squashfs file, we can exclude a list of files or a file pattern specified using
- wildcards.
- Exclude a list of files specified as command-line arguments by using the -e option. For
- example:
- $ sudo mksquashfs /etc test.squashfs -e /etc/passwd /etc/shadow
- The -e option is used to exclude the passwd and shadow files.
- It is also possible to specify a list of exclude files given in a file with -ef as follows:
- $ cat excludelist
- /etc/passwd
- /etc/shadow
- $ sudo mksquashfs /etc test.squashfs -ef excludelist
- If we want to support wildcards in exclude lists, use -wildcards as an argument.
- Cryptographic tools and hashes
- Encryption techniques are used mainly to protect data from unauthorized access. There are
- many algorithms available and we use a common set of standard algorithms. There are a few
- tools available in a Linux environment for performing encryption and decryption. Sometimes
- we use cryptographic hashes for verifying data integrity. This section will introduce a few
- commonly-used cryptographic tools and a general set of algorithms that these tools can handle.
- How to do it...
- Let's see how to use the tools such as crypt, gpg, base64, md5sum, sha1sum, and openssl:
- f crypt
- The crypt command is a simple cryptographic utility, which takes a file from stdin
- and a passphrase as input and outputs encrypted data into stdout .
- $ crypt < input_file > output_file
- Enter passphrase:
- It will interactively ask for a passphrase. We can also provide a passphrase through
- command-line arguments.
- $ crypt PASSPHRASE < input_file > encrypted_file
- In order to decrypt the file use:
- $ crypt PASSPHRASE -d < encrypted_file > output_file
- f gpg (GNU privacy guard)
- gpg (GNU privacy guard) is a widely-used encryption scheme that protects files
- with key signing techniques so that data can be accessed only by the intended recipient.
- gpg signatures are widely used. The details of gpg are outside the scope of this book.
- Here we can learn how to encrypt and decrypt a file.
- In order to encrypt a file with gpg use:
- $ gpg -c filename
- This command reads the passphrase interactively and generates filename.gpg .
- In order to decrypt a gpg file use:
- $ gpg filename.gpg
- This command reads a passphrase and decrypts the file.
- f Base64
- Base64 is a group of similar encoding schemes that represents binary data in an
- ASCII string format by translating it into a radix-64 representation. The base64
- command can be used to encode and decode the Base64 string.
- In order to encode a binary file into Base64 format, use:
- $ base64 filename > outputfile
- Or:
- $ cat file | base64 > outputfile
- It can read from stdin .
- Decode Base64 data as follows:
- $ base64 -d file > outputfile
- Or:
- $ cat base64_file | base64 -d > outputfile
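- As a quick round-trip sketch (assuming a hypothetical binary file payload.bin ):
- $ base64 payload.bin > payload.b64
- $ base64 -d payload.b64 > restored.bin
- $ cmp payload.bin restored.bin && echo identical
- cmp reports nothing and echoes identical when the round trip is lossless.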
- f md5sum and sha1sum
- md5sum and sha1sum are unidirectional hash algorithms, which cannot be reversed
- to form the original data. These are usually used to verify the integrity of data or for
- generating a practically unique key from given data. For every file, a key is generated by
- analyzing its content.
- $ md5sum file
- 8503063d5488c3080d4800ff50850dc9 file
- $ sha1sum file
- 1ba02b66e2e557fede8f61b7df282cd0a27b816b file
- These types of hashes are ideal for storing passwords. Passwords are stored as their
- hashes. When a user wants to authenticate, the password is read and converted to
- its hash. This hash is then compared to the one that is stored already. If they are the same,
- the password is authenticated and access is provided; else it is denied. Storing
- original password strings is risky, as it poses the security risk of exposing the passwords.
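- As an integrity-checking sketch (the filenames are hypothetical and the output shown is
- representative), checksums can be written to a file and verified later with the -c option:
- $ md5sum file1 file2 > files.md5
- $ md5sum -c files.md5
- file1: OK
- file2: OK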
- f Shadow-like hash (salted hash)
- Let's see how to generate a shadow-like salted hash for passwords.
- The user passwords in Linux are stored as its hashes in the /etc/shadow file. A
- typical line in /etc/shadow will look like this:
- test:$6$fG4eWdUi$ohTKOlEUzNk77.4S8MrYe07NTRV4M3LrJnZP9p.qc1bR5c.
- EcOruzPXfEu1uloBFUa18ENRH7F70zhodas3cR.:14790:0:99999:7:::
- In this line $6$fG4eWdUi$ohTKOlEUzNk77.4S8MrYe07NTRV4M3LrJnZP9p.
- qc1bR5c.EcOruzPXfEu1uloBFUa18ENRH7F70zhodas3cR is the shadow hash
- corresponding to its password.
- In some situations, we may need to write critical administration scripts that may need
- to edit passwords or add users manually using a shell script. In that case we have to
- generate a shadow password string and write a similar line as above to the shadow
- file. Let's see how to generate a shadow password using openssl .
- Shadow passwords are usually salted passwords. SALT is an extra string used to
- obfuscate and make the encryption stronger. The salt consists of random bits that are
- used as one of the inputs to a key derivation function that generates the salted hash
- for the password.
- For more details on salt, see the Wikipedia page http://en.wikipedia.org/
- wiki/Salt_(cryptography) .
- $ openssl passwd -1 -salt SALT_STRING PASSWORD
- $1$SALT_STRING$323VkWkSLHuhbt1zkSsUG.
- Replace SALT_STRING with a random string and PASSWORD with the password you
- want to use.
- Backup snapshots with rsync
- Backing up data is something that most sysadmins need to do regularly. We may need to
- back up data on a web server or from remote locations. rsync is a command that can be
- used to synchronize files and directories from one location to another while minimizing data
- transfer using file difference calculations and compression. The advantage of rsync over the
- cp command is that rsync uses strong difference algorithms. Also, it supports data transfer
- across networks. While making copies, it compares the files in the original and destination
- locations and will copy only the files that have changed. It also supports compression, encryption,
- and a lot more. Let's see how we can work with rsync .
- How to do it...
- In order to copy a source directory to a destination (to create a mirror) use:
- $ rsync -av source_path destination_path
- In this command:
- f -a stands for archiving
- f -v (verbose) prints the details or progress on stdout
- The above command will recursively copy all the files from the source path to the destination
- path. We can specify paths as remote or localhost paths.
- It can be in the format /home/slynux/data , slynux@192.168.0.6:/home/backups/
- data , and so on.
- /home/slynux/data specifies the absolute path in the machine in which the rsync
- command is executed. slynux@192.168.0.6:/home/backups/data specifies that the
- path is /home/backups/data in the machine with IP address 192.168.0.6 and is logged
- in as user slynux .
- In order to back up data to a remote server or host, use:
- $ rsync -av source_dir username@host:PATH
- To keep a mirror at the destination, run the same rsync command scheduled at regular
- intervals. It will copy only changed files to the destination.
- Restore the data from remote host to localhost as follows:
- $ rsync -av username@host:PATH destination
- The rsync command uses SSH to connect to another remote machine. Provide the remote
- machine address in the format user@host , where user is the username and host is the IP
- address or domain name attached to the remote machine. PATH is the absolute path address
- where the data needs to be copied. rsync will ask for the user password as usual for SSH
- login. This can be automated (avoiding the password prompt) by using SSH keys.
- Make sure that OpenSSH is installed and running on the remote machine.
- Compressing data while transferring through the network can significantly optimize the
- speed of the transfer. We can use the rsync option -z to compress data while
- transferring through a network. For example:
- $ rsync -avz source destination
- For the PATH format, if we use / at the end of the source, rsync will copy the
- contents of the end directory specified in source_path to the destination.
- If / is not at the end of the source, rsync will copy that end directory itself to the
- destination.
- For example, the following command copies the content of the test directory:
- $ rsync -av /home/test/ /home/backups
- The following command copies the test directory to the destination:
- $ rsync -av /home/test /home/backups
- If / is at the end of destination_path, rsync will copy the source to the
- destination directory.
- If / is not used at the end of the destination path, rsync will create a folder,
- named similar to the source directory, at the end of the destination path and
- copy the source into that directory.
- For example:
- $ rsync -av /home/test /home/backups/
- This command copies the source ( /home/test ) to an existing folder called backups .
- $ rsync -av /home/test /home/backups
- This command copies the source ( /home/test ) to a directory named backups by creating
- that directory.
- There's more...
- The rsync command has several additional functionalities that can be specified using its
- command-line options. Let's go through them.
- Excluding files while archiving with rsync
- Some files need not be updated while archiving to a remote location. It is possible to tell rsync
- to exclude certain files from the current operation. Files can be excluded by two options:
- --exclude PATTERN
- We can specify a wildcard pattern of files to be excluded. For example:
- $ rsync -avz /home/code/some_code /mnt/disk/backup/code --exclude "*.txt"
- This command excludes .txt files from backing up.
- Or, we can specify a list of files to be excluded by providing a list file.
- Use --exclude-from FILEPATH .
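- For example, a sketch using a hypothetical list file named exclude.list :
- $ cat exclude.list
- *.txt
- *.tmp
- $ rsync -avz /home/code /mnt/disk/backup/code --exclude-from exclude.list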
- Deleting non-existent files while updating rsync backup
- Suppose we delete some files at the source after the initial backup. By default, rsync does
- not remove files from the destination if they no longer exist at the source, so stale copies
- accumulate there. In order to remove the files from the destination that do not exist at the
- source, use the rsync --delete option:
- $ rsync -avz SOURCE DESTINATION --delete
- Scheduling backups at intervals
- You can create a cron job to schedule backups at regular intervals.
- A sample is as follows:
- $ crontab -e
- Add the following line:
- 0 */10 * * * rsync -avz /home/code user@IP_ADDRESS:/home/backups
- The above crontab entry schedules the rsync to be executed every 10 hours.
- */10 is in the hour position of the crontab syntax; /10 specifies that the backup be executed
- every 10 hours. If */10 were written in the minutes position, it would execute every 10 minutes.
- Have a look at the Scheduling with cron recipe in Chapter 9 to understand how to configure
- crontab .
- Version control based backup with Git
- People use different strategies for backing up data. Making full copies of the entire source
- directory into backup directories named with the date or time of day wastes space; it is more
- efficient to copy only the changes that have occurred to files since the last backup. This is
- called incremental backup. We can manually create incremental backups using tools like
- rsync , but restoring this sort of backup can be difficult. The best way to maintain and restore
- changes is to use version control systems. They are very much used in software development
- and maintenance of code, since code frequently undergoes changes. Git is a very
- famous and very efficient version control system. Let's use Git for backing up
- regular files in a non-programming context. Git can be installed with your distro's package
- manager. It was written by Linus Torvalds.
- Getting ready
- Here is the problem statement:
- We have a directory that contains several files and subdirectories. We need to keep track of
- changes occurring to the directory contents and back them up. If data becomes corrupted or
- goes missing, we must be able to restore a previous copy of that data. We need to back up the
- data at regular intervals to a remote machine. We also need to keep backups at different
- locations on the same machine (localhost). Let's see how to implement it using Git.
- How to do it...
- In the directory which is to be backed up use:
- $ cd /home/data/source
- Let it be the directory source to be tracked.
- Set up and initiate the remote backup directory. In the remote machine, create the backup
- destination directory:
- $ mkdir -p /home/backups/backup.git
- $ cd /home/backups/backup.git
- $ git init --bare
- The following steps are to be performed in the source host machine:
- 1. Add user details to Git in the source host machine:
- $ git config --global user.name "Sarath Lakshman"
- #Set user name to "Sarath Lakshman"
- $ git config --global user.email slynux@slynux.com
- # Set email to slynux@slynux.com
- Initialize the source directory for backup on the host machine. In the source directory in
- the host machine whose files are to be backed up, execute the following commands:
- $ git init
- Initialized empty Git repository in /home/data/source/.git/
- # Initialize git repository
- $ git commit --allow-empty -am "Init"
- [master (root-commit) b595488] Init
- 2. In the source directory, execute the following command to add the remote git
- directory and synchronize backup:
- $ git remote add origin user@remotehost:/home/backups/backup.git
- $ git push origin master
- Counting objects: 2, done.
- Writing objects: 100% (2/2), 153 bytes, done.
- Total 2 (delta 0), reused 0 (delta 0)
- To user@remotehost:/home/backups/backup.git
- * [new branch] master -> master
- 3. Add or remove files for Git tracking.
- The following command adds all files and folders in the current directory to the
- backup list:
- $ git add *
- We can conditionally add certain files only to the backup list as follows:
- $ git add *.txt
- $ git add *.py
- We can remove the files and folders not required to be tracked by using:
- $ git rm file
- It can be a folder or even a wildcard as follows:
- $ git rm *.txt
- 4. Check-pointing or marking backup points.
- We can mark checkpoints for the backup with a message using the following
- command:
- $ git commit -m "Commit Message"
- We need to update the backup at the remote location at regular intervals. Hence, set
- up a cron job (for example, backing up every five hours).
- Create a crontab entry with the following line:
- 0 */5 * * * /home/data/backup.sh
- Create a script /home/data/backup.sh as follows:
- #!/bin/bash
- cd /home/data/source
- git add .
- git commit -am "Commit - @ $(date)"
- git push
- Now we have set up the backup system.
- 5. Restoring data with Git.
- In order to view all backup versions use:
- $ git log
- Update the current directory to the last backup, discarding any recent changes.
- To revert to any previous state or version, look up the commit ID,
- which is a 40-character hex string. Use the commit ID with git checkout .
- For commit ID 3131f9661ec1739f72c213ec5769bc0abefa85a9 it will be:
- $ git checkout 3131f9661ec1739f72c213ec5769bc0abefa85a9
- $ git commit -am "Restore @ $(date) commit ID:
- 3131f9661ec1739f72c213ec5769bc0abefa85a9"
- $ git push
- In order to view the details about versions again, use:
- $ git log
- If the working directory is broken due to some issues, we need to fix the directory with
- the backup at the remote location.
- Then we can recreate the contents from the backup at the remote location as follows:
- $ git clone user@remotehost:/home/backups/backup.git
- This will create a directory backup with all contents.
- Cloning hard drive and disks with dd
- While working with hard drives and partitions, we may need to make exact copies or backups
- of full partitions rather than copying their contents file by file (and not only hard disk partitions;
- we can also copy an entire hard disk without missing any information, such as the boot record,
- partition table, and so on). In this situation we can use the dd command. It can be used to
- clone any type of disk, such as hard disks, flash drives, CDs, DVDs, floppy disks, and so on.
- Getting ready
- The dd command expands to Data Definition. Since improper usage can lead to loss of data,
- it is nicknamed "Data Destroyer". Be careful with the order of arguments; wrong
- arguments can destroy the data of entire disks. dd is basically a bitstream
- duplicator that writes an entire bit stream from a disk to a file or from a file to a disk. Let's
- see how to use dd .
- How to do it...
- The syntax for dd is as follows:
- $ dd if=SOURCE of=TARGET bs=BLOCK_SIZE count=COUNT
- In this command:
- f if stands for input file or input device path
- f of stands for target file or target device path
- f bs stands for block size (usually, it is given as a power of 2, for example, 512, 1024,
- 2048, and so on)
- f count is the number of blocks to be copied (an integer)
- Total bytes copied = BLOCK_SIZE * COUNT
- bs and count are optional.
- By specifying COUNT we can limit the number of bytes to be copied from input file to target. If
- COUNT is not specified, dd will copy from input file until it reaches the end of file (EOF) marker.
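- As a harmless sketch of the arithmetic (writing zeros to a hypothetical image file instead of
- touching a real disk):
- $ dd if=/dev/zero of=zeros.img bs=1024 count=1024
- # Total bytes copied = 1024 * 1024 = 1048576 bytes (1 MB)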
- In order to copy a partition into a file use:
- # dd if=/dev/sda1 of=sda1_partition.img
- Here /dev/sda1 is the device path for the partition.
- Restore the partition using the backup as follows:
- # dd if=sda1_partition.img of=/dev/sda1
- You should be careful about the order of the if and of arguments; improper usage may lead to data loss.
- By changing the device path /dev/sda1 to the appropriate device path, any disk can be
- copied or restored.
- In order to permanently delete all of the data in a partition, we can make dd write zeros into
- the partition by using the following command:
- # dd if=/dev/zero of=/dev/sda1
- /dev/zero is a character device. It always returns infinite zero '\0' characters.
- Clone one hard disk to another hard disk of the same size as follows:
- # dd if=/dev/sda of=/dev/sdb
- Here /dev/sdb is the second hard disk.
- In order to take the image of a CD ROM (ISO file) use:
- # dd if=/dev/cdrom of=cdrom.iso
- There's more...
- When a file system is created in a file which is generated using dd , we can mount it to a
- mount point. Let's see how to work with it.
- Mounting image files
- Any file image created using dd can be mounted using the loopback method. Use the -o
- loop with the mount command.
- # mkdir /mnt/mount_point
- # mount -o loop file.img /mnt/mount_point
- Now we can access the contents of the image files through the location /mnt/mount_point .
- See also
- f Creating ISO files, Hybrid ISO of Chapter 3, explains how to use dd to create an ISO
- file from a CD
- Chapter 7: The Old-boy Network
- In this chapter, we will cover:
- f Basic networking primer
- f Let's ping!
- f Listing all the machines alive on a network
- f Transferring files through network
- f Setting up an Ethernet and wireless LAN with script
- f Password-less auto-login with SSH
- f Running commands on remote host with SSH
- f Mounting remote drive at local mount point
- f Multi-casting window messages on a network
- f Network traffic and port analysis
- Introduction
- Networking is the act of interconnecting machines through a network and configuring the
- nodes in the network with different specifications. We use TCP/IP as our networking stack
- and all operations are based on it. Networks are an important part of every computer system.
- Each node connected in the network is assigned a unique IP address for identification. There
- are many parameters in networking, such as subnet mask, route, ports, DNS, and so on,
- which require a basic understanding to follow.
- Several applications that make use of a network operate by opening and connecting to
- network ports. Every application may offer services such as data transfer, remote shell login,
- and so on. Several interesting management tasks can be performed on a network consisting
- of many machines. Shell scripts can be used to configure the nodes in a network, test the
- availability of machines, automate execution of commands at remote hosts, and so on. This
- chapter focuses on different recipes that introduce interesting tools or commands related to
- networking and also how they can be used for solving different problems.
- Basic networking primer
- Before digging through recipes based on networking, it is essential for you to have a basic
- understanding of setting up a network, the terminology and commands for assigning an IP
- address, adding routes, and so on. This recipe will give an overview of different commands
- used in GNU/Linux for networking and their usages from the basics.
- Getting ready
- Every node in a network requires many parameters to be assigned to work successfully and
- interconnect with other machines. Some of the different parameters are the IP address,
- subnet mask, gateway, route, DNS, and so on.
- This recipe will introduce commands ifconfig , route , nslookup , and host .
- How to do it...
- Network interfaces are used to connect to a network. Usually, in the context of UNIX-like
- Operating Systems, network interfaces follow the eth0, eth1 naming convention. Also, other
- interfaces, such as usb0, wlan0, and so on, are available for USB network interfaces, wireless
- LAN, and other such networks.
- ifconfig is the command that is used to display details about network interfaces, subnet
- mask, and so on.
- ifconfig is available at /sbin/ifconfig . Some GNU/Linux distributions will display an
- error "command not found" when ifconfig is typed. This is because /sbin is not included
- in the user's PATH environment variable. When a command is typed, Bash looks in the
- directories specified in the PATH variable.
- By default, in Debian, ifconfig is not available since /sbin is not in PATH .
- /sbin/ifconfig is the absolute path, so try running ifconfig with its absolute path (that is,
- /sbin/ifconfig ). On every system, there is a default interface, lo , called loopback, that
- points to the current machine. For example:
- $ ifconfig
- lo Link encap:Local Loopback
- inet addr:127.0.0.1 Mask:255.0.0.0
- inet6 addr: ::1/128 Scope:Host
- UP LOOPBACK RUNNING MTU:16436 Metric:1
- RX packets:6078 errors:0 dropped:0 overruns:0 frame:0
- TX packets:6078 errors:0 dropped:0 overruns:0 carrier:0
- collisions:0 txqueuelen:0
- RX bytes:634520 (634.5 KB) TX bytes:634520 (634.5 KB)
- wlan0 Link encap:Ethernet HWaddr 00:1c:bf:87:25:d2
- inet addr:192.168.0.82 Bcast:192.168.3.255 Mask:255.255.252.0
- inet6 addr: fe80::21c:bfff:fe87:25d2/64 Scope:Link
- UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
- RX packets:420917 errors:0 dropped:0 overruns:0 frame:0
- TX packets:86820 errors:0 dropped:0 overruns:0 carrier:0
- collisions:0 txqueuelen:1000
- RX bytes:98027420 (98.0 MB) TX bytes:22602672 (22.6 MB)
- The left-most column in the ifconfig output lists the names of the network interfaces, and
- the right-hand columns show the details related to the corresponding network interface.
- There's more...
- There are several additional commands that frequently come under usage for querying and
- configuring the network. Let's go through the essential commands and usage.
- Printing the list of network interfaces
- Here is a one-liner command sequence to print the list of network interface available
- on a system.
- $ ifconfig | cut -c-10 | tr -d ' ' | tr -s '\n'
- lo
- wlan0
- The first 10 characters of each line in the ifconfig output are reserved for writing the
- name of the network interface. Hence we use cut to extract the first 10 characters of each
- line. tr -d ' ' deletes every space character in each line. Then the \n newline characters are
- squeezed using tr -s '\n' to produce the list of interface names.
- Assigning and displaying IP addresses
- The ifconfig command displays details of every network interface available on the system.
- However, we can restrict it to a specific interface by using:
- $ ifconfig iface_name
- For example:
- $ ifconfig wlan0
- wlan0 Link encap:Ethernet HWaddr 00:1c:bf:87:25:d2
- inet addr:192.168.0.82 Bcast:192.168.3.255
- Mask:255.255.252.0
- From the outputs of the previously mentioned command, our interests lie in the IP address,
- broadcast address, hardware address, and subnet mask. They are as follows:
- f HWaddr 00:1c:bf:87:25:d2 is the hardware address (MAC address)
- f inet addr:192.168.0.82 is the IP address
- f Bcast:192.168.3.255 is the broadcast address
- f Mask:255.255.252.0 is the subnet mask
- In several scripting contexts, we may need to extract any of these addresses from the script
- for further manipulations.
- Extracting the IP address is a common task. In order to extract the IP address from the
- ifconfig output use:
- $ ifconfig wlan0 | egrep -o "inet addr:[^ ]*" | grep -o "[0-9.]*"
- 192.168.0.82
- Here the first command egrep -o "inet addr:[^ ]*" will print inet
- addr:192.168.0.82 .
- The pattern starts with inet addr: and ends with some non-space character sequence
- (specified by [^ ]* ). In the next pipe, grep -o prints only the character combination of digits and '.'.
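- In a script, the extracted address can be captured in a variable; a minimal sketch, assuming
- the interface is wlan0 :
- ip=$(ifconfig wlan0 | egrep -o "inet addr:[^ ]*" | grep -o "[0-9.]*")
- echo "wlan0 has IP $ip"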
- In order to set the IP address for a network interface, use:
- # ifconfig wlan0 192.168.0.80
- You will need to run the above command as root. 192.168.0.80 is the address to be set.
- Set the subnet mask along with IP address as follows:
- # ifconfig wlan0 192.168.0.80 netmask 255.255.252.0
- Spoofing Hardware Address (MAC Address)
- In certain circumstances where authentication or filtering of computers on a network is
- provided by using the hardware address, we can use hardware address spoofing. The
- hardware address appears in ifconfig output as HWaddr 00:1c:bf:87:25:d2 .
- We can spoof the hardware address at the software level as follows:
- # ifconfig eth0 hw ether 00:1c:bf:87:25:d5
- In the above command, 00:1c:bf:87:25:d5 is the new MAC address to be assigned.
- This can be useful when we need to access the Internet through MAC authenticated service
- providers that provide access to the Internet for a single machine.
- Name server and DNS (Domain Name Service)
- The elementary addressing scheme for the Internet is IP addresses (dotted decimal form, for
- example, 202.11.32.75 ). However, the resources on the Internet (for example, websites)
- are accessed through a combination of ASCII characters called URLs or domain names. For
- example, google.com is a domain name, and it corresponds to an IP address. Typing that
- IP address in the browser will also take you to www.google.com .
- This technique of abstracting IP addresses with symbolic names is called Domain Name Service
- (DNS). When we enter google.com , the DNS servers configured for our network resolve the
- domain name into the corresponding IP address. On a local network, we set up a local
- DNS to name the local machines on the network symbolically using their hostnames.
- Name servers assigned to the current system can be viewed by reading /etc/resolv.conf .
- For example:
- $ cat /etc/resolv.conf
- nameserver 8.8.8.8
- We can add name servers manually as follows:
- # echo nameserver IP_ADDRESS >> /etc/resolv.conf
- How can we obtain the IP address for a corresponding domain name?
- The easiest method to obtain an IP address is by trying to ping the given domain name and
- looking at the echo reply. For example:
- $ ping google.com
- PING google.com (64.233.181.106) 56(84) bytes of data.
- Here 64.233.181.106 is the corresponding IP address.
- A domain name can have multiple IP addresses assigned. In that case, the DNS server will
- return one address among the list of IP addresses. To obtain all the addresses assigned to
- the domain name, we should use a DNS lookup utility.
- DNS lookup
- There are different DNS lookup utilities available from the command line. These will request a
- DNS server for an IP address resolution. host and nslookup are two DNS lookup utilities.
- When host is executed, it will list out all of the IP addresses attached to the domain name.
- nslookup is another command, similar to host , which can be used to query details
- related to DNS and name resolution. For example:
- $ host google.com
- google.com has address 64.233.181.105
- google.com has address 64.233.181.99
- google.com has address 64.233.181.147
- google.com has address 64.233.181.106
- google.com has address 64.233.181.103
- google.com has address 64.233.181.104
- It may also list out DNS resource records like MX (Mail Exchanger) as follows:
- $ nslookup google.com
- Server: 8.8.8.8
- Address: 8.8.8.8#53
- Non-authoritative answer:
- Name: google.com
- Address: 64.233.181.105
- Name: google.com
- Address: 64.233.181.99
- Name: google.com
- Address: 64.233.181.147
- Name: google.com
- Address: 64.233.181.106
- Name: google.com
- Address: 64.233.181.103
- Name: google.com
- Address: 64.233.181.104
- Server: 8.8.8.8
- The last line above corresponds to the default nameserver used for DNS resolution.
- Without using the DNS server, it is possible to add a symbolic name to IP address resolution
- just by adding entries into file /etc/hosts .
- In order to add an entry, use the following syntax:
- # echo IP_ADDRESS symbolic_name >> /etc/hosts
- For example:
- # echo 192.168.0.9 backupserver.com >> /etc/hosts
- After adding this entry, whenever a resolution to backupserver.com occurs, it will resolve
- to 192.168.0.9 .
- Setting default gateway, showing routing table information
- When a local network is connected to another network, it needs to assign some machine
- or network node through which an interconnection takes place. Hence the IP packets with
- a destination exterior to the local network should be forwarded to the node machine, which
- is interconnected to the external network. This special node machine, which is capable of
- forwarding packets to the external network, is called a gateway. We set the gateway for every
- node to make it possible to connect to an external network.
- The operating system maintains a table called the routing table, which contains information
- on how packets are to be forwarded and through which machine node in the network. The
- routing table can be displayed as follows:
- $ route
- Kernel IP routing table
- Destination Gateway Genmask Flags Metric Ref Use Iface
- 192.168.0.0 * 255.255.252.0 U 2 0 0 wlan0
- link-local * 255.255.0.0 U 1000 0 0 wlan0
- default p4.local 0.0.0.0 UG 0 0 0 wlan0
- Or, you can also use:
- $ route -n
- Kernel IP routing table
- Destination Gateway Genmask Flags Metric Ref Use Iface
- 192.168.0.0 0.0.0.0 255.255.252.0 U 2 0 0 wlan0
- 169.254.0.0 0.0.0.0 255.255.0.0 U 1000 0 0 wlan0
- 0.0.0.0 192.168.0.4 0.0.0.0 UG 0 0 0 wlan0
- Using -n specifies that numerical addresses be displayed. When -n is used, every
- entry is displayed with a numerical IP address; otherwise, symbolic hostnames are shown
- for those IP addresses that have DNS entries available.
- A default gateway is set as follows:
- # route add default gw IP_ADDRESS INTERFACE_NAME
- For example:
- # route add default gw 192.168.0.1 wlan0
- Traceroute
- When an application requests a service through the Internet, the server may be at a distant
- location and connected through any number of gateways or device nodes. The packets
- travel through several gateways and reach the destination. There is an interesting command
- traceroute that displays the address of all intermediate gateways through which the
- packet travelled to reach the destination. traceroute information helps us to understand
- how many hops each packet takes in order to reach the destination. The number of
- intermediate gateways or routers gives a metric to measure the distance between two nodes
- connected in a large network. An example of the output from traceroute is as follows:
- $ traceroute google.com
- traceroute to google.com (74.125.77.104), 30 hops max, 60 byte packets
- 1 gw-c6509.lxb.as5577.net (195.26.4.1) 0.313 ms 0.371 ms 0.457 ms
- 2 40g.lxb-fra.as5577.net (83.243.12.2) 4.684 ms 4.754 ms 4.823 ms
- 3 de-cix10.net.google.com (80.81.192.108) 5.312 ms 5.348 ms 5.327 ms
- 4 209.85.255.170 (209.85.255.170) 5.816 ms 5.791 ms 209.85.255.172
- (209.85.255.172) 5.678 ms
- 5 209.85.250.140 (209.85.250.140) 10.126 ms 9.867 ms 10.754 ms
- 6 64.233.175.246 (64.233.175.246) 12.940 ms 72.14.233.114
- (72.14.233.114) 13.736 ms 13.803 ms
- 7 72.14.239.199 (72.14.239.199) 14.618 ms 209.85.255.166
- (209.85.255.166) 12.755 ms 209.85.255.143 (209.85.255.143) 13.803 ms
- 8 209.85.255.98 (209.85.255.98) 22.625 ms 209.85.255.110
- (209.85.255.110) 14.122 ms
- *
- 9 ew-in-f104.1e100.net (74.125.77.104) 13.061 ms 13.256 ms 13.484 ms
- See also
- f Playing with variables and environment variables of Chapter 1, explains the PATH
- variable
- f Searching and mining "text" inside a file with grep of Chapter 4, explains the grep
- command
- Let's ping!
- ping is the most basic network command, and one that every user should first know. It is a
- universal command that is available on major Operating Systems. It is also a diagnostic tool
- used for verifying the connectivity between two hosts on a network. It can be used to find out
- which machines are alive on a network. Let us see how to use ping.
- How to do it...
- In order to check the connectivity of two hosts on a network, the ping command uses
- Internet Control Message Protocol (ICMP) echo packets. When these echo packets are sent
- towards a host, the host responds back with a reply if it is reachable or alive.
- Check whether a host is reachable as follows:
- $ ping ADDRESS
- The ADDRESS can be a hostname, domain name, or an IP address itself.
- ping will continuously send packets and the reply information is printed on the terminal. Stop
- the pinging by pressing Ctrl + C .
- For example:
- f When the host is reachable the output will be similar to the following:
- $ ping 192.168.0.1
- PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
- 64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=1.44 ms
- ^C
- --- 192.168.0.1 ping statistics ---
- 1 packets transmitted, 1 received, 0% packet loss, time 0ms
- rtt min/avg/max/mdev = 1.440/1.440/1.440/0.000 ms
- $ ping google.com
- PING google.com (209.85.153.104) 56(84) bytes of data.
- 64 bytes from bom01s01-in-f104.1e100.net (209.85.153.104): icmp_
- seq=1 ttl=53 time=123 ms
- ^C
- --- google.com ping statistics ---
- 1 packets transmitted, 1 received, 0% packet loss, time 0ms
- rtt min/avg/max/mdev = 123.388/123.388/123.388/0.000 ms
- f When a host is unreachable the output will be similar to:
- $ ping 192.168.0.99
- PING 192.168.0.99 (192.168.0.99) 56(84) bytes of data.
- From 192.168.0.82 icmp_seq=1 Destination Host Unreachable
- From 192.168.0.82 icmp_seq=2 Destination Host Unreachable
- When the host is unreachable, ping returns a Destination Host Unreachable
- error message.
- There's more...
- In addition to checking the connectivity between two points in a network, the ping command
- can be used with additional options to get useful information. Let's go through the additional
- options of ping .
- Round trip time
- The ping command can be used to find out the Round Trip Time (RTT) between two hosts on a
- network. RTT is the time required for the packet to reach the destination host and come back to
- the source host. The RTT in milliseconds can be obtained from ping. An example is as follows:
- --- google.com ping statistics ---
- 5 packets transmitted, 5 received, 0% packet loss, time 4000ms
- rtt min/avg/max/mdev = 118.012/206.630/347.186/77.713 ms
- Here the minimum RTT is 118.012ms, the average RTT is 206.630ms, and the maximum RTT is
- 347.186ms. The mdev (77.713ms) parameter in the ping output stands for mean deviation.
- Limiting number of packets to be sent
- The ping command sends echo packets and waits for echo replies indefinitely until it is
- stopped by pressing Ctrl + C . However, we can limit the count of echo packets to be sent by
- using the -c flag.
- The usage is as follows:
- -c COUNT
- For example:
- $ ping 192.168.0.1 -c 2
- PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
- 64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=4.02 ms
- 64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=1.03 ms
- --- 192.168.0.1 ping statistics ---
- 2 packets transmitted, 2 received, 0% packet loss, time 1001ms
- rtt min/avg/max/mdev = 1.039/2.533/4.028/1.495 ms
- In the previous example, the ping command sends two echo packets and stops.
- This is useful when we need to ping multiple machines from a list of IP addresses through a
- script and check their statuses.
- Return status of ping command
- The ping command returns exit status 0 when it succeeds and returns non-zero when it
- fails. Success means the destination host is reachable; failure means the destination host
- is unreachable.
- The return status can be easily obtained as follows:
- $ ping ADDRESS -c2
- if [ $? -eq 0 ];
- then
-   echo Successful ;
- else
-   echo Failure
- fi
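- The same check can be condensed into a one-liner sketch using the shell's && and ||
- operators:
- $ ping -c2 ADDRESS &> /dev/null && echo Successful || echo Failure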
- Listing all the machines alive on a network
- When we deal with a large local area network, we may need to check the availability of other
- machines in the network. A machine may not be alive for two reasons: either it is not
- powered on, or there is a problem in the network. By using shell scripting, we can
- easily find out and report which machines are alive on the network. Let's see how to do it.
- Getting ready
- In this recipe, we use two methods. The first method uses ping and the second method uses
- fping . fping doesn't come with a Linux distribution by default. You may have to manually
- install fping using a package manager.
- How to do it...
- Let's go through the script to find out all the live machines on the network and alternate
- methods to find out the same.
- f Method 1:
- We can write our own script using the ping command to query list of IP addresses
- and check whether they are alive or not as follows:
- #!/bin/bash
- #Filename: ping.sh
- # Change base address 192.168.0 according to your network.
- for ip in 192.168.0.{1..255} ;
- do
-   ping $ip -c 2 &> /dev/null ;
-   if [ $? -eq 0 ];
-   then
-     echo $ip is alive
-   fi
- done
- The output is as follows:
- $ ./ping.sh
- 192.168.0.1 is alive
- 192.168.0.90 is alive
- f Method 2:
- We can use an existing command-line utility to query the status of machines on a
- network as follows:
- $ fping -a 192.168.0.0/24 -g 2> /dev/null
- 192.168.0.1
- 192.168.0.90
- Or, use:
- $ fping -a 192.168.0.1 192.168.0.255 -g
- How it works...
- In Method 1, we used the ping command to find out the alive machines on the network.
- We used a for loop for iterating through the list of IP addresses. The list is generated as
- 192.168.0.{1..255} . The {start..end} notation will expand and will generate a list of
- IP addresses, such as 192.168.0.1 , 192.168.0.2 , 192.168.0.3 till 192.168.0.255 .
- ping $ip -c 2 &> /dev/null will run a ping to the corresponding IP address in each
- execution of loop. -c 2 is used to restrict the number of echo packets to be sent to two
- packets. &> /dev/null is used to redirect both stderr and stdout to /dev/null so that
- it won't be printed on the terminal. Using $? we evaluate the exit status. If it is successful, the
- exit status is 0 else non-zero. Hence the successful IP addresses are printed. We can also
- print the list of unsuccessful IP addresses to give the list of unreachable IP addresses.
- Here is an exercise for you. Instead of using a range of IP
- addresses hard-coded in the script, modify the script to
- read a list of IP addresses from a file or stdin.
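- One possible sketch of that exercise, reading addresses from a hypothetical file ip.list
- (one address per line):
- while read ip;
- do
-   ping $ip -c 2 &> /dev/null && echo $ip is alive
- done < ip.list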
- In this script, each ping is executed one after the other. Even though the IP addresses
- are independent of each other, the ping commands run sequentially; each one waits the time
- needed to send two echo packets and receive the replies (or the timeout for a reply) before
- the next ping command can execute.
- When it comes to 255 addresses, the delay is large. Let's run all the ping commands in
- parallel to make it much faster. The core part of the script is the loop body. To run the ping
- commands in parallel, enclose the loop body in ( )& . ( ) encloses a block of commands
- to run as the sub-shell and & sends it to the background by leaving the current thread. For
- example:
- (
- ping $ip -c2 &> /dev/null ;
- if [ $? -eq 0 ];
- then
- echo $ip is alive
- fi
- )&
- wait
- The for loop body spawns many background processes, exits the loop, and the script then
- terminates. In order to prevent the script from terminating until all its child processes end,
- we have a command called wait . Place a wait at the end of the script so that it waits
- until all the child ( ) subshell processes complete.
- The wait command enables a script to terminate only after all its child
- processes or background processes have completed.
- Have a look at fast_ping.sh from the code provided with the book.
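- Putting the pieces together, a minimal sketch of such a parallel version is shown below
- (the fast_ping.sh shipped with the book may differ in its details):
- #!/bin/bash
- #Filename: fast_ping.sh (sketch)
- # Change base address 192.168.0 according to your network.
- for ip in 192.168.0.{1..255} ;
- do
-     (
-         ping $ip -c 2 &> /dev/null ;
-         if [ $? -eq 0 ];
-         then
-             echo $ip is alive
-         fi
-     )&
- done
- wait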
- Method 2 uses a different command called fping . It can ping a list of IP addresses
- simultaneously and responds very quickly. The options available with fping are as follows:
- f The -a option with fping prints the IP addresses of all alive machines
- f The -u option with fping prints all unreachable machines
- f The -g option generates a range of IP addresses from the slash-subnet-mask
- notation specified as IP/mask, or from start and end IP addresses as:
- $ fping -a 192.168.0.0/24 -g
- Or
- $ fping -a 192.168.0.1 192.168.0.255 -g
- f 2>/dev/null is used to dump the error messages printed for unreachable hosts to
- the null device
- It is also possible to manually specify a list of IP addresses as command-line arguments or as
- a list through stdin . For example:
- $ fping -a 192.168.0.1 192.168.0.5 192.168.0.6
- # Passes IP address as arguments
- $ fping -a <ip.list
- # Passes a list of IP addresses from a file
- There's more...
- The fping command can be used for querying DNS data from a network. Let's see how to do it.
- DNS lookup with fping
- fping has an option -d that returns host names by using DNS lookup for each echo reply. It
- will print out host names rather than IP addresses on ping replies.
- $ cat ip.list
- 192.168.0.86
- 192.168.0.9
- 192.168.0.6
- $ fping -a -d 2>/dev/null <ip.list
- www.local
- dnss.local
- See also
- f Playing with file descriptors and redirection of Chapter 1, explains the data
- redirection
- f Comparisons and tests of Chapter 1, explains numeric comparisons
- Transferring files
- The major purpose of networking computers is resource sharing, and among shared
- resources, file sharing is the most prominent use. There are different methods by which we
- can transfer files between nodes on a network. This recipe discusses how to transfer files
- using the commonly used protocols FTP, SFTP, RSYNC, and SCP.
- Getting ready
- The commands for performing file transfer over the network are mostly available by default
- with Linux installations. Files can be transferred via FTP using the lftp command, via an
- SSH connection using sftp , via RSYNC over SSH using the rsync command, and through
- SSH using scp .
- How to do it...
- File Transfer Protocol (FTP) is an old file transfer protocol for transferring files between
- machines on a network. We can use the command lftp for accessing FTP-enabled servers
- for file transfer. FTP uses port 21 and can only be used if an FTP server is installed on the
- remote machine. FTP is used by many public websites to share files.
- To connect to an FTP server and transfer files in between, use:
- $ lftp username@ftphost
- Now it will prompt for a password and then display a logged in prompt as follows:
- lftp username@ftphost:~>
- You can type commands in this prompt. For example:
- f To change to a directory, use cd directory
- f To change the directory of the local machine, use lcd directory
- f To create a directory, use mkdir directory
- f To download a file, use get filename as follows:
- lftp username@ftphost:~> get filename
- f To upload a file from the current directory, use put filename as follows:
- lftp username@ftphost:~> put filename
- f An lftp session can be exited by using the quit command
- Auto completion is supported in the lftp prompt.
- There's more...
- Let's go through some additional techniques and commands used for file transfer through a
- network.
- Automated FTP transfer
- ftp is another command used for FTP-based file transfer; lftp is the more flexible of
- the two. Both lftp and ftp open an interactive session with the user (they prompt for
- user input by displaying messages). What if we want to automate a file transfer instead of
- using the interactive mode? We can automate FTP file transfers by writing a shell script as
- follows:
- #!/bin/bash
- #Filename: ftp.sh
- #Automated FTP transfer
- HOST='domain.com'
- USER='foo'
- PASSWD='password'
- ftp -i -n $HOST <<EOF
- user ${USER} ${PASSWD}
- binary
- cd /home/slynux
- put testfile.jpg
- get serverfile.jpg
- quit
- EOF
- The above script has the following structure:
- <<EOF
- DATA
- EOF
- This is used to send data through stdin to the FTP command. The recipe, Playing with file
- descriptors and redirection in Chapter 1, explains various methods for redirection into stdin .
- The -i option of ftp turns off the interactive session with the user. user ${USER}
- ${PASSWD} sets the username and password, and binary sets the file transfer mode to
- binary.
- SFTP (Secure FTP)
- SFTP is an FTP-like file transfer system that runs on top of an SSH connection. It makes use of
- an SSH connection to emulate an FTP interface. It doesn't require an FTP server at the remote
- end to perform file transfer but it requires an OpenSSH server to be installed and running. It is
- an interactive command, which offers an sftp prompt.
- The following commands are used to perform the file transfer. All the other parts remain
- the same as in the automated FTP session with its specific HOST, USER, and PASSWD:
- cd /home/slynux
- put testfile.jpg
- get serverfile.jpg
- In order to run sftp , use:
- $ sftp user@domainname
- Similar to lftp , an sftp session can be exited by typing the quit command.
- The SSH server sometimes will not be running at the default Port 22. If it is running at a
- different port, we can specify the port along with sftp as -oPort=PORTNO .
- For example:
- $ sftp -oPort=422 user@slynux.org
- -oPort should be the first argument of the sftp command.
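- sftp also supports a non-interactive batch mode through its -b option, which reads the
- session commands from a file. A sketch follows; batch.sftp is a hypothetical filename,
- and batch mode assumes password-less (key-based) authentication, since sftp -b does
- not prompt for a password:
- $ cat batch.sftp
- cd /home/slynux
- put testfile.jpg
- get serverfile.jpg
- quit
- $ sftp -b batch.sftp user@domainname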
- RSYNC
- rsync is an important command-line utility that is widely used for copying files over
- networks and for taking backup snapshots. Its usage is better explained in the separate
- recipe, Backup snapshots with rsync.
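- For reference, a typical rsync invocation over SSH looks like the following; the paths are
- illustrative, -a enables archive mode (preserving permissions and timestamps), and -v
- makes the output verbose:
- $ rsync -av /home/slynux user@remotehost:/home/backups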
- SCP (Secure Copy)
- SCP is a file copy technique that is more secure than the traditional remote copy tool
- called rcp . The files are transferred through an encrypted channel; SSH is used as the
- encryption channel. We can easily transfer files to a remote machine as follows:
- $ scp filename user@remotehost:/home/path
- This will prompt for a password. It can be made password-less by using the SSH auto-login
- technique; the recipe, Password-less auto-login with SSH, explains SSH auto-login.
- Therefore, file transfer using scp doesn't require specific scripting. Once SSH login is
- automated, the scp command can be executed without an interactive password prompt.
- Here remotehost can be an IP address or a domain name. The format of the scp command is:
- $ scp SOURCE DESTINATION
- SOURCE or DESTINATION can be in the format username@host:/path , for example:
- $ scp user@remotehost:/home/path/filename filename
- The above command copies a file from the remote host to the current directory with the given
- filename.
- If SSH is running at a different port than 22, use -oPort with the same syntax as sftp .
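- For example, assuming the server listens on port 422:
- $ scp -oPort=422 filename user@remotehost:/home/path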
- Recursive copying with SCP
- By using scp we can recursively copy a directory between two machines on a network as
- follows with the -r parameter:
- $ scp -r /home/slynux user@remotehost:/home/backups
- # Copies the directory /home/slynux recursively to remote location
- scp can also preserve file permissions and modes while copying, by using the -p
- parameter.
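- For example, the following copies a file while preserving its modification times and mode
- (the paths are illustrative):
- $ scp -p filename user@remotehost:/home/path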
- See also
- f Playing with file descriptors and redirection of Chapter 1, explains the standard input
- using EOF
- Setting up an Ethernet and wireless LAN
- with script
- An Ethernet connection is simple to configure. Since it uses physical cables, there are
- no special requirements such as authentication. However, a wireless LAN requires
- authentication parameters, such as a WEP key and the ESSID of the wireless network to
- connect to. Let's see how to connect to a wireless as well as a wired network by writing a
- shell script.
- Getting ready
- To connect to a wired network, we need to assign an IP address and subnet mask by using
- the ifconfig utility. A wireless network connection additionally requires utilities such as
- iwconfig and iwlist to configure more parameters.
- How to do it...
- In order to connect to a network from a wired interface, execute the following script:
- #!/bin/bash
- #Filename: etherconnect.sh
- #Description: Connect Ethernet
- #Modify the parameters below according to your settings
- ######### PARAMETERS ###########
- IFACE=eth0
- IP_ADDR=192.168.0.5
- SUBNET_MASK=255.255.255.0
- GW=192.168.0.1
- HW_ADDR='00:1c:bf:87:25:d2'
- # HW_ADDR is optional
- #################################
- if [ $UID -ne 0 ];
- then
-     echo "Run as root"
-     exit 1
- fi
- # Turn the interface down before setting new config
- /sbin/ifconfig $IFACE down
- if [[ -n $HW_ADDR ]];
- then
-     /sbin/ifconfig $IFACE hw ether $HW_ADDR
-     echo Spoofed MAC ADDRESS to $HW_ADDR
- fi
- /sbin/ifconfig $IFACE $IP_ADDR netmask $SUBNET_MASK
- route add default gw $GW $IFACE
- echo Successfully configured $IFACE
- The script for connecting to a wireless LAN with WEP is as follows:
- #!/bin/bash
- #Filename: wlan_connect.sh
- #Description: Connect to Wireless LAN
- #Modify the parameters below according to your settings
- ######### PARAMETERS ###########
- IFACE=wlan0
- IP_ADDR=192.168.1.5
- SUBNET_MASK=255.255.255.0
- GW=192.168.1.1
- HW_ADDR='00:1c:bf:87:25:d2'
- #Comment above line if you don't want to spoof mac address
- ESSID="homenet"
- WEP_KEY=8b140b20e7
- FREQ=2.462G
- #################################
- KEY_PART=""
- if [[ -n $WEP_KEY ]];
- then
-     KEY_PART="key $WEP_KEY"
- fi
- if [ $UID -ne 0 ];
- then
-     echo "Run as root"
-     exit 1;
- fi
- # Turn the interface down before setting new config
- /sbin/ifconfig $IFACE down
- if [[ -n $HW_ADDR ]];
- then
-     /sbin/ifconfig $IFACE hw ether $HW_ADDR
-     echo Spoofed MAC ADDRESS to $HW_ADDR
- fi
- /sbin/iwconfig $IFACE essid $ESSID $KEY_PART freq $FREQ
- /sbin/ifconfig $IFACE $IP_ADDR netmask $SUBNET_MASK
- route add default gw $GW $IFACE
- echo Successfully configured $IFACE
- How it works...
- The commands ifconfig , iwconfig , and route are to be run as root. Hence a check for
- the root user is performed at the beginning of the scripts.
- The Ethernet connection script is pretty straightforward and it uses the concepts explained in
- the recipe, Basic networking primer. Let's go through the commands used for connecting to
- the wireless LAN.
- A wireless LAN requires some parameters such as the essid , key , and frequency to connect
- to the network. The essid is the name of the wireless network to which we need to connect.
- Some networks use a Wired Equivalent Privacy (WEP) key for authentication, whereas some
- networks don't. The WEP key is usually a 10-digit hexadecimal passphrase. Next comes the
- frequency assigned to the network. iwconfig is the command used to attach the wireless
- card to the proper wireless network, WEP key, and frequency.
- We can scan and list the available wireless networks by using the utility iwlist . To scan,
- use the following command:
- # iwlist scan
- wlan0 Scan completed :
- Cell 01 - Address: 00:12:17:7B:1C:65
- Channel:11
- Frequency:2.462 GHz (Channel 11)
- Quality=33/70 Signal level=-77 dBm
- Encryption key:on
- ESSID:"model-2"
- The Frequency parameter can be extracted from the scan result, from the line
- Frequency:2.462 GHz (Channel 11) .
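- If a script needs to pick that value out of the scan output, one possible extraction is shown
- below; wlan0 and the exact "Frequency:" line format are assumptions based on the output
- above, and may vary between wireless drivers:
- # iwlist wlan0 scan | grep 'Frequency:' | head -n 1 | \
- sed 's/.*Frequency:\([0-9.]*\) GHz.*/\1/'
- 2.462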
- See also
- f Comparisons and tests of Chapter 1, explains string comparisons.
- Password-less auto-login with SSH
- SSH is widely used with automation scripting. By using SSH, it is possible to remotely
- execute commands on remote hosts and read their output. SSH authenticates by using a
- username and password, and passwords are prompted for during the execution of SSH
- commands. In automation scripts, however, SSH commands may be executed hundreds of
- times in a loop, so providing passwords each time is impractical; hence we need to
- automate logins. SSH has a built-in feature by which it can auto-login using SSH keys. This
- recipe describes how to create SSH keys and facilitate auto-login.
- How to do it...
- SSH uses public key and private key based encryption techniques for automatic
- authentication. An authentication key has two elements: a public and a private key pair.
- We can create an authentication key using the ssh-keygen command. For automating
- authentication, the public key must be placed at the server (by appending it to the
- ~/.ssh/authorized_keys file) and the private key of the pair should be present in the
- ~/.ssh directory of the user at the client machine, which is the computer you are logging
- in from.
- Several configurations (for example, path and name of the authorized_keys file) regarding
- the SSH can be configured by altering the configuration file /etc/ssh/sshd_config .
- There are two steps towards the setup of automatic authentication with SSH. They are:
- 1. Creating the SSH key from the machine, which requires a login to a remote machine.
- 2. Transferring the public key generated to the remote host and appending it to
- ~/.ssh/authorized_keys file.
- In order to create an SSH key, enter the ssh-keygen command with the encryption algorithm
- type specified as RSA as follows:
- $ ssh-keygen -t rsa
- Generating public/private rsa key pair.
- Enter file in which to save the key (/home/slynux/.ssh/id_rsa):
- Created directory '/home/slynux/.ssh'.
- Enter passphrase (empty for no passphrase):
- Enter same passphrase again:
- Your identification has been saved in /home/slynux/.ssh/id_rsa.
- Your public key has been saved in /home/slynux/.ssh/id_rsa.pub.
- The key fingerprint is:
- f7:17:c6:4d:c9:ee:17:00:af:0f:b3:27:a6:9c:0a:05 slynux@slynux-laptop
- The key's randomart image is:
- +--[ RSA 2048]----+
- | . |
- | o . .|
- | E o o.|
- | ...oo |
- | .S .+ +o.|
- | . . .=....|
- | .+.o...|
- | . . + o. .|
- | ..+ |
- +-----------------+
- You need to enter a passphrase for generating the public-private key pair. It is also possible
- to generate the key pair without entering a passphrase, but it is insecure. We can write
- monitoring scripts that use automated login from the script to several machines. In such
- cases, you should leave the passphrase empty while running the ssh-keygen command to
- prevent the script from asking for a passphrase while running.
- Now ~/.ssh/id_rsa.pub and ~/.ssh/id_rsa have been generated. id_rsa.pub is the
- generated public key and id_rsa is the private key. The public key has to be appended to
- the ~/.ssh/authorized_keys file on remote servers where we need to auto-login from
- the current host.
- In order to append a key file, use:
- $ ssh USER@REMOTE_HOST "cat >> ~/.ssh/authorized_keys" < ~/.ssh/id_rsa.pub
- Password:
- Provide the login password in the previous command.
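- Alternatively, most distributions ship the ssh-copy-id helper, which appends the default
- public key to the remote authorized_keys file in a single step:
- $ ssh-copy-id USER@REMOTE_HOST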
- The auto-login has been set up. From now on, SSH will not prompt for passwords during
- execution. You can test this with the following command:
- $ ssh USER@REMOTE_HOST uname
- Linux
- You will not be prompted for a password.
- Running commands on remote host
- with SSH
- SSH is an interesting system administration tool that enables you to control remote hosts
- by logging in with a shell. SSH stands for Secure Shell. Commands can be executed on the
- shell obtained by logging in to the remote host, just as if we ran them on the localhost. SSH
- runs the network data transfer over an encrypted tunnel. This recipe will introduce different
- ways in which commands can be executed on a remote host.
- Getting ready
- SSH doesn't come by default with all GNU/Linux distributions. Therefore, you may have to
- install the openssh-server and openssh-client packages using a package manager.
- SSH service runs by default on port number 22.
- How to do it...
- To connect to a remote host with the SSH server running, use:
- $ ssh username@remote_host
- In this command:
- f username is a user that exists at the remote host.
- f remote_host can be a domain name or an IP address.
- For example:
- $ ssh mec@192.168.0.1
- The authenticity of host '192.168.0.1 (192.168.0.1)' can't be
- established.
- RSA key fingerprint is 2b:b4:90:79:49:0a:f1:b3:8a:db:9f:73:2d:75:d6:f9.
- Are you sure you want to continue connecting (yes/no)? yes
- Warning: Permanently added '192.168.0.1' (RSA) to the list of known
- hosts.
- Password:
- Last login: Fri Sep 3 05:15:21 2010 from 192.168.0.82
- mec@proxy-1:~$
- It will interactively ask for a user password and upon successful authentication it will return
- the shell for the user.
- By default, the SSH server runs at Port 22. But certain servers run the SSH service at different
- ports. In that case use -p port_no with the ssh command to specify the port.
- In order to connect to an SSH server running at port 422, use:
- $ ssh user@localhost -p 422
- You can execute commands in the shell that corresponds to the remote host. The shell is
- an interactive tool in which a user types and runs commands. However, in shell scripting
- contexts, we do not need an interactive shell; we need to automate several tasks. We
- require executing several commands at the remote shell and displaying or storing their
- output at localhost. Issuing a password every time is not practical for an automated script,
- hence auto-login for SSH should be configured.
- The recipe, Password-less auto-login with SSH, explains how to configure SSH auto-login.
- Make sure that auto-login is configured before running automated scripts that use SSH.
- To run a command on the remote host and display its output on the localhost shell, use the
- following syntax:
- $ ssh user@host 'COMMANDS'
- For example:
- $ ssh mec@192.168.0.1 'whoami'
- Password:
- mec
- Multiple commands can be given by using a semicolon delimiter between the commands as:
- $ ssh user@host 'command1 ; command2 ; command3'
- Commands can be sent through stdin and the output of the commands will be available to
- stdout .
- The syntax will be as follows:
- $ ssh user@remote_host "COMMANDS" > stdout.txt 2> errors.txt
- The COMMANDS string should be quoted in order to prevent a semicolon character from
- acting as a delimiter in the localhost shell. We can also pass any command sequence that
- involves piped statements to the ssh command through stdin as follows:
- $ echo "COMMANDS" | ssh user@remote_host > stdout.txt 2> errors.txt
- For example:
- $ ssh mec@192.168.0.1 "echo user: $(whoami);echo OS: $(uname)"
- Password:
- user: slynux
- OS: Linux
- In this example, since the command string is enclosed in double quotes, $(whoami) and
- $(uname) are expanded by the local shell before ssh runs; that is why the output shows
- the local username slynux rather than the remote user mec . To have the command
- substitutions performed on the remote host instead, enclose the command string in single
- quotes:
- $ ssh mec@192.168.0.1 'echo user: $(whoami); echo OS: $(uname)'
- It can be generalized as:
- COMMANDS="command1; command2; command3"
- $ ssh user@hostname "$COMMANDS"
- We can also pass a more complex subshell in the command sequence by using the ( )
- subshell operator.
- Let's write an SSH-based shell script that collects the uptime of a list of remote hosts.
- Uptime is the time for which the system has been powered on; the uptime command is
- used to display it.
- It is assumed that all systems in the IP_LIST have a common user test .
- #!/bin/bash
- #Filename: uptime.sh
- #Description: Uptime monitor
- IP_LIST="192.168.0.1 192.168.0.5 192.168.0.9"
- USER="test"
- for IP in $IP_LIST;
- do
-     utime=$(ssh $USER@$IP uptime | awk '{ print $3 }')
-     echo $IP uptime: $utime
- done
- The expected output is:
- $ ./uptime.sh
- 192.168.0.1 uptime: 1:50,
- 192.168.0.5 uptime: 2:15,
- 192.168.0.9 uptime: 10:15,
- There's more...
- The ssh command can be executed with several additional options. Let's go through them.
- SSH with compression
- The SSH protocol also supports data transfer with compression, which comes in handy when
- bandwidth is an issue. Use the -C option with the ssh command to enable compression as
- follows:
- $ ssh -C user@hostname COMMANDS
- Redirecting data into stdin of remote host shell commands
- Sometimes we need to redirect some data into stdin of remote shell commands. Let's see
- how to do it. An example is as follows:
- $ echo "text" | ssh user@remote_host 'cat >> list'
- Or:
- # Redirect data from file as:
- $ ssh user@remote_host 'cat >> list' < file
- cat >> list appends the data received through stdin to the file list. Here this command
- is executed at the remote host. But the data is passed to stdin from localhost.
- See also
- f Password-less auto-login with SSH, explains how to configure auto-login to execute
- commands without prompting for password.
- Mounting a remote drive at a local mount
- point
- Having a local mount point to access a remote host's filesystem is really helpful while
- carrying out both read and write data transfer operations. SSH is the most common
- transfer protocol available on a network, and we can make use of it with sshfs , which
- enables you to mount a remote filesystem to a local mount point. Let's see how to do it.
- Getting ready
- sshfs doesn't come by default with GNU/Linux distributions. Install sshfs by using a
- package manager. sshfs is an extension to the FUSE filesystem package that allows
- supported OSes to mount a wide variety of data as if it were a local filesystem.
- How to do it...
- In order to mount a filesystem location at a remote host to a local mount point, use:
- # sshfs user@remotehost:/home/path /mnt/mountpoint
- Password:
- Issue the user password when prompted.
- Now data at /home/path on the remote host can be accessed via the local mount point
- /mnt/mountpoint .
- In order to unmount after completing the work, use:
- # umount /mnt/mountpoint
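- If the filesystem was mounted as a regular user, unmounting may require the FUSE helper
- fusermount instead of umount :
- $ fusermount -u /mnt/mountpoint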
- See also
- f Running commands on remote host with SSH, explains the ssh command.
- Multi-casting window messages on
- a network
- The administrator of a network may often need to send messages to the nodes on the
- network. Displaying pop-up windows on users' desktops is a helpful way to alert users with
- a piece of information. Using a GUI toolkit with shell scripting can achieve this task. This
- recipe discusses how to send a pop-up window with custom messages to remote hosts.
- Getting ready
- For implementing GUI pop-up windows, zenity can be used. Zenity is a scriptable GUI
- toolkit for creating windows consisting of text boxes, input boxes, and so on. SSH can be
- used for connecting to the remote shell on a remote host. Zenity doesn't come installed by
- default with GNU/Linux distributions; use a package manager to install it.
- How to do it...
- Zenity is one of several scriptable dialog-creation toolkits. There are other toolkits, such as
- gdialog, kdialog, xdialog, and so on. Zenity is a flexible toolkit that adheres to the GNOME
- Desktop Environment.
- In order to create an info box with zenity, use:
- $ zenity --info --text "This is a message"
- # It will display a window with "This is a message" as text.
- Zenity can be used to create windows with input boxes, combo inputs, radio buttons, push
- buttons, and more. These are beyond the scope of this recipe; check the man page of
- zenity for more.
- Now, we can use SSH to run these zenity statements on a remote machine. In order to run this
- statement on the remote host through SSH, run:
- $ ssh user@remotehost 'zenity --info --text "This is a message"'
- But this will return an error like:
- (zenity:3641): Gtk-WARNING **: cannot open display:
- This is because zenity depends on Xserver. Xserver is a daemon which is responsible for
- plotting graphical elements on the screen that make up the GUI. A bare GNU/Linux system
- consists of only a text terminal or shell prompts.
- Xserver uses a special environment variable, DISPLAY , to identify the Xserver instance
- that is running on the system. We can manually set DISPLAY=:0 so that commands
- address the first Xserver instance.
- The previous SSH command can be rewritten as:
- $ ssh username@remotehost 'export DISPLAY=:0 ; zenity --info --text "This
- is a message"'
- This statement will display a pop-up window at remotehost if the user with username is
- logged in to any of the window managers.
- In order to multicast the popup window to multiple remote hosts, write a shell script as follows:
- #!/bin/bash
- #Filename: multi_cast_window.sh
- # Description: Multi-cast window popups
- IP_LIST="192.168.0.5 192.168.0.3 192.168.0.23"
- USER="username"
- COMMAND='export DISPLAY=:0 ;zenity --info --text "This is a message" '
- for host in $IP_LIST;
- do
-     ssh $USER@$host "$COMMAND" &
- done
- How it works...
- In the above script, we have a list of IP addresses on which the window should be popped
- up. A loop is used to iterate through the IP addresses and execute the SSH command.
- In the SSH statement, we have postfixed & , which sends the SSH statement to the
- background in order to facilitate parallel execution of the several SSH statements. If &
- were not used, the script would start an SSH session, execute the zenity dialog, and wait
- for the user at the remote host to close the pop-up window; until then, the next SSH
- statement in the loop would not be executed. The & trick avoids this blocking of the loop by
- not waiting for each SSH session to terminate.
- See also
- f Running commands on remote host with SSH, explains the ssh command.
- Network traffic and port analysis
- Network ports are essential parameters of network-based applications. Applications open
- ports on the host and communicate with a remote host through ports opened at the
- remote host. Awareness of opened and closed ports is essential in a security context.
- Malware and rootkits may be running on the system with custom ports and custom
- services that allow attackers to gain unauthorized access to data and resources. By getting
- the list of opened ports and the services running on them, we can analyze and defend the
- system from being controlled by rootkits, and the list helps to remove them efficiently. The
- list of opened ports is not only helpful for malware detection; it also helps in debugging
- network-based applications, for example, to analyze whether certain port connections and
- port-listening functionalities are working fine. This recipe discusses various utilities for port
- analysis.
- Getting ready
- Various commands are available for listing opened ports and the services running on each
- port (for example, lsof and netstat ). These commands are, by default, available on all
- GNU/Linux distributions.
- How to do it...
- In order to list all opened ports on the system along with the details on each service attached
- to it, use:
- $ lsof -i
- COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
- firefox-b 2261 slynux 78u IPv4 63729 0t0 TCP localhost:47797-
- >localhost:42486 (ESTABLISHED)
- firefox-b 2261 slynux 80u IPv4 68270 0t0 TCP slynux-laptop.
- local:41204->192.168.0.2:3128 (CLOSE_WAIT)
- firefox-b 2261 slynux 82u IPv4 68195 0t0 TCP slynux-laptop.
- local:41197->192.168.0.2:3128 (ESTABLISHED)
- ssh 3570 slynux 3u IPv6 30025 0t0 TCP localhost:39263-
- >localhost:ssh (ESTABLISHED)
- ssh 3836 slynux 3u IPv4 43431 0t0 TCP slynux-laptop.
- local:40414->boneym.mtveurope.org:422 (ESTABLISHED)
- GoogleTal 4022 slynux 12u IPv4 55370 0t0 TCP localhost:42486
- (LISTEN)
- GoogleTal 4022 slynux 13u IPv4 55379 0t0 TCP localhost:42486-
- >localhost:32955 (ESTABLISHED)
- Each entry in the output of lsof corresponds to a service that opens a port for
- communication. The last column of the output consists of lines similar to:
- slynux-laptop.local:34395->192.168.0.2:3128 (ESTABLISHED)
- In this output, slynux-laptop.local:34395 corresponds to the localhost part and
- 192.168.0.2:3128 corresponds to the remote host.
- 34395 is the port opened on the current machine, and 3128 is the port to which the
- service connects at the remote host.
- In order to list the opened ports on the current machine, use:
- $ lsof -i | grep ":[0-9]\+->" -o | grep "[0-9]\+" -o | sort | uniq
- The :[0-9]\+-> regex for grep is used to extract the host port portion ( :34395-> ) from the
- lsof output. The next grep is used to extract the port number (which is numeric). Multiple
- connections may occur through the same port and hence multiple entries of the same port may
- occur. In order to display each port once, they are sorted and the unique ones are printed.
- There's more...
- Let's go through additional utilities that can be used for viewing the opened port and network
- traffic related information.
- Opened port and services using netstat
- netstat is another command for network service analysis. Explaining all the features of
- netstat is beyond the scope of this recipe. We will now look at how to list services and
- port numbers.
- Use netstat -tnp to list opened ports and services as follows:
- $ netstat -tnp
- (Not all processes could be identified, non-owned process info
- will not be shown, you would have to be root to see it all.)
- Active Internet connections (w/o servers)
- Proto Recv-Q Send-Q Local Address        Foreign Address      State        PID/Program name
- tcp   0      0      192.168.0.82:38163   192.168.0.2:3128     ESTABLISHED  2261/firefox-bin
- tcp   0      0      192.168.0.82:38164   192.168.0.2:3128     TIME_WAIT    -
- tcp   0      0      192.168.0.82:40414   193.107.206.24:422   ESTABLISHED  3836/ssh
- tcp   0      0      127.0.0.1:42486      127.0.0.1:32955      ESTABLISHED  4022/GoogleTalkPlug
- tcp   0      0      192.168.0.82:38152   192.168.0.2:3128     ESTABLISHED  2261/firefox-bin
- tcp6  0      0      ::1:22               ::1:39263            ESTABLISHED  -
- tcp6  0      0      ::1:39263            ::1:22               ESTABLISHED  3570/ssh
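- To see only listening server sockets rather than established connections, the -l option
- can be added; the output, of course, varies with the services running:
- $ netstat -tlnp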
- 8
- Put on the Monitor's
- Cap
- In this chapter, we will cover:
- f Disk usage hacks
- f Calculating the execution time for a command
- f Information about logged-in users, boot logs, and boot failures
- f Printing the 10 most frequently-used commands
- f Listing the top 10 CPU-consuming processes in 1 hour
- f Monitoring command outputs with watch
- f Logging access to files and directories
- f Logfile management with logrotate
- f Logging with syslog
- f Monitoring user logins to find intruders
- f Remote disk usage health monitoring
- f Finding out active user hours on a system
- Introduction
- An operating system consists of a collection of system software designed for different
- purposes and serving different task sets. Each of these programs needs to be monitored
- by the operating system or the system administrator in order to know whether it is working
- properly or not. We will also use a technique called logging, by which important information
- is written to a file while the application is running. By reading this file, we can understand
- the timeline of the operations taking place within a particular piece of software or a
- daemon. If an application or a service crashes, this information helps to debug the issue
- and enables us to fix it. Logging and monitoring also help to gather information from a pool
- of data. They are important tasks for ensuring security in the operating system and for
- debugging purposes.
- This chapter deals with different commands that can be used to monitor different activities. It
- also goes through logging techniques and their usages.
- Disk usage hacks
- Disk space is a limited resource. We frequently perform disk usage calculations on hard
- disks or other storage media to find out the free space available. When free space becomes
- scarce, we need to find out the large files that are to be deleted or moved in order to create
- free space. Disk usage manipulations are commonly used in shell scripting contexts. This
- recipe illustrates the various commands used for disk manipulations, and problems where
- disk usage can be calculated with a variety of options.
- Getting ready
- df and du are the two significant commands that are used for calculating disk usage in Linux.
- The command df stands for disk free and du stands for disk usage. Let's see how we can use
- them to perform various tasks that involve disk usage calculation.
- How to do it...
- To find the disk space used by a file (or files), use:
- $ du FILENAME1 FILENAME2 ..
- For example:
- $ du file.txt
- 4
- The result is, by default, shown in units of 1024-byte blocks, not bytes; the 4 above means
- the file occupies 4 KB of disk space.
- In order to obtain the disk usage for all files inside a directory, with the individual disk
- usage of each file shown on its own line, use:
- $ du -a DIRECTORY
- -a outputs results for all files in the specified directory or directories recursively.
- Running du DIRECTORY will output a similar result, but it will show only the
- size consumed by subdirectories. However, it does not show the disk usage
- for each file. For printing the disk usage of files, -a is mandatory.
- For example:
- $ du -a test
- 4 test/output.txt
- 4 test/process_log.sh
- 4 test/pcpu.sh
- 16 test
- An example of using du DIRECTORY is as follows:
- $ du test
- 16 test
- There's more...
- Let's go through additional usage practices for the du command.