DB2, like any other DBMS, is a software application that relies on an integrated architecture, which includes an operating system, computer and storage hardware, applications and application servers, user and system interfaces, network interfaces, and security managers or servers. The following graphic displays an overview of the relationship between DB2 and these other architectural components, which you have studied in other courses.


Wikipedia provides the following superficial definition of a database management system: "Database management systems (DBMS) are computer software applications that interact with the user, other applications, and the database itself to capture and analyze data." However, there are several important details that the previous graphic emphasizes.
1.1 A DBMS is a subsystem of an Operating System

A DBMS is a software application, or subsystem, that is executed and managed by an operating system. In a Hardware and Operating Systems course, students are required to learn the functions of an operating system. The z/OS mainframe operating system is used in this example because IBM z/OS is the oldest (50+ years) and most powerful operating system, whose features UNIX, Linux, and Windows were designed to copy. All operating systems perform roughly the same functions, perhaps under different names.

The previous graphic illustrates that many different applications execute under the operating system. Task management is the operating system function that manages the execution of multiple applications and implements priorities to allocate CPU time, processors, or processor cores. If a system programmer allocated too much priority to the DBMS, the starved user-interface applications might become a bottleneck for users trying to access the DBMS. On the other hand, allocating too many processing resources to a user interface would starve the DBMS and increase the retrieval or storage time needed to provide optimal response time. The bottom line is that the operating system attempts to manage and allocate computing resources among competing applications - the DBMS is only one of those applications.

A second major function of an operating system is to provide the device drivers that control access to external hardware (I/O), e.g., a storage subsystem. An important requirement for every DBMS is to store and retrieve data from an external physical storage device. DASD (what IBM calls its hardware disk drives) can be directly attached to a computer. Very few real-life DBMSs use direct-attached storage (DAS) anymore, except for students or small companies using a laptop or desktop computer. Most DBMSs use a SAN (Storage Area Network), which allows multiple physical computers or virtual machines to share storage and is easy to scale and manage. The type of storage device, the location of the storage device, and the storage and communication/network protocols used are managed by the operating system. The DBMS does not care about these input/output or storage details.

1.2 A DBMS stores tables using the file system of the host operating system.

A file system is a collection of system programs that control how files are named and created and how data is stored and later retrieved. Each operating system supports one or more file systems. Windows supports the file system named NTFS (v5). The primary z/OS file system is called zFS. However, z/OS can also expose data sets through the z/OS UNIX file system, which can be accessed by both z/OS and UNIX applications.

A DBMS will store data in a collection of related files or data sets. IBM describes a collection of meaningful characters or records as a data set, whereas Windows, Linux, and UNIX use the term "file". Oracle and all other DBMSs must be able to create (save), open, retrieve, modify, or delete files and the contents of files. But Oracle does not perform these file system tasks; the operating system does. The ability to logically manage stored content is provided by the file system.

A file system does not understand the details of the content or the internal structure of a file or data set. A file or data set is simply a storage container for bytes or records. The application, e.g., Word or a COBOL program, that stores data in the file container understands the structure of the content, not the operating system or file system.

Windows, UNIX, and Linux organize files by storing them in file folders or directories. A file folder or directory is similar to the old-fashioned card catalog of a library. A library catalog contains the name of a book and provides information concerning the location of the book in that library or at other libraries. Today we simply use Google to find the location of a restaurant.

Every file system has a structure that organizes files and improves the efficiency of finding and locating files. A modern hard disk or solid-state drive (DASD) may individually store billions of files, and the file system may organize files across thousands of distributed drives. I can't remember which drawer I put (stored) my socks in. Can you imagine the complexity of organizing files in today's file systems?

The z/OS file system is not hierarchical. This means that it does not have folders (directories) or subfolders (subdirectories) to organize and secure files. For example, some instructors organize their disk storage in Windows or Linux as a collection of folders and subfolders, e.g., Classes/INFS3212/Tests/Test 1. The z/OS file system will not permit a hierarchical folder structure. However, one can use z/OS UNIX (zUnix) to give z/OS a more hierarchical alternative for its file system.

1.3 Most operating system file systems are very primitive and support only a few different file organizations.

The most popular operating system file structure is the sequential file - a simple sequence of characters for which the operating system knows the disk location of the beginning and end of the file. The content is not organized in any meaningful manner, is not sorted, and is not easy to modify.

To provide more functionality, IBM introduced an add-on subsystem called VSAM in the 1970s. DBMSs did not exist at that time. VSAM data sets provided additional data structures that made it faster and easier to look up an individual record inside a data set, e.g., look up Mary's account balance, and to keep the data dynamically sorted as records were added, modified, or deleted. All operating systems can sort data files and data sets, but as soon as you insert a new record you have to re-sort the data set. Most VSAM data sets require more processing overhead and disk storage than simple sequential files, but the business advantages often offset the disadvantages. For example, the response time to retrieve your account balance at an ATM might take 10-30 seconds with a sequential file, but less than a second with a VSAM RRDS.

There are several problems associated with VSAM data sets, or flat files, that DBMSs attempt to overcome. First, the structure of the records and their data types are defined by the application. In order for applications to have access to a flat file, all applications must agree on the structure and data types; otherwise the logical data may not be accessible or may even be destroyed. It is very difficult to keep this application-oriented data description consistent across programming languages (incompatible formats) and updates. Therefore, the data is dependent on the application program. It is also difficult to centralize flat-file data without duplication.

Sequential data sets and the more powerful VSAM data sets are also described as flat files to contrast these storage concepts with Oracle tables. The z/OS VSAM data set is used in this discussion because it provides an excellent illustration of the advantages and disadvantages of flat files compared to the advantages and disadvantages of an Oracle relational table. All Oracle tables that you use in this course are stored on the mainframe as VSAM data sets. Oracle is implemented differently on Windows, Linux, or UNIX platforms.

1.3.1 A DBMS does not care about the host operating system or file system.

It uses the existing operating system and file system under which it is installed. Both Oracle and DB2 are examples of relational DBMSs. Both may run under Windows, z/OS, Linux, and UNIX. Both may use the zFS, NTFS, or ext4 file systems. Both DB2 and Oracle may use a VSAM data set to store a table.

1.3.2 Disadvantages of a DBMS

Every DBMS has disadvantages: a) extra licensing costs, b) extra storage requirements, c) decreased performance, and d) different application program access methods.

You may ask, "There must be some advantages to a DBMS, else why would I take this course?" One would be correct, but for now let's review the following graphic, which illustrates flat-file processing and the relationship between applications and the data they access on a physical storage device.

1.4 Flat File Systems compared to Database Management Systems
1.4.1 Overview of Flat File Processing


Step 1. (Yellow Lines) - Applications may access VSAM from CICS, web applications, or batch programs, using a programming language statement such as READ that is appropriate to the programming language.

Step 2. (Green Line) - The VSAM subsystem accesses the operating system file system to locate and process a data set or file.

Step 3. (Black Line) - The operating system uses its I/O drivers and the storage device's communication and logical storage protocols to access a specific physical storage device or storage area network.

Step 4. The storage devices, controllers, routers, or switches use their appropriate physical protocols to access the physical device, disk volume, cylinder, track, and sector that was logically requested by the operating system's file system program.
1.4.2 Overview of DBMS Processing


Step 1. (Yellow Lines) - Applications may access Oracle from CICS, web applications, or batch programs, using SQL statements embedded in the programming language syntax, e.g., EXEC SQL SELECT ... END-EXEC.

Step 2. (Light Green Line) - Oracle accesses the flat files that are under Oracle's management.

Step 3. (Green Line) - The VSAM subsystem accesses the operating system file system to locate and process a data set or file.

Step 4. (Black Line) - The operating system uses its I/O drivers and the storage device's communication and logical storage protocols to access a specific physical storage device or storage area network.

Step 5. The storage devices, controllers, routers, or switches use their appropriate physical protocols to access the physical device, disk volume, cylinder, track, and sector that was logically requested by the operating system's file system program.
1.5 Limitations of Flat Files (Sequential and VSAM)

Early business information systems stored data in separate flat files and were called file-processing systems. On an IBM mainframe, sequential, partitioned, and VSAM data sets were used to store information to be processed by transaction processing applications. VSAM is a high-performance access method used in the MVS, OS/390, and VSE/ESA operating systems. It was initially released by IBM in 1973 and is part of the base product. VSAM provides a number of data set types, or data organization schemes: the key-sequenced data set (KSDS), entry-sequenced data set (ESDS), relative record data set (RRDS), variable-length relative record data set (VRRDS), and linear data set (LDS).

Beginning with DFSMS V1.3 and V1.5, you can create and use VSAM data sets that are larger than the original 4-gigabyte limit. The current limit as of this writing is two terabytes. VSAM record-level sharing (RLS) was introduced to bring the value of the Parallel Sysplex to existing applications. RLS itself does not provide transactional recovery, and batch applications cannot take full advantage of sharing a VSAM data set. To overcome this multiuser (concurrency) limitation, CICS provides a file access interface on top of VSAM: the CICS file control function, which includes transactional recovery for VSAM files. This isolation and rollback capability enables VSAM data to be shared among CICS applications and regions.


Batch transaction processing using VSAM (flat files) is very efficient, but it has many limitations that DB2 and other database management systems (DBMSs) attempt to overcome. DB2 provides an extra layer of software (called middleware) that is designed to overcome the following limitations. However, it is important to understand that DB2 is built on top of a structure of VSAM data sets, which are managed by DB2 rather than by z/OS. Consider the following diagram:



1.5.1 Flat file data sources are separated and isolated.

It was difficult to determine which parts of a file were needed for which application program, e.g., "Did a simple inventory inquiry need to display the manufacturer's code?" It was difficult to relate data between files and between various business subsystems, e.g., "How did an inventory purchase transaction relate to the accounts payable invoice that was used to pay for the inventory purchase? Must each transaction relate?" Finally, it was difficult to coordinate data requirements between various business subsystems, e.g., "Did the year-to-date total purchases (an accounts payable function) for a vendor match those purchases stored in the inventory transaction files?"

1.5.2 Duplication of Data within Flat File Systems

It is very common to redundantly store information in multiple files when using a flat-file processing system. The normalization design process states that duplicated data should be minimized. Let's assume that prospective university students take the SAT test within a 250-mile radius of the university campus. These potential students may be stored in a "Prospective Student" VSAM KSDS data set, i.e., name, street, city, state, zip code, phone, etc. Then a student applies to the university and is added to the "Student Application" VSAM KSDS data set with the same fields. Then the student becomes an active undergraduate and is added to the "Undergraduate Student" VSAM KSDS data set, again with the same fields. Then consider graduate student, student employee, alumni, big-giver, sports supporter, or any other role, each with its own "XYZ" VSAM KSDS data set containing name, street, city, state, zip code, phone, etc. It is possible for your name, street, city, state, zip code, and phone data to be duplicated dozens of times.

Besides wasting storage space, there are more serious problems associated with duplicated data, e.g., ensuring that the data is logically consistent, i.e., data integrity. For example, assume a student changes their address, or gets married and changes their last name. With a flat-file system these changes may require updates to many data sets to remain logically consistent. Do not worry: if you change your address, I guarantee that you will still get an invoice for next semester's classes, but it may take years to graduate and get a transcript if the data storage system is not properly normalized.

1.5.3 Files and VSAM data sets are not storage independent between application programs

Application programs are dependent on the file formats or data set organization. Every time a file format changed, e.g., the type of file, the length of the record, or the number or position of the key fields, the Environment Division and Data Division of all of the programs using those files also had to be changed.

File formats or data set organizations (internal structure) depend on the application programming language being used, e.g., COBOL versus Java.

The conceptual structure of the stored data is dependent on the application and the programming language used to access the data set. Consider the following JCL code used to create a VSAM data set. Line 8 provides the only information concerning the structure of this VSAM data set: the record length is 170 characters and the primary key is eight characters long, beginning at character 0. Line 9 indicates that the name of the data set is RMUI001.F1001.DATA. Given this information, does anyone know what type of business information is stored in the data set? The answer is that the application must define the layout of the data.



O.K. Your COBOL program requirements provide you with the following record layout, which is used to store, retrieve, and update a conceptual record that has meaning to programmers and everyday people.



Hmmmm. This VSAM data set contains some information about a customer (you wouldn't know this from the VSAM JCL). Perhaps the primary key is the account number of the customer, which is followed by a second data field used to store the account limit, defined with a data type of PIC S9(7)V99 COMP-3. The sum of the PICTURE clauses adds up to 170 characters.

Does this mean that a Java programmer could not access this VSAM data set? No. But the Java application must respect the rules of the COBOL application that created the records. For example, the Java application must use a data type, e.g., double, that is compatible with the COBOL data type PIC S9(7)V99 COMP-3. The data name must also be changed, since ACCT-LIMIT is not an acceptable identifier in Java.

Now consider the following DB2 SQL CREATE TABLE statement.

CREATE TABLE CUSTOMER
(ACCOUNT CHAR(8) NOT NULL PRIMARY KEY,
ACCOUNT_LIMIT DECIMAL(9,2),
BALANCE DECIMAL(9,2),
LAST_NAME CHAR(20),
FIRST_NAME CHAR(15),
ADDR1 CHAR(25),
ADDR2 CHAR(20),
ADDR3 CHAR(15),
RESERVED CHAR(7),
COMMENT CHAR(50));

DB2 will automatically create the appropriate VSAM KSDS and store the column names and SQL data types, which may then be accessed through a compatible API from any programming language. Each DB2 table is a VSAM data set. Each DB2 index is a VSAM data set.
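
Because the column names and data types now live in the DB2 catalog rather than in any one program, a program in any language can retrieve the same row with the same SQL. A minimal sketch (the literal account number is hypothetical):

SELECT ACCOUNT, ACCOUNT_LIMIT, BALANCE
  FROM CUSTOMER
 WHERE ACCOUNT = '10000001';   -- the same statement works from COBOL, Java, or an interactive tool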

1.5.4 Application data was not stored in the way users view the real-world data.

The relationship between data stored in separate files was not obvious to users. For example, users frequently did not understand the difference between the concept of adding a new inventory item to the inventory master file, e.g., writing a new record, and the concept of purchasing an existing inventory item previously stored in the master file, e.g., updating (rewriting) the quantity on hand.

To minimize the duplication of data, the relationships between data sets are designed using a process called normalization. For example, the information presented on a customer invoice may actually be stored in four or more VSAM KSDS data sets or records, yet the user sees only one business document. A user or application program may use a SQL operation called a join to combine the multiple data sets or records into a more logical, user-friendly view, as in the sketch below.
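
As a minimal sketch of such a join (the INVOICE and INVOICE_LINE tables and their columns are hypothetical, used only for illustration), the normalized rows can be recombined into the single invoice the user expects to see:

SELECT C.LAST_NAME, I.INVOICE_NO, I.INVOICE_DATE,
       L.ITEM_NO, L.QUANTITY, L.PRICE
  FROM CUSTOMER C, INVOICE I, INVOICE_LINE L
 WHERE I.ACCOUNT = C.ACCOUNT          -- the invoice belongs to the customer
   AND L.INVOICE_NO = I.INVOICE_NO    -- the line items belong to the invoice
   AND I.INVOICE_NO = '100345';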

1.5.5 Simple flat files or VSAM data sets do not support Transactional Integrity

A transaction is a logical unit of work that contains one or more SQL or file operation statements. A transaction is an atomic unit: the effects of all the SQL statements in a transaction are either all committed (applied to the database) or all rolled back (undone from the database).

To illustrate the concept of a transaction, consider a banking database. When a bank customer transfers money from a savings account to a checking account, the transaction might consist of three separate operations: decrement the savings account, increment the checking account, and record the transfer in the transaction journal. Assume that the savings account balance was decreased, but the checking account balance was not increased. The transaction was not complete or logical. When a DBMS supports transactional integrity, either all of the operations complete or none of them do. The requirements for transaction integrity are specified by the logical business requirements.

A database must allow for two situations. If all three SQL statements can be performed so that the accounts remain in proper balance, the effects of the transaction can be applied to the database. However, if something (such as insufficient funds, an invalid account number, or a hardware failure) prevents one or two of the statements in the transaction from completing, the entire transaction must be rolled back so that the balance of all accounts is correct.


A transaction begins with the first executable SQL statement. A transaction ends when it is committed or rolled back, either explicitly (with a COMMIT or ROLLBACK statement) or implicitly (when a DDL statement is issued). Transactional integrity is implemented through COMMIT/ROLLBACK commands (relational databases) and START/END/ABORT TRANSACTION (network and other databases), as in the sketch below.
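
A minimal sketch of the bank transfer as one unit of work (the ACCOUNTS table reuses the sample schema later in this document; the JOURNAL table and the account IDs are hypothetical):

UPDATE ACCOUNTS SET ACCOUNT_AMOUNT = ACCOUNT_AMOUNT - 100.00 WHERE ACCOUNT_ID = 'S1001';
UPDATE ACCOUNTS SET ACCOUNT_AMOUNT = ACCOUNT_AMOUNT + 100.00 WHERE ACCOUNT_ID = 'C1001';
INSERT INTO JOURNAL (FROM_ID, TO_ID, AMOUNT) VALUES ('S1001', 'C1001', 100.00);
COMMIT;      -- make all three changes permanent as a single transaction
-- If any statement fails, issue ROLLBACK instead and none of the changes remain.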
1.5.6 Simple flat files or VSAM data sets do not support Referential Integrity

A database system can provide referential integrity by ensuring that the prerequisites for a transaction are stored before the transaction is completed. For example, a customer information/credit approval record must be stored before an invoice is processed. A student row must be stored before the student can register for a class.

Should a payroll transaction be completed for an employee that does not exist? Of course not. Should a student be able to register for a course that does not exist? Of course not. VSAM cannot stop these illogical transactions from occurring; it relies on the application programmer or user to prevent these events. A DBMS can enforce these business rules.

DBMS concepts which provide referential integrity protection are implemented through the use of 1) FOREIGN and PRIMARY KEYs (DB2 and other relational databases), 2) mandatory set membership (network databases), and 3) a database definition (IMS hierarchical databases).

Without referential integrity, an application program must code the prerequisite constraints in each and every application. In the case of VSAM, maintaining referential integrity is the application developer's responsibility. With DB2, referential integrity of data across tables is centralized and handled by DB2 itself, based on the rules specified by the application developer. IBM's DataLinks technology manages the integrity of externally stored data even when it is maintained by external applications. A sketch follows.
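
As a minimal sketch of declaring such a rule to DB2 (the ENROLLMENT, STUDENT, and COURSE tables and their columns are hypothetical), the "student and course must already exist" constraint becomes part of the table definition instead of application code:

CREATE TABLE ENROLLMENT
(STUDENT_ID CHAR(9) NOT NULL REFERENCES STUDENT,
 COURSE_ID CHAR(8) NOT NULL REFERENCES COURSE,
 TERM CHAR(5) NOT NULL);
-- DB2 rejects any INSERT whose STUDENT_ID or COURSE_ID value does not
-- already exist as a primary key in the parent STUDENT or COURSE table.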
1.5.7 Simple flat files or VSAM data sets do not support multiple users - Concurrency

A database must serve up to tens of thousands of users simultaneously while presenting a façade to each user that he or she is the only one accessing the database. SQL Server, DB2, and Sybase maintain this façade by locking data when one user updates it, making all other users wait until the original user is finished before they can "see" the data. Oracle maintains the single-user façade by versioning the data, that is, allowing the user to see a consistent view of it as of a specific point in time. Whether your database uses locking or versioning, serving multiple users as though each one owns the database can raise concurrency issues, most often when more than one user tries to read and update the same row at approximately the same time.

One must decide what the application will do when a user checks a row out of the database to view and possibly update it. The row might change between the time the user checks it out and the time he or she completes the update. If you simply overwrite the row in the database when the user clicks the Update button, updates that other users made in the meantime will be lost. Take the simple example of multiuser access to a variable (see Figure 1).



User A retrieves the value of the variable X and sees that it is 0. In the meantime, User B changes the value to 3. When User A adds 1 to the variable's supposed value of 0 and writes it back to the variable, the variable's value is 1. However, if User B's change had been taken into consideration, the value would be 4. User B has lost his update. This is known as "last writer wins." Imagine how simple it would be for your bank account to get out of balance if it worked like this.

You can use concurrency checking to protect against lost updates and other update anomalies. The two main types of concurrency checking are optimistic concurrency and pessimistic concurrency. Pessimistic concurrency ties locks in the database to user input: you lock the rows when you fetch them. You don't implement optimistic concurrency by locking rows. Instead, you fetch the rows initially without taking locks, then check during the update to see if the rows are still unchanged. You are optimistic that no one else will update a row while your user views it. You can implement various degrees of optimism in the way you construct the UPDATE statement, as in the sketch below.
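
A minimal optimistic-concurrency sketch against the CUSTOMER table defined earlier (the literal values are hypothetical): the UPDATE is written so it succeeds only if the row still contains the balance the user originally read.

-- The user read ACCOUNT '10000001' when its BALANCE was 200.00.
UPDATE CUSTOMER
   SET BALANCE = 250.00
 WHERE ACCOUNT = '10000001'
   AND BALANCE = 200.00;   -- re-check the value that was read earlier
-- If another user changed BALANCE in the meantime, zero rows are updated
-- and the application must re-fetch the row and retry or report a conflict.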

Important to understand! While the previous sections listed the shortcomings of VSAM data sets, a DB2 table is actually stored in a VSAM data set. DB2 does not store information in data sets; VSAM does. When one creates a DB2 table, DB2 allocates a VSAM data set. When a user INSERTs a row into a DB2 table, VSAM stores the data as a record. DB2 adds features and capabilities to the VSAM data set, e.g., program independence, transactional and referential integrity, multiuser concurrency, and so forth.

1.5.8 Flat File Systems will require Application Data Validations
1.6 A DBMS is not a Programming Language or User Interface.

A DBMS emphasizes flexibility and the efficient storage and integrity of stored data. Computer or business DATA is a NOUN. Programming languages manipulate DATA; programming languages are VERBS. A program codes a list of VERBS to process a sequence of actions. While a programming language stores and manipulates stored data using VERBS, the requirements of an application programmer are different from those of a database administrator.

Assume that a systems or business analyst's requirement is to design a new user interface, e.g., a mobile application, to add employee timecards to a database table for other applications to process later. The application programmer is concerned with the detailed processing requirements to implement the user interface, validate the entered data, provide feedback to the user, and finally store the data in a Timecard database table. The DBA, and this course, is only concerned with the structure of the data to be stored by the application to support this and other applications. In fact, the DBA does not care what your user interface is. The user interface may be a web page, a batch program, or a web service transaction. However, it is extremely important for the DBA to reliably store data to support the requirements of ALL applications that use the data. In my design, the employee timecard table supports about 39 different applications.
2.0 Alternative Database Management Systems

As previously stated, the IBM mainframe server and the z/OS operating system are the most powerful computer platform by orders of magnitude. There is no substitute for large-scale transaction processing. The "B" in IBM stands for Business. However, most of us are more familiar with midrange and desktop consumer platforms, such as Intel and Motorola processors running the Windows, UNIX, and Linux/Android/Mac OS platforms.

While Windows, UNIX, and Linux-based servers are also used as business servers, one IBM mainframe model z13 has the equivalent processing power of 2,800 Windows/Intel-based servers. But if an organization does not need that level of processing power, why should it pay for the extra cost and complexity of a mainframe server? The 2014 Obama health care web site struggled to process 300,000+ logons per day. On J.P. Morgan Chase's Visa card IBM servers, over 300,000 transactions may be processed each second, not each day.

Just as computer platforms are organized into a hierarchy of processing power, so are RDBMS products organized into a hierarchy of power and flexibility.

Microsoft Access is a desktop DBMS which may be shared by 10 or fewer users. Access's performance, features, and flexibility are poor and extremely limited, but it was designed to be easy to use and to integrate with the other Microsoft Office products. It was designed to be managed by accountants and administrative assistants without the support of a computer professional.
Microsoft SQL Server is designed to be an enterprise DBMS. Its administration is easier than that of the other major enterprise DBMSs, e.g., Oracle, DB2, or MySQL, but SQL Server can only be installed on a Windows Server platform. Whenever someone cites "easy administration" as an advantage, this usually means poorer performance, flexibility, and reliability. For example, SQL Server is at the bottom of the enterprise DBMS feature list in areas such as performance (indexes especially), customization, optimization and tuning, concurrency, distributed services, security, programming language support, SQL procedure languages (PL/SQL, T-SQL), clustering technologies, and reliability. SQL Server's security is EC-12 compliant; its security does not match what is available on an IBM mainframe platform. While it is less expensive to maintain than Oracle or DB2, would you want your credit card transactions to be processed and managed by this DBMS? Still, SQL Server does serve a very important role.

MySQL is an open-source DBMS sponsored and supported by Oracle. It supports a wider variety of operating system platforms and programming languages. Being open source, MySQL is available at no cost, and technical support is available from the open-source community. MySQL is not as easy to manage as SQL Server and, like SQL Server, it is near the bottom of the enterprise DBMS feature list in areas such as performance (indexes especially), customization, optimization and tuning, concurrency, distributed services, security, SQL procedures, clustering technologies, and reliability. Other open-source DBMS alternatives include PostgreSQL, Firebird, and SQLite. PostgreSQL provides many more DBMS features, but it is not as easy to administer and has some performance issues compared with MySQL.

While MySQL and PostgreSQL are recommended as enterprise DBMSs that a beginner may use to learn how to use and manage a DBMS, their no-cost advantage makes these open-source solutions attractive alternatives for a shared DBMS within an organization on an internal private network, or for an external database where security is not a critical requirement. Whatever the role of open-source DBMSs, they are easily preferred to a shared Access database. A better use of a shared Access database is to store shared recipes or to support football pools once the number of connected users exceeds 5. By the way, you can also download free copies of Personal Oracle and Personal DB2 to learn how to use these DBMSs. These versions are almost full-featured, but they are limited to a single user.

One last word. SQL means Structured Query Language. MySQL is a full DBMS that uses SQL. SQL is NOT a programming language like COBOL, Java, or .NET. SQL is a data definition and data manipulation language: it has no loops or conditional statements and is normally not compiled. PL/SQL (Oracle), T-SQL (SQL Server), and stored procedures (DB2) are programming languages which support SQL DBMS applications. This is why knowledge of SQL is only the start of working with DBMSs.

IBM's DB2 and the Oracle DBMS are very powerful and comparable DBMSs, especially when deployed on an IBM mainframe server. The previous graphics illustrated the concept that both DB2 and Oracle may coexist on a mainframe server. Why? As you can determine from the previous discussion, there is no such thing as the "best" DBMS. The most appropriate DBMS depends on the application of a DBMS to meet a processing requirement, operating costs, administrative complexity, security, distributed services, clustering technologies, and reliability.

In 1970, IBM's Edgar F. Codd introduced the concept of the relational model. IBM DB2 was the first relational DBMS (1974). The Oracle DBMS (1977) and the other previously listed DBMS vendors followed IBM's lead in applying the relational model to other platforms. Originally, DB2 could only be executed on a mainframe. Today, DB2 may also be executed on Windows, Linux, UNIX, and the Mac OS. Like other IBM strategic initiatives, DB2 was designed to optimize business transaction processing in a batch environment, but then again, what is the cost of this performance?

Other DBMS requirements besides cost include system-level data recovery, application-level data recovery, application data availability, performance and scalability, ease of development, integration of e-business applications, automation, encryption support, and interoperability (IT Management's Top 10 List of DBMS Requirements - http://www.databasejournal.com/sqletc/article.php/3896211/IT-Managements-Top-10-List-of-DBMS-Requirements.htm). In recent years, Oracle relational databases have been preferred to house the data warehouses and data marts that support business analytics requirements and ERP systems, e.g., SAP.

Modern transactional databases (OLTP, online transaction processing) and data warehouses (OLAP, online analytical processing) are frequently implemented as relational databases. The difference between these two requirements is how the database tables are designed and how they are processed. Transactional databases are designed to minimize the amount of stored data (a normalized design) and to optimize the performance of the simple insert or retrieval operations typically found in business transactions. But transactional database performance suffers when the complex retrieval and computational requirements used to support management decision making, typically supported by data warehouse designs, are involved. As a result, many organizations support two different relational database systems, e.g., one to support transaction processing and a second to support management decision making.

In late 2014, IBM introduced a new mainframe model which included a new processor type that will dramatically change the way data warehouse architectures are used. This new design substantially improves the performance of processing transactional data in place and reduces the need to support a second database, e.g., a data warehouse, for management decision making. In 2015, a new version of DB2 will be introduced to take advantage of this new type of analytic processor. Does this mean that Oracle's role on an IBM mainframe will change? Hell no. How difficult, and at what cost, would you expect it to be to re-train Oracle DBAs and migrate an Oracle data warehouse to a DB2 data warehouse?




3.0 Relational Data Modeling and Conceptual Design

The relational data model describes the conceptual representation of the data objects of relational databases and gives guidelines for their implementation. The following sections discuss the main relational data object, the table, and some of the guidelines applicable to the implementation of tables.

Conceptually, all data in a relational database is stored in tables. Also, when data is presented to a user externally, it has the appearance of a table. Tables consist of rows and columns, as we will discuss in this unit.

3.1 Tables

Tables are the main data object described by the relational data model. Conceptually, all data of a relational database is stored in tables. Also, all data returned to a user is presented in the form of a table.



Structurally, as with tables in books or newspapers, a table is subdivided into rows and columns. Horizontally, a table is subdivided into rows. The data stored in a row is logically related and belongs to a single object, such as a person or an aircraft model. Conversely, the data for a single object is stored in a single row.

You can compare rows to records in flat files for conventional access methods, or to segments in hierarchical databases. From the access method's point of view, records are unstructured. In contrast, from the database management system's perspective, rows are structured. Their structure is determined by the columns of the table.

Vertically, a table is subdivided into columns. All data stored in a column has the same semantic meaning and is of the same type. Columns have names. You define the name of a column and should choose it in such a way that it expresses the semantic meaning of the column.

The columns of a table subdivide the rows of the table into fields. The fields are the actual receptacles for the data stored in a table. All rows of a table are subdivided in the same manner, i.e., they have the same columns in the same order.

A field may or may not contain data. The data in a field is also referred to as the value of the field, or the value of the column for the appropriate row. From the relational database management system's point of view, the data in a field is atomic and unstructured. This means that, from the relational database management system's point of view, a field contains a single value. This does not preclude the relational database management system from offering (column) functions that allow you to further manipulate the data of a column.

The displayed table represents the aircraft models owned by an airline, not the individual airplanes of the airline. For a basic aircraft type, such as a Boeing 737, there may be multiple models constituting the basic configurations. The column MODEL contains the appropriate model number.

Column MODEL alone does not uniquely identify the various aircraft models. The columns TYPE and MODEL together do.

3.2 Uniqueness of Rows and Columns



In contrast to records, which are always retrieved in their entirety, you need not retrieve all columns of a row. You can select particular columns by providing their names. You can also change only selected columns of the rows of a table. For this reason, the relational data model requires that all column names of a table be unique. Thus, in the example on the visual, you cannot have four columns with the name ENGINE. If you need all four columns, you must name them differently. In the example, the four columns have been renamed ENGINE_1, ENGINE_2, ENGINE_3, and ENGINE_4.

In many cases, if you have naming conflicts for columns of a table, the semantics of the conflicting columns have not been defined sufficiently. By better defining the meaning of the columns, you may find different, more meaningful names for the columns, as is the case in the illustrated example. (The engines of an aircraft are generally referred to as Engine 1, Engine 2, and so on.)

From a design point of view, the illustrated solution may not even be the desirable solution. What happens, for example, if new aircraft models are introduced whose aircraft have more than four engines? It may not be the best solution to store the engines for the various aircraft in the illustrated table AIRCRAFT. To allow additional engines as new aircraft models require them, it might be better to have an additional table ENGINE_ON_AIRCRAFT that contains, for each aircraft, one row for each of its engines (normalization). This row would specify the aircraft (its serial number) on which the engine is mounted, the engine's serial number, its position on the aircraft, and other characteristics of the engine.

If you also wanted to register engines that are not mounted on any aircraft, it would be even better to have two tables, ENGINE and ENGINE_ON_AIRCRAFT. The first table would specify all engines with their engine-specific and aircraft-independent characteristics. The second table would specify on which aircraft the engines are mounted, their position, and any characteristics that are both engine- and aircraft-dependent, such as when the engine was mounted on the aircraft. The solution illustrated on the visual also makes it difficult to list the engines ordered by engine serial number; UNIONs would be required to achieve this. A sketch of such a two-table design follows.
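
A minimal sketch of that two-table design (all column names and types here are hypothetical, chosen only to illustrate the idea):

CREATE TABLE ENGINE
(ENGINE_SERIAL_NO CHAR(10) NOT NULL PRIMARY KEY,
 ENGINE_TYPE CHAR(15) NOT NULL);

CREATE TABLE ENGINE_ON_AIRCRAFT
(AIRCRAFT_SERIAL_NO CHAR(10) NOT NULL,
 ENGINE_SERIAL_NO CHAR(10) NOT NULL REFERENCES ENGINE,
 POSITION SMALLINT NOT NULL,          -- Engine 1, Engine 2, and so on
 MOUNTED_DATE DATE,
 PRIMARY KEY (AIRCRAFT_SERIAL_NO, POSITION));

Listing all engines ordered by serial number then becomes a simple SELECT from ENGINE with an ORDER BY, and no UNION is required.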

In the same way as you can retrieve or update selected columns of a table, you can retrieve, update, or delete specific rows of a table. To ensure that you can do this, the relational data model recommends that all rows be unique, i.e., that no two rows contain exactly the same data. Many relational database management systems do not enforce this rule, but there are some which do. Therefore, if your database design is to be system-independent, you should make sure that the rows of your tables are unique. We will see later in the course how you can achieve this.

There are also other design considerations that make it highly advisable to ensure that all rows of a table are unique. A design should not be short-lived; it should be something lasting. At this moment, duplicate rows in a table may be fine because you might not intend to retrieve, update, or delete rows individually. However, your perception may change as new applications are introduced.

Ask yourself why you would want multiple identical rows. If you only need them to determine how often the event creating the rows occurred, you might be better off adding a column to the table to count the occurrences and removing the duplicates. This may reduce the space required for your table and improve performance. The design methodology taught in this course insists that rows in the resulting tables be unique. DB2 allows duplicate rows in tables, but it can automatically ensure that no two rows in a table are alike, as in the sketch below.
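
A minimal sketch of asking DB2 to enforce uniqueness (the index name is hypothetical; TYPE and MODEL are the identifying columns discussed above): a unique index makes DB2 reject any INSERT or UPDATE that would create a duplicate key.

CREATE UNIQUE INDEX AIRCRAFT_MODEL_IX
    ON AIRCRAFT_MODEL (TYPE, MODEL);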

3.3 Order of Rows and Columns


According to the relational data model, the sequence in which the rows and columns of a table are physically stored in a relational database is completely up to the relational database management system. Consequently, the physical sequence of the rows and columns does not imply the sequence in which the rows or columns are returned if an ordering has not been requested by the end user or application. As a matter of fact, the same (unordered) retrieval request issued twice may return the rows and columns in a different sequence the second time it is issued.

This means that an application cannot rely on the physical sequence of the rows or columns in the database. If the order of the rows or columns is important to the application during retrieval, it must tell the relational database management system how the returned rows should be ordered. The ordering of the rows can be based on the values of one or more columns of the table; the order can be by ascending or descending column values.

The order that can be requested is always a logical order and never a physical order. The application can define the order in which the columns are to be returned by specifying the column names in the desired sequence in the retrieval request, as in the sketch below.
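
A minimal sketch of requesting an explicit order (the ascending/descending choices are arbitrary): the columns come back in the order they are listed, and the rows in the order given by ORDER BY.

SELECT TYPE, MODEL
  FROM AIRCRAFT_MODEL
 ORDER BY TYPE ASC, MODEL DESC;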

There are many reasons why two subsequent unordered retrieval requests may return the rows in different sequences. The actual reasons depend on the specific relational database management system being used. Collectively, some of the reasons are:

• A reorganization has taken place.
• A new index has been added or an index has been dropped.
• The table statistics suggest a different access path for the optimal retrieval of the rows and the optimizer has chosen this access path. This may have been caused by an intermediate update of the table statistics.
• The internal buffers of the relational database management system already contain some rows of the table, and those rows are not retrieved again, in order to optimize performance.

If a specific order of the columns has not been requested during retrieval, some relational database management systems (e.g., DB2) return the columns in the order in which they were defined. Nevertheless, retrieval requests in application programs should always specify an explicit column order. If they do not, the corresponding application programs must be changed whenever the database administrator redefines the table with a different column order to improve performance.

Relying on an implicit column order also makes your applications system-dependent.

3.4 Linkage of Tables



A table is seldom on its own, meaning that a relational database normally consists of multiple tables which are logically interconnected.

The visual shows two tables, AIRCRAFT_MODEL and MANUFACTURER. There is clearly an interconnection between the two tables. Each row of table AIRCRAFT_MODEL contains an identifier for the manufacturer of the corresponding aircraft model, but does not give any details for the manufacturer. The details for the manufacturer are contained in table MANUFACTURER. Logically, each row of table AIRCRAFT_MODEL with a specific manufacturer-id is interconnected with the row of table MANUFACTURER having the same manufacturer-id.

The relational data model prescribes that logical associations are not physically implemented in the relational database; instead, they are dynamically established, by means of join operations, on a request-by-request basis. In particular, there are no physical pointers, such as addresses, in the columns referring to rows of other tables. The request-based joining of tables is accomplished by means of the values of the columns named in the join operation.

The joining of the rows for aircraft models A340, Model 100, and A320, Model 200, in table AIRCRAFT_MODEL with the proper manufacturer is achieved by having the same value (AIRBUS) in columns MANUFACTURER and MID, respectively. Of course, the columns must be specified in the request performing the join operation, as in the sketch below.
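
A minimal join sketch over these two tables (the selected columns other than MANUFACTURER and MID are hypothetical):

SELECT AM.TYPE, AM.MODEL, M.NAME
  FROM AIRCRAFT_MODEL AM, MANUFACTURER M
 WHERE AM.MANUFACTURER = M.MID;   -- rows are matched by column values, not pointers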

Similarly, all rows for aircraft models having the value BOEING in the MANUFACTURER column are joined with the appropriate row of table MANUFACTURER. The important point is that, during relational database design, you need not worry about physical pointers. However, you will have to worry about logical relationships, which are realized through column values rather than pointers. Column values are not affected by reorganizations; physical pointers may be. The relational data model disallows externally visible pointers, but it does not prohibit internal pointers (e.g., in index entries) that are not externally visible.

A view is simply any SELECT query that has been given a name and saved in the database. For this reason, a view is sometimes called a named query or a stored query. An example of a view in an Access database is a saved query. A view can hide complexity: if you have a query that requires joining several tables, or has complex logic or calculations, you can code all that logic into a view, then select from the view just as you would from a table. A view can also be used as a security mechanism: a view can select certain columns and/or rows from a table, and permissions can be set on the view instead of on the underlying tables. This allows surfacing only the data that a user needs to see. Finally, a view can simplify supporting legacy code, as in the sketch below.
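
A minimal view sketch, reusing the hypothetical join above (the view name and the renamed column are also hypothetical):

CREATE VIEW AIRCRAFT_MODEL_V AS
  SELECT AM.TYPE, AM.MODEL, M.NAME AS MANUFACTURER_NAME
    FROM AIRCRAFT_MODEL AM, MANUFACTURER M
   WHERE AM.MANUFACTURER = M.MID;

SELECT * FROM AIRCRAFT_MODEL_V;   -- users query the view exactly as if it were a table
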
3.5 Concepts of Relational Database

Concept - Description
Table (Relation) - A table consists of rows and columns. Each row in the table represents a collection of related values. Tables are used to hold information about the objects to be represented in the database. A specific table holds data about one kind of object (entity). In the example Car Rental database there are four tables: Customer, Rent, CarType, and Car. The Customer table holds the data of the customers; the Rent table holds the data of the car rentals. A table is a database concept; a relation is a relational model concept.
Row (Tuple, Record) - Each row in the table represents a collection of related values of one object (entity). The data of one customer is held in one row of the Customer table. A row is a database concept; a tuple is a relational model concept. A record is a somewhat outdated term for a tuple or row.
Column (Attribute) - Each column in a table holds a certain kind of data. A column has a name that describes the data of the column. In the Customer table there are columns such as firstname and surname. A column is a database concept; an attribute is a relational model concept.
Field - A field stores the actual value of an attribute. The broken lines in the figure show the values of a certain attribute that are stored in the fields.
Primary key - The primary key is the column (or set of columns) whose values uniquely identify a row. All primary key fields have a different value within a specific table. A table should have a primary key. The primary key of the table Car is the register number (Reg_No); two cars can't have the same register number.
Foreign key - A foreign key is a column whose values refer to the primary key of another table. For example, in the Car table the Model_id values refer to the Model_id of the CarType table. The car whose register number is 'ABC-111' is a Ford Focus. The Cust_id in the Rent table refers to the Customer table's Cust_id. Customer number 5, who has rented the car with register number CCE-326, is Patricia Smith. The primary table is the table whose primary key is referenced by another table's foreign key; the table having this foreign key is a related table.
Concepts of Relational Database - http://www2.amk.fi/digma.fi/www.amk.fi/opintojaksot/0303011/1146161367915/1146161783414/1146163033699/1146163141034.html

Concept - Description
DB2 and Oracle Schemas - A schema is a collection of named objects that provides a logical classification of objects in the database. The first part of a qualified object name is the schema qualifier, for example RMUI001.EMPLOYEE. The objects that a schema can contain include tables, indexes, table spaces, distinct types, functions, stored procedures, and triggers.
Database Schema - A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the data is organized and how the relations among the data are associated. It formulates all the constraints that are to be applied to the data. A database schema defines its entities and the relationships among them. The DB2 or Oracle definition includes the components and relationships of a database schema, but adds the complexity of distinguishing among multiple databases and security constraints.
Database Index - A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time the table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
Integrity constraints - Integrity means something like "being right" and consistent. The data in a database must be right and in good condition. There are four types of integrity constraints: domain integrity, entity integrity, referential integrity, and foreign key integrity constraints.
SQL - SQL (Structured Query Language) is a standardized relational DBMS data retrieval and management language, used for managing relational databases and performing various operations on the data in them. SQL is not technically a programming language, but it can be integrated into SQL-based procedural languages, e.g., PL/SQL and the DB2 procedural SQL language, which add the constructs of a high-level programming language, e.g., decisions, iteration, calls, variable types, compilation, packaging, etc.

SQL statements are categorized as SQL DDL, SQL DML, SQL TCL, and SQL DCL, with one example of each shown after this table.
SQL DDL - Data Definition Language (DDL) statements are used to define the database structure or schema, e.g., CREATE TABLE, ALTER, etc.
SQL DML - Data Manipulation Language (DML) statements are used for managing data within schema objects, e.g., SELECT, UPDATE, CALL, etc.
SQL TCL - Transaction Control Language (TCL) statements are used to manage the changes made by DML statements. They allow statements to be grouped together into logical transactions. Examples include COMMIT, ROLLBACK, and SET TRANSACTION.
SQL DCL - Data Control Language (DCL) statements are used to assign database and table privileges, e.g., GRANT and REVOKE, and to perform other DBMS administrative functions.
Join - A DML operation that combines data stored in multiple tables so that it can be processed in a business-friendly manner.
View - A DDL object that often stores the structure of a join, or provides a different view of a table or join, which may be more business-friendly.
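
A minimal sketch with one statement from each category (the index name and literal values are hypothetical; the CUSTOMER table is the one defined earlier):

CREATE INDEX CUSTOMER_IX ON CUSTOMER (LAST_NAME);            -- DDL
UPDATE CUSTOMER SET BALANCE = 0 WHERE ACCOUNT = '10000001';  -- DML
COMMIT;                                                      -- TCL
GRANT SELECT ON CUSTOMER TO PUBLIC;                          -- DCL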



Concept - Description
Stored Procedures - Each relational DBMS provides a procedural language that combines procedural constructs and high-level programming functions with embedded SQL statements. Two popular examples are Oracle PL/SQL and DB2 stored procedures. These procedures are stored in the relational database, not in the file system. They are compiled, can be reused by any modern programming language, and are secured by the DBMS.
Triggers - A trigger is a set of actions stored as an event procedure, which are performed automatically in response to an INSERT, UPDATE, or DELETE operation on a specified table in the database (see the sketch after this table). Triggers can: 1) generate derived or complex column values automatically, 2) enforce referential integrity without foreign keys, 3) log events, 4) audit transactions, 5) synchronize replication of tables, 6) impose security authorizations, or 7) prevent invalid transactions.
Functions - A function is similar to a procedure, except that it is designed to return a single result and may be used in SQL DML statements.
Cursors - A cursor is a SQL data structure that can hold the data for more than one row in a procedure. Microsoft calls a cursor a record set.
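
A minimal trigger sketch (the EMPLOYEE and COMPANY_STATS tables and the trigger name are hypothetical): each new EMPLOYEE row automatically increments a head-count column, with no application code involved.

CREATE TRIGGER NEW_HIRE
  AFTER INSERT ON EMPLOYEE
  FOR EACH ROW MODE DB2SQL
  UPDATE COMPANY_STATS SET NBEMP = NBEMP + 1;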
  333.  
  334. 3.6 Relational Database Schema and SQL DDL
  335.  
  336. A relation is a two-dimensional table. Each relation is comparable to the description of an object or entity. Each row or tuple generally contains information concerning one person, thing or event, i.e., an instance of an object. An N-tuple relation is simply a relation that contains any number of records. The columns or fields are called attributes. Using the previous bank example, the physical design would be similar to the following:
  337.  
  338. Relation or Base Table Inverted Index
  339.  
  340. BRANCH BRANCH_ID_INDEX
  341. CUSTOMER CUSTOMER_ID_INDEX
  342. TELLER TELLER_ID_INDEX
  343. ACCOUNTS ACCOUNTS_ID_INDEX
  344. LOANS LOANS_ID_INDEX
  345.  
  346. Each relation or base table uses a simple independent stack data structure. Order and access are provided through a separate data structure called an INDEX. Interfile relationships are maintained through logical access to records via key fields or through the use of a VIEW. Unlike the other database physical schemas, relational databases can be easily changed, thereby making the database more responsive to the information needs of management while keeping database administrative costs down.
  347.  
  348. The performance and storage space requirements of relational databases, compared to the other database storage models, are not as favorable. Relational databases cannot perform batch processing tasks as efficiently as a hierarchical database, but they provide better performance for online applications. Relational databases cannot meet online processing requirements as efficiently as a network database, but they provide a greater degree of storage and conceptual independence and a wider variety of front-end access tools. A partial relational schema would be:
  349. A relational model design may have many parents associated with many children, but a many-to-many relationship must be physically limited to one-to-many relationships
  350.  
  351. A Sample Relational Schema
  352.  
  353. CREATE TABLE BRANCH (BRANCH_ID CHAR(5) NOT NULL,
  354. BRANCH_NAME CHAR(35) NOT NULL)
  355.  
  356. CREATE TABLE TELLER (TELLER_ID CHAR(5) NOT NULL,
  357. TELLER_NAME CHAR(30) NOT NULL)
  358.  
  359. CREATE TABLE CUSTOMER (CUSTOMER_ID CHAR(5) NOT NULL,
  360. CUSTOMER_NAME CHAR(30) NOT NULL)
  361.  
  362. CREATE TABLE LOANS (LOAN_ID CHAR(5) NOT NULL,
  363. LOAN_AMOUNT MONEY NOT NULL,
  364. LOAN_BALANCE MONEY NOT NULL,
  365. LOAN_INTEREST DECIMAL(8,8) NOT NULL)
  366.  
  367. CREATE TABLE ACCOUNTS (ACCOUNT_ID CHAR(5) NOT NULL,
  368. ACCOUNT_AMOUNT MONEY NOT NULL)
  369. CREATE INDEX BRANCH_ID_INDEX ON BRANCH(BRANCH_ID)
  370. CREATE INDEX TELLER_ID_INDEX ON TELLER(TELLER_ID)
  371. CREATE INDEX CUSTOMER_ID_INDEX ON CUSTOMER(CUSTOMER_ID)
  372. CREATE INDEX LOANS_ID_INDEX ON LOANS(LOAN_ID)
  373. CREATE INDEX ACCOUNTS_ID_INDEX ON ACCOUNTS(ACCOUNT_ID)
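The interfile relationships mentioned above could then be expressed as a join or stored as a view. The sketch below assumes that a BRANCH_ID column had also been added to the CUSTOMER table to relate customers to their branch; that column is not part of the schema shown above:

CREATE VIEW BRANCH_CUSTOMER_V AS
  SELECT B.BRANCH_ID, B.BRANCH_NAME,
         C.CUSTOMER_ID, C.CUSTOMER_NAME
    FROM BRANCH B, CUSTOMER C
   WHERE B.BRANCH_ID = C.BRANCH_ID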
  374.  
  375. 4.0 DB2
  376. 4.1 Introduction to DB2
  377.  
  378. DB2 is a family of relational database management system (RDBMS) products from IBM that serve a number of different operating system platforms. According to IBM, DB2 leads in terms of database market share and performance. Versions of DB2 are available for z/OS, UNIX/LINUX and Windows.
  379.  
  380. The IBM DB2 Express-C is the free edition of DB2—free to develop, free to deploy, free to distribute. It is easy to use and provides a solid base to build and deploy applications developed using C/C++, Java™, .NET, PHP, Ruby on Rails, Python, and other programming languages.
  381.  
  382. 4.2 Advantages of DB2 DBMS
  383.  
  384. 4.2.1 Scalability
  385.  
  386. Platform support and Scalability. DB2 provides a high level of scalability and supports multiple platforms, ranging from workstation to mainframe; it is available on a wide range of platforms including Windows, HP-UX, Sun Solaris, UNIX, AIX, Linux, AS/400 and OS/390.
  387.  
  388. VSAM is tightly coupled with the mainframe and hence has a restricted choice of platform. In addition, DB2 is an RDBMS and so it is easier to migrate to a different RDBMS -- e.g., Oracle.
  389.  
  390. A DB2 V6 partitioned table can have 254 partitions, with each partition a VSAM data set of up to 2 terabytes, so DB2 can be 254 times larger than the largest possible VSAM data set. The actual maximum size varies by platform and version. It is important to remember that this is the maximum size of a single table. A DB2 database is a collection of DB2 objects under one set of administration and security, recorded in the DB2 system tables. There can be many thousands of DB2 tables and thousands of DB2 databases.
  391.  
  392. A DB2 storage group is the set of z/OS volumes that DB2 may use to store DB2 tables or VSAM data sets.
  393.  
  394. 4.2.2 SQL Support.
  395.  
  396. DB2, being an RDBMS and following the ANSI SQL standard, has the distinct advantage of simplicity, richer functionality and ease of maintenance. It is easy to view or modify data with DB2 databases using facilities like QMF, SPUFI and DB2 forms. Using simple SQL, both standard, as well as, ad-hoc reports can be generated through QMF. DB2 also delivers significant query and client interface functions such as Net.Data, Visual Explain, DB2 installer, DB2 warehouse manager and control center.
  397.  
  398. 4.2.3 Storage and Conceptual Independence.
  399.  
  400. With DB2, as the physical structure of the database changes (such as adding or removing indexes), or business process changes require columns to be added, removed, or renamed, the data continues to be accessible to existing applications. This is not possible with VSAM data sets or flat file systems.
  401.  
  402. A database system can separate the way programs and users access the data from the physical way the data is stored. This concept is also associated with the concept of storage independence. Programmers and users need not be concerned with the details of the physical way the data is stored and should not have to change their procedures if the physical way the data is stored is changed.
  403.  
  404. The manner in which a database is stored is described in the Storage Schema. The commands used to define the Storage Schema are called Data Definition Language (DDL) commands, e.g., CREATE TABLE, CREATE INDEX, etc.
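For example, the following DDL statements add a column and an index to the bank CUSTOMER table without requiring any change to programs that do not use them; the column and index names are illustrative:

ALTER TABLE CUSTOMER ADD COLUMN CUSTOMER_PHONE CHAR(12)
CREATE INDEX CUSTOMER_NAME_INDEX ON CUSTOMER(CUSTOMER_NAME)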
  405.  
  406. Database storage formats are independent of a programming language. For example, with MySQL or Oracle, a Visual Basic (VB.NET) program, a C program, or a scripting language (Active Server Pages, PHP and Perl) can access the same database. Furthermore, a variety of other front-end tools can be used to access the stored data, e.g., Microsoft Access, Crystal Reports, etc.
  407.  
  408. Storage independence requires that the database be self-describing. First, the methods used to store the user's data are separated from the data structures used to access and order the data, e.g., indexes. Second, information concerning the structure of the data and indexes is stored in a data dictionary, i.e., metadata.
  409.  
  410. The structure of the data dictionary varies between database systems. For example, in many relational databases the structure of the user's data is stored in two centralized data dictionary tables: SYSTABLES and SYSCOLs. (A data file can have many fields or attributes.) The structure of an index, e.g., position in the table, length and number of fields, is stored in a separate centralized table, i.e., SYSINDEX. Users need not be concerned with the existence of an index, because the database system will automatically use an appropriate index access based upon the information stored in the SYSKEYS table and the nature of the inquiry. For example, if an index exists for an employee's name and an inquiry is made for an employee's name, the database manager will automatically choose that index.
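As a rough illustration, the DB2 catalog can itself be queried with ordinary SELECT statements; the creator and table names below are illustrative, and other DBMS products use different catalog table names:

-- which tables has the LAB user created?
SELECT NAME, CREATOR, TYPE
  FROM SYSIBM.SYSTABLES
 WHERE CREATOR = 'LAB'

-- which columns does the CUSTOMER table contain?
SELECT NAME, COLTYPE, LENGTH
  FROM SYSIBM.SYSCOLUMNS
 WHERE TBNAME = 'CUSTOMER'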
  411.  
  412. While all databases hide the storage requirement details from application programmers and users, the degree to which a database is storage independent varies considerably between database products. Some network databases will not automatically choose an appropriate index to access the database, whereas many relational databases will. Many network databases will not permit you to add an index without recreating the database, even though the application programs need not be changed. On the other hand, many relational databases permit one to add or delete indexes at will.
  413.  
  414. 4.2.4 Referential Integrity.
  415. A database system can provide referential integrity by ensuring that prerequisites for a transaction are stored before a transaction is completed. For example, a customer information/credit approval record must be stored before an invoice is processed. A student row must be stored before a student can register for a class.
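In SQL, referential integrity is declared with primary and foreign keys. The following is a minimal sketch using the student example; the STUDENT and REGISTRATION tables are hypothetical, and DB2 for z/OS would also require a unique index to enforce each primary key:

CREATE TABLE STUDENT
  (STUDENT_ID   CHAR(9)  NOT NULL PRIMARY KEY,
   STUDENT_NAME CHAR(30) NOT NULL)

CREATE TABLE REGISTRATION
  (STUDENT_ID CHAR(9) NOT NULL,
   CLASS_ID   CHAR(8) NOT NULL,
   PRIMARY KEY (STUDENT_ID, CLASS_ID),
   FOREIGN KEY (STUDENT_ID) REFERENCES STUDENT (STUDENT_ID))

With the foreign key in place, DB2 will reject a REGISTRATION row whose STUDENT_ID does not already exist in STUDENT.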
  416.  
  417. 4.2.5 Domain Definitions, Check Constraints and Triggers
  418.  
  419. Domain definitions, check constraints and triggers provide data validation independent of the application.
  420. Traditionally, programmers have coded data validations within application programs, e.g., a QPA must range between zero and 4.0, all numeric fields must be stored with digits, etc. However, application program validations lose their effectiveness with the use of 4th generation languages and FORMS. Through domain definition, the database manager validates the data, rather than the application program.
  421.  
  422. Domain definition is made on a field-by-field basis. Domain Definition cannot validate the relationship between fields, e.g., Pay Class Administrative may have a yearly salary over $100,000, but Pay Class Faculty may not. While domain definition cannot validate fields between files, referential integrity (discussed previously) requires that prerequisite information must exist in one file before information in another file can be stored.
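A check constraint is one way to declare these validations to the DBMS. The sketch below is hypothetical (a STUDENT table with a QPA column and an EMPLOYEE table with PAY_CLASS and SALARY columns); note that a check constraint, unlike a simple field-by-field domain definition, can compare two fields in the same row:

ALTER TABLE STUDENT
  ADD CONSTRAINT CHECK_QPA CHECK (QPA BETWEEN 0.0 AND 4.0)

ALTER TABLE EMPLOYEE
  ADD CONSTRAINT CHECK_SALARY
      CHECK (PAY_CLASS = 'A' OR SALARY <= 100000)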
  423.  
  424. 4.2.6 Better Security
  425.  
  426. Most operating systems provide authentication and authorization security. Authentication security involves the verification of a valid identity. A simple operating system identity may be sufficient for accessing files or print services on a file server at school or work, but would you want to maintain operating system user accounts for every Amazon web application user just so they could access the underlying applications and database tables?
  427.  
  428. Authorization security may manage file and directory security for authenticated users or groups. However, file security applies to the entire contents of a file or data set. For example, Windows NTFS file security will permit a user or group to open and modify a Word document. However, NTFS file security cannot restrict a user from seeing or altering paragraph 15 of this Word document. Most DBMSs store multiple database objects in the same file or data set. Assume that a data file includes a DBMS employee table. File-level security means that a user may have access to any employee information or may modify any employee's information.
  429.  
  430. An operating system provides security for a given logon user or group of users to access a file or a folder directory. But, within a database, we need a greater granularity of security. If a database were stored in only one VSAM data set or DB2 table, then the operating system could possibly manage access. But what if the requirements are to limit access to certain tables, or to columns and rows, with different types of access permissions, e.g., “you can look at it but you cannot change it”? The operating system is incapable of handling that task.
  431.  
  432. Traditional application program security loses its effectiveness with the use of 4th generation languages, forms, queries and reports. In addition we may have access to the database through a web interface, through an ODBC connection.
  433.  
  434. It is highly recommended that you create individual database user accounts for each person who will be accessing your database. It's technically possible to share accounts between users or simply use one user account for each type of user that needs to access your database, but I strongly discourage this practice for two reasons. First, it will eliminate individual accountability -- if a user makes a change to your database (let's say by giving himself a $5,000 raise), you won't be able to trace it back to a specific person through the use of audit logs. Furthermore, if a specific user leaves your organization and you wish to remove his or her access from the database, you'll be forced to change the password that all users rely upon.
  435.  
  436. However, if you have a large number of users, you'll most likely be overwhelmed by the burden of maintaining accounts and proper permissions. To ease this burden, relational databases support the notion of roles. User accounts are assigned to role(s) and permissions are then assigned to the role as a whole rather than the individual user accounts. For example, we could create a DBA role and then add the user accounts of our administrative staff to this role. Once we've done this, we can assign a specific permission to all present (and future) administrators by simply assigning the permission to the role.
  437.  
  438. Now that we've added users to our database, it's time to begin strengthening security by adding permissions. Our first step will be to grant appropriate database permissions to our users. We'll accomplish this through the use of the SQL GRANT statement.
  439.  
  440.  
  441. Here's the syntax of the statement:
  442.  
  443. GRANT <permissions>
  444. [ON <table>]
  445. TO <user/role>
  446. [WITH GRANT OPTION]
  447.  
  448. Now, let's take a look at this statement line-by-line. The first line, GRANT <permissions>, allows us to specify the specific table permissions we are granting. These can be either table-level permissions (such as SELECT, INSERT, UPDATE and DELETE) or database permissions (such as CREATE TABLE, ALTER DATABASE and GRANT). More than one permission can be granted in a single GRANT statement, but table-level permissions and database-level permissions may not be combined in a single statement.
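As a sketch, the following statements grant and revoke table-level permissions on the bank ACCOUNTS and CUSTOMER tables; the TELLER_ROLE and AUDITOR names are hypothetical users or roles:

GRANT SELECT, INSERT, UPDATE ON ACCOUNTS TO TELLER_ROLE
GRANT SELECT ON CUSTOMER TO AUDITOR WITH GRANT OPTION
REVOKE UPDATE ON ACCOUNTS FROM TELLER_ROLE

The WITH GRANT OPTION clause lets the grantee pass the same permission on to others; REVOKE removes a permission that was previously granted.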
  449. It is possible for a user to have an operating system, database, application server and other authenticated identities. A centralized security server may be used to centralize user accounts and authorizations across multiple subsystems. Depending on the design of the security server and subsystem security, e.g., Oracle security, each security system may be customized to the security requirements. Windows' security server is better known as Active Directory. Linux and UNIX's security server is called LDAP. IBM's mainframe security server is called RACF, and the IBM third-party provider CA Technologies provides Top Secret and ACF2. In some circumstances a cloud-based security service may be used, e.g., iCloud, Azure, Google, etc.
  450.  
  451. IBM's RACF security server was introduced in 1976. Do you want to take a guess what security server model Microsoft used to develop Active Directory, or the open source community used to develop LDAP? Do you want to take a second guess why Active Directory or LDAP are not permitted in the most sensitive corporate, government, military and intelligence environments?
  452.  
  453. 4.2.7 DB2 Performance Tuning and Management Tools
  454.  
  455. DB2 has unique and mature tools, including graphical, workstation-based versions, to model, monitor and tune performance: Visual Explain, DB2 PM and OMEGAMON. Aided by these tools it is possible to do the tuning at various levels: system (bufferpool, EDM, etc.), application, database or SQL level. Existing applications can be tuned to meet the growing demands of an organization: an increase in the number of records to be processed, an increase in the number of concurrent users, etc. VSAM is not a subsystem and doesn't write performance records to SMF, which is used by performance monitoring tools. Performance of VSAM applications is heavily dependent on the initial design and there is very little scope for tuning later.
  456.  
  457. 4.2.8 DB2 is web-enabled with built in Java support.
  458. DB2 data can be accessed from various systems using standard TCP/IP, ODBC, X/Open CLI, JDBC and SQLJ. Products like Net.Data, Web forms and Web Focus provide rapid building of Web applications. Very strong, language-independent stored procedure support (including support for Java) allows business logic to be shared across all architectures. Thus you open the world of DB2 directly to the web for any application and usage. Generally, VSAM data is accessible only by local applications and is accessed via 3270 terminals. For accessing VSAM from systems other than the mainframe, custom interfaces need to be built, making it both an expensive and difficult option.
  459.  
  460. 4.2.9 Sophisticated data backup, replication and recovery facilities
  461. Unlike flat file-processing, databases store information in a series of related database files. If one database file is damaged, it may not be logically consistent with other database files. While database transactional processing (discussed previously) attempts to prevent an incomplete transaction from being stored, it does not correct damage to past stored transactions. In addition, database recovery systems need also to protect and to restore metadata and application data stored in various system tables.
  462.  
  463. Traditionally, flat file-processing systems have relied on batch backup strategies. Assume that you had successfully backed up your business data to a tape subsystem at the end of the prior business day. Prior to the end of the next business day and prior to the next scheduled backup your company's hard disk fails. The data from the previous backup tape must now be restored.
  464.  
  465. After the restoring the files, the users must now rekey all lost transactional data that has been entered since the last backup. In most cases, the time to rekey the lost transactional data may consume another business day. Also note the importance of maintaining an adequate paper transaction audit trail to retype this data, e.g., extra copies of invoices and receipts, paper vouchers, extra check copies, in accounting systems that rely of batch backup strategies.
  466.  
  467. Many sophisticated data entry systems attempt to minimize the time lost from retyping transactional data by storing a second copy of the transactional data in a transactional log (file) stored on tape or other secondary media. For example, as an accounts payable check is processed in an online environment, all information concerning this check, e.g., debits, credits, invoices to be paid, amounts, etc., will also be stored in a transactional accounts payable log as well as updating the appropriate accounts payable ledger and journals. Existing accounting systems may have to be redesigned to include a transactional log. In addition, there is a need to develop software that will process inputs from the transactional log, rather than processing data from the normal data entry method.
  468.  
  469. Many database products support a facility that will automatically store all database updates in a generic database transactional log. Most database transactional systems will also provide a utility program that may be used to restore transactional data from the database transactional log. The advantage of using a database transactional log is that the process of maintaining a backup transactional log is already available with little or no development time. But not all companies use database software, nor do all database products support a database transactional log.
  470.  
  471. Therefore, the advantages of using accounting or database transactional logs are:
  472.  
  473. 1) To minimize the need for duplicated paper trails, and
  474. 2) To minimize the time required to retype the transactional data since the last successful backup.
  475.  
  476. In spite of these advantages the time required to restore the transactional data to the current files may still exceed an hour. On the other hand, since accounting and database transactional logs are not used to backup word processing documents, spreadsheets and other non-transactional files, e.g., programs under development, significant retyping time may still be required to retype these entities as well.
  477.  
  478. Common Reasons to repair a database:
  479.  Overwritten database files
  480.  Damaged files and folder structure
  481.  Permission based failures and lost passwords
  482.  Backup or system restoration failure
  483.  Accidentally deleted records
  484.  Virus corruption and data integrity loss
  485.  Accidentally deleted tables
  486.  Server crash or physical drive failure
  487.  Offsite backup failure
  488.  Locked database files preventing data access
  489.  Disk surface contamination preventing start up.
  490.  Water or fire damage
  491.  Server crash
  492.  Inaccessible drive or partition
  493.  File corruption
  494.  Data forensics and investigative services due to employee sabotage
  495.  
  496.  
  497. 4.2.10 DB2 Backup and Recovery
  498.  
  499.  A DB2 table is always recoverable in a CICS, IMS/DC or RRS environment on OS/390. VSAM can only be recovered in CICS or IMS/DC environments. In batch, VSAM does not log its updates and therefore does not provide a facility to back out changes. So to recover your VSAM data in batch, you must restore it back to the last backup.
  500.  
  501.  DB2 has a recovery utility. Data can be recovered from the log, a backup copy, or both. Recovery is better managed and faster, as backup data is also maintained by DB2 and taken in DB2 format. In DB2, data recovery will be faster for partitioned tables because the partitions can be recovered in parallel. In the case of VSAM, data can only be recovered from a backup copy. The utilities ADRDSSU or IDCAMS can be used to take timely backups and restore them when required, but maintaining different versions of backups is a manual process.
  502.  
  503.  In DB2, partitioned tables can be backed up in parallel for all partitions, thereby reducing the time taken to complete the backup. Also, the incremental backup capability of DB2 reduces the time needed for backups. The DB2 Concurrent Copy utility improves data availability. A VSAM data set cannot be backed up in parallel and hence needs more time for backup processing. Moreover, there is no concept of incremental backup for VSAM data sets.
  504.  
  505. 4.2.11 Database Replication
  506.  
  507. Database replication is a way of keeping data synchronized in multiple databases. Implementing and maintaining replication might not be a simple proposition: If you have numerous database servers that need to be involved in various types of replication, a simple task can quickly become complex. Implementing replication can also be complicated by the application architecture.
  508.  
  509. One or more servers that get data and/or transactions from the publisher (the server that owns and publishes the data) are called subscribers. Replication is managed by a system database, which by default is called distribution. A distribution database—which can reside on the publisher, subscriber, or on a separate server—is created when you configure replication.
  510.  
  511. The server that hosts the distribution database is referred to as the distribution server or distributor.
  512.  
  513. It is recommended that you always use a server that is dedicated to distributing transactions. Thus, the distribution server should be used for nothing but replication.
  514.  
  515. Each database server can act as a publisher and subscriber at the same time. Each publisher can have multiple subscribers, and each subscriber can receive transactions from multiple publishers.
  516.  
  517. You should also become familiar with replication agents, which are implemented as SQL Server jobs that perform a particular task according to their schedule.
  518.  
  519. What replication is not
  520.  
  521.  Replication is not a backup policy. A mistyped DELETE statement will be replicated on the slave too, and you could end up with two, perfectly synchronized, empty databases. Replication can help protect against hardware failure though.
  522.  Replication is not an answer to all performance problems. Although updates on the slave are more optimized than if you ran the updates normally, databases under high-load could still struggle.
  523.  Replication is not a guarantee that the slave will be in sync with the master at any one point in time. Even assuming the connection is always up, a busy slave may not yet have caught up with the master, so you can't simply interchange SELECT queries across master and slave servers.
  524.  
  525.  
  526. 4.2.12 DB2 Data Warehouse and DATA Mart Support
  527.  
  528. If the enterprise data server is in DB2, options exist for both batch as well as real-time updates to data targets (data warehouses, data marts, etc.) with continuous availability, making implementing a data warehouse solution a relatively straightforward option. If the enterprise data is in VSAM, building and maintaining a data warehouse is a multi-step process. This reduces the ability to exploit fully the wealth of information they contain.
  529. DB2 data warehouse products include DB2 OLAP Server, Intelligent Miner, and Visual Warehouse. DB2 for OS/390 also has proven scalability to multiple terabytes of data and hundreds of concurrent users in data warehouse environments.
  530.  
  531. 4.2.13 DB2 Support for Stored Procedures, Triggers and User-defined Functions
  532.  
  533. DB2 supports stored procedures, triggers and user-defined functions in a number of languages, including Java. With each new release of DB2, performance improvements and additional features are available to the users.
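A minimal sketch of a DB2 SQL procedure follows, assuming the bank ACCOUNTS table with a DECIMAL amount column; the procedure name and parameters are illustrative:

CREATE PROCEDURE APPLY_DEPOSIT
  (IN P_ACCOUNT_ID CHAR(5),
   IN P_AMOUNT DECIMAL(11,2))
  LANGUAGE SQL
BEGIN
  -- add the deposit amount to the selected account
  UPDATE ACCOUNTS
     SET ACCOUNT_AMOUNT = ACCOUNT_AMOUNT + P_AMOUNT
   WHERE ACCOUNT_ID = P_ACCOUNT_ID;
END

An application in any language could then invoke it with CALL APPLY_DEPOSIT('00001', 250.00).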
  534.  
  535. 4.2.14 DB2 Support for Data Types to Store Multimedia Data
  536. DB2 includes a set of extended data types to store, present, manipulate and search a range of extended data such as image, audio, video, extended text, geographical data, and others. DB2 stores these as large objects (BLOBs or CLOBs). The DB2 extenders give the added flexibility of keeping the content of a LOB in a file and pointing to it from the database.
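For example, a table holding multimedia content might be declared as follows; the table is hypothetical and the LOB sizes are arbitrary:

CREATE TABLE CUSTOMER_DOCS
  (CUSTOMER_ID   CHAR(5) NOT NULL,
   PHOTO         BLOB(5M),
   CONTRACT_TEXT CLOB(1M))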
  537.  
  538.  
  539. 4.3 CICS, VSAM and DB2
  540.  
  541. As stated previously, a relational database is built on top of flat files, but an RDBMS provides important services to overcome the limitations of flat files. For example, most flat files do not support concurrency, the ability of multiple users to share the same file. Some flat files are dependent on the programming language used to create them. However, IBM provides other alternatives to overcome these flat file limitations without requiring the use of an RDBMS. Let's review the previous graphic one more time.
  542.  
  543.  
  544.  
  545.  
  546.  
  547.  
  548. Several M&T Bank IT VPs recently visited RMU and indicated that they used IBM's Customer Information Control System (CICS) to support online and mobile banking applications and store the data in a VSAM data set. A student opened his mouth and stated that M&T was using the older VSAM technology instead of a modern relational database. I would recommend to others not to call a mission critical technology old-fashioned unless you know what you are talking about. This is why you are required to understand the role of an RDBMS as a storage component in a computer system.
  549.  
  550. CICS is an example of a user-interface application server similar to WebSphere, J2EE, ASP.NET, and many other application transaction servers. But an application server is not responsible for storing the transaction data. M&T Bank's online and mobile banking requirements demand scale (high volume), efficiency, security, and low cost per transaction. No DBMS bells-and-whistles are required except data concurrency (data sharing) and transactional integrity. CICS enables multiple CICS user transactions to share a VSAM data set with transactional integrity. If CICS provides these requirements, does one need the cost of an extra layer of middleware?
  551.  
  552. When I was young, my mother gave me sound advice based on the rabbit named Thumper in the movie Bambi. Thumper would say, “If you do not have something nice to say, do not say anything at all!” My advice to you is, “If you are not an expert in a computer technology, do not say anything until you have researched the technology.” A true computer professional knows when they do not know the answer. I am very comfortable with myself when I tell someone, “I don't know the answer. I will research the answer and get back to you.” If you know all of the answers, I sure do not want you to work on my mission critical system. After graduation, you will have been given a degree which is a “License to Continue to Learn.” The successful completion of this course means that you have earned the right to continue to learn about DBMSs and the opportunity to interview intelligently with an entry-level DBMS recruiter. At that point, you will look better than most candidates.
  553.  
  554.  
  555. 4.4 DB2 Utilities
  556.  
  557.  
  558. 4.4.1 DB2I Primary Option Menu
  559.  
  560. DB2I is the primary utility panel used by DB2 application programmers. In this requirement you will learn how to use SPUFI.
  561.  
  562.  
  563.  
  564. DB2I primary option menu - https://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/intro/src/tpc/db2z_db2iprimaryoptionmenu.html
  565.  
  566. The SSID: option displayed in your DB2I panel will initially be BLANK. The one illustrated below displays DBAG for the z/OS subsystem ID.
  567.  
  568.  
  569.  
  570. 4.4.2. SPUFI
  571.  
  572. Video - Mainframe DB2 practical tutorial using SPUFI - https://www.youtube.com/watch?v=JWaOG2vCGb0
  573.  
  574. SPUFI (SQL Processing Using File Input) is a DB2 utility that permits you to execute one or more SQL statements stored in a PDS member. SPUFI will use the ISPF editor either to create and edit a new PDS member to store the SQL statements or to edit an existing PDS member before execution. After the SQL statements execute, SPUFI stores the results in a sequential data set where you can review the output and execution messages, or view the error messages within SPUFI.
  575.  
  576. In the following example, the SQL commands stored in LAB.SQLWORK(CREASAMP) will be executed in batch mode. SPUFI enables one to execute SQL commands in batch mode. SQL is NOT a programming language, but rather a relational database access and management language. You submit and compile a COBOL batch program; one may use SPUFI to execute SQL batch statements, but you do not compile SQL statements.
  577.  
  578. You may execute SQL statements interactively using QMF (Query Management Facility). The SPUFI panel listed below will execute the SQL commands in the data set member LAB.SQLWORK(CREASAMP).
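The contents of LAB.SQLWORK(CREASAMP) are not reproduced in this text. A member prepared for SPUFI simply contains SQL statements separated by the statement terminator (a semicolon by default); the table and data below are illustrative only:

CREATE TABLE SAMPLE_EMP
  (EMP_ID   CHAR(6)  NOT NULL,
   EMP_NAME CHAR(30) NOT NULL);
INSERT INTO SAMPLE_EMP VALUES ('000010', 'SMITH');
SELECT * FROM SAMPLE_EMP;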
  579.  
  580.  
  581.  
  582.  
  583. The ISPF EDIT panel will display the SQL code that will be executed. You may edit the SQL code before submitting the job.
  584.  
  585.  
  586.  
  587. 4.4.3 QMF (Query Management Facility)
  588.  
  589. Video - Mainframe DB2 practical video using QMF- https://www.youtube.com/watch?v=Ey57wvQ_ZwU
  590. What is QMF? - http://www.mainframes360.com/2009/04/db2-qmf-how-to.html
  591.  
  592. QMF (Query Management Facility) is a tool used to issue interactive SQL queries against the IBM DB2 database system. It is not merely a data-extraction tool; the data results can be formatted and presented as reports, charts, etc., thus offering valuable insight. QMF also has wizards that can help you extract data without even having to write SQL statements. QMF executes SQL statements one at a time.
  593.  
  594. QMF Home Panel
  595.  
  596. From the ISPF Primary Option Menu, enter QM and press the right-Ctrl key. You must exit DB2 Interactive and return to ISPF by pressing F3 (Exit); you cannot start QMF from DB2I. Once QMF is initialized you will see the following prompt.
  597.  
  598.  
  599.  
  600. Press the right-CTRL key to continue. The QMF Home Panel will appear.
  601.  
  602.  
  603.  
  604. Press F6 (Query) to start editing a SQL command.
  605.  
  606.  
  607. The following is the QMF SQL QUERY Panel.
  608.  
  609.  
  610.  
  611. Anywhere before the ***END *** tag type the following simple SQL statement:
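(The statement itself appeared in a screen capture that is not reproduced here; any simple query will do, for example a SELECT against the DB2 catalog:)

SELECT NAME, CREATOR
  FROM SYSIBM.SYSTABLES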
  612.  
  613.  
  614.  
  615.  
  616. To execute this statement Press F2 Run. The results will be displayed in the QMF Report Panel. Similar to the ISPF editor you can shift the output to the left (F10) and to the right (F11), or forward (F8) or backward (F7)
  617.  
  618.  
  619.  
  620. To exit QMF press F3 End. To re-edit the SQL command press F6 Query.
  621.  
  622.  
  623.  
  624. 4.5 DB2 COBOL Application Programming
  625. 4.5.1 Embedded SQL
  626.  
  627. Embedded SQL statements are used in COBOL programs to perform standard SQL operations. Embedded SQL statements are preprocessed by the SQL precompiler before the application program is compiled. COBOL is known as the host language. COBOL-DB2 applications are those applications that include both COBOL and DB2.
  628.  
  629. Embedded SQL statements work like normal SQL statements with some minor changes. For example, the output of a query is directed to a predefined set of variables, which are referred to as host variables. An additional INTO clause is placed in the SELECT statement.
  630.  
  631. Following are rules to be followed while coding a COBOL-DB2 program:
  632.  
  633.  All the SQL statements must be delimited between EXEC SQL and END-EXEC.
  634.  SQL statements must be coded in Area B.
  635.  All the tables that are used in a program must be declared in the Working-Storage Section.
  636.  All SQL statements other than INCLUDE and DECLARE TABLE must appear in the Procedure Division.
  637.  
  638. 4.5.2 Host Variables
  639.  
  640. Host variables are used for receiving data from a table or inserting data into a table. Host variables must be declared for all values that are to be passed between the program and DB2. They are declared in the Working-Storage Section.
  641.  
  642. Host variables cannot be group items, but they may be grouped together in a host structure. They cannot be renamed or redefined. When using host variables in SQL statements, prefix them with a colon (:).
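A minimal sketch of an embedded SELECT that uses host variables follows, assuming WS-CUSTOMER-ID and WS-CUSTOMER-NAME had been declared in the Working-Storage Section; the names are illustrative:

EXEC SQL
    SELECT CUSTOMER_NAME
      INTO :WS-CUSTOMER-NAME
      FROM CUSTOMER
     WHERE CUSTOMER_ID = :WS-CUSTOMER-ID
END-EXEC.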
  643.  
  644. 4.5.3 SQLCA
  645.  
  646. SQLCA is the SQL communication area through which DB2 passes feedback about SQL execution to the program. It tells the program whether an execution was successful or not. There are a number of predefined variables under SQLCA, such as SQLCODE, which contains the error code. A SQLCODE value of zero (0) indicates successful execution.
  647.  
  648.  
  649. 4.5.4 Sample COBOL DB2 Program
  650. The PROCEDURE DIVISION
  651.  
  652.  
  653.  
  654.  
  655.  
  656. 5.0 Hierarchical Databases and IMS
  657.  
  658. 5.1 Introduction to Hierarchical databases
  659.  
  660. Hierarchical databases are the oldest database model and are still widely used today. A hierarchical database uses a tree (not a binary tree) as its basic data structure, organizing data in a hierarchy of nodes (records). Objects or entities of a particular record type are called segments. Each database may contain several tightly related objects. For any segment or record in the database there is a single path called the hierarchical path. In other words, each file or segment should be designed with only one parent. Using the following hierarchical database design, one must know the BRANCHID to look up the customer's ACCOUNT.
  661.  
  662. The BANK Database
  663.  
  664.  
  665. The BANK Physical Database (PDBs ) would be defined similar to the following:
  666.  
  667. DBD NAME=BANKDB,ACCESS=HDAM.... {HDAM is the data structure}
  668. DATASET (details of requirements of the host operating system)
  669. SEGM NAME=BRANCH,BYTES=40,... PARENT=0 {the root}
  670. FIELD NAME=(BRANCHID,SEQ,U),BYTES=5,START=1,TYPE=C
  671. FIELD NAME=BRANCHNAME,BYTES=35,START=6,TYPE=C
  672. SEGM NAME=TELLER,BYTES=35,PARENT=BRANCH
  673. FIELD NAME=(TELLERID,SEQ,U),BYTES=5,START=1,TYPE=C
  674. FIELD NAME=TELLERNAME,BYTES=30,START=6,TYPE=C
  675. SEGM NAME=CUSTOMER,BYTES=35,PARENT=BRANCH
  676. FIELD NAME=(CUSTOMERID,SEQ,U),BYTES=5,START=1,TYPE=C
  677. FIELD NAME=CUSTOMERNAME,BYTES=30,START=6,TYPE=C
  678. SEGM NAME=LOANS,BYTES=19,PARENT=CUSTOMER
  679. FIELD NAME=(LOANID,SEQ,U),BYTES=5,START=1,TYPE=C
  680. FIELD NAME=LOANAMOUNT,BYTES=4,START=6,TYPE=P
  681. FIELD NAME=LOANBALANCE,BYTES=4,START=11,TYPE=P
  682. FIELD NAME=LOANINTEREST,BYTES=4,START=16,TYPE=P
  683. SEGM NAME=ACCOUNTS,BYTES=14,PARENT=CUSTOMER
  684. FIELD NAME=(ACCOUNTID,SEQ,U),BYTES=5,START=1,TYPE=C
  685. FIELD NAME=ACCOUNTAMOUNT,BYTES=4,START=6,TYPE=P
  686. FIELD NAME=ACCOUNTBALANCE,BYTES=4,START=11,TYPE=P
  687.  
  688.  
  689. The above Physical Database provides the full description of the tree. Each record type in the database is called a segment, e.g., BRANCH, TELLER, CUSTOMER, etc. In the early days, a hierarchical database was simply a modified sequential data file (HSAM) that could store records or segments of various types.
  690.  
  691. Later, the indexed-sequential access method and hashing techniques were used to provide quicker access to the root node, but not to the children segments. While the original hierarchical database storage system stored data contiguously, pointers were eventually used to store data in a noncontiguous fashion. The original hierarchical database provided excellent performance for batch processing, but provided terrible performance for online, real time applications.
  692.  
  693. Eventually, database design concepts encouraged hierarchical databases to be separated into multiple physical databases, e.g., BRANCH and CUSTOMER, instead of one large multi-record database, e.g., BANK. A Logical Database could then be defined to share data between segments of two or more physical databases through the use of inter-database pointers.
  694.  
  695. Originally, programmers embedded DL/I commands into a host program, e.g., a COBOL program. Within the program, each user's view of the database (external schema) is defined in a Program Communication Block (PCB). Today, SQL commands may be used to access an IMS database through a previously defined PCB.
  696. 5.2 Introduction to IMS
  697.  
  698. IMS stands for Information Management System. IMS was developed by IBM with Rockwell and Caterpillar in 1966 for the Apollo program to send a man to the moon. It started the database management system revolution and still continues to evolve to meet data processing requirements.
  699.  
  700. IMS provides an easy-to-use, reliable, and standard environment for executing high-performance transactions. The IMS database is used by high-level programming languages like COBOL to store data in a hierarchical arrangement and access it.
  701.  
  702. An IMS database is a collection of correlated data items. These data items are organized and stored in a manner that provides fast and easy access. An IMS database is a hierarchical database where data is stored at different levels and each entity is dependent on higher-level entities. The physical elements of an application system that uses IMS are shown in the following figure.
  703.  
  704.  
  705.  
  706. 5.2.1 IMS Database Management
  707.  
  708. A database management system is a set of application programs used for storing, accessing, and managing data in the database. The IMS database management system maintains integrity and allows fast recovery of data by organizing it in such a way that it is easy to retrieve. IMS maintains a large amount of the world's corporate data with the help of its database management system.
  709.  
  710. 5.2.2 IMS Transaction Manager
  711.  
  712. The function of the transaction manager is to provide a communication platform between the database and the application programs. IMS acts as a transaction manager. A transaction manager deals with the end-user to store and retrieve data from the database. IMS can use IMS DB or DB2 as its back-end database to store the data.
  713.  
  714. 5.2.3 DL/I – Data Language Interface
  715.  
  716. DL/I comprises the calls and interfaces that application programs use to access the data stored in the database. IMS DB uses DL/I, which serves as the interface language that programmers use for accessing the database in an application program. We will discuss this in more detail in the upcoming chapters.
  717.  
  718. Advantages of IMS
  719.  
  720.  IMS supports applications from different languages such as COBOL, Java and XML.
  721.  IMS applications and data can be accessed over any platform.
  722.  IMS DB processing is very fast as compared to DB2.
  723.  
  724. Limitations of IMS
  725.  
  726.  Implementation of IMS DB is very complex.
  727.  IMS predefined tree structure reduces flexibility.
  728.  IMS DB is difficult to manage.
  729.  
  730.  
  731.  
  732. IMS Hierarchical Structure
  733.  
  734. IMS DB Structure - https://www.tutorialspoint.com/ims_db/ims_db_structure.htm
  735.  
  736. An IMS database is a collection of data accommodating physical files. In a hierarchical database, the topmost level contains the general information about the entity. As we proceed from the top level to the bottom levels in the hierarchy, we get more and more information about the entity. Each level in the hierarchy contains segments. In standard files, it is difficult to implement hierarchies but DL/I supports hierarchies. The following figure depicts the structure of IMS DB.
  737.  
  738.  
  739.  
  740.  
  741. IMS Segment
  742.  
  743.  A segment is created by grouping similar data together. It is similar to a table row or a flat-file record.
  744.  It is the smallest unit of information that DL/I transfers to and from an application program during any input-output operation.
  745.  A segment can have one or more data fields grouped together.
  746.  
  747. In the following example, the segment Student has four data fields.
  748.  
  749.  
  750.  
  751. IMS Field
  752.  
  753.  A field is a single piece of data in a segment. For example, Roll Number, Name, Course, and Mobile Number are single fields in the Student segment.
  754.  A segment consists of related fields to collect the information of an entity.
  755.  Fields can be used as a key for ordering the segments.
  756.  Fields can be used as a qualifier for searching information about a particular segment.
  757.  
  758. Segment Type
  759.  
  760. Segment Type is a category of data in a segment.
  761. A DL/I database can have 255 different segment types and 15 levels of hierarchy.
  762. In the following figure, there are three segments namely, Library, Books Information, and Student Information.
  763.  
  764.  
  765.  
  766.  
  767.  
  768. Segment Occurrence
  769.  
  770. Points to note:
  771.  
  772. A segment occurrence is an individual segment of a particular type containing user data. In the above example, Books Information is one segment type and there can be any number of occurrences of it, as it can store the information about any number of books.
  773.  
  774. Within the IMS database, each segment type is defined only once, but there can be an unlimited number of occurrences of each segment type.
  775.  
  776.  
  777. IMS DB - DL/I Terminology
  778.  
  779. IMS DB- DL/I Terminology - https://www.tutorialspoint.com/ims_db/ims_db_dli_terminology.htm
  780.  
  781. Hierarchical databases work on the relationships between two or more segments. The following example shows how segments are related to each other in the IMS database structure.