DBMS/U3 Topic 2 Firewalls and Database Recovery
A firewall is a device installed between the internal network of an organization and the rest of the Internet. Connecting a computer to the Internet creates several problems for companies. Most companies keep a large amount of confidential information online, and such information must not be disclosed to unauthorized persons. A second problem is that viruses, worms, and other digital pests can breach security and destroy valuable data.
The main purpose of a firewall is to separate a secure area from a less secure area and to control communications between the two. A firewall can also control inbound and outbound communications on anything from a single machine to an entire network.
On the other hand, software firewalls, sometimes called personal firewalls, are designed to run on a single computer. They are most commonly used on home or small-office computers with broadband access, which tend to be left on all the time.
A software firewall prevents unwanted access to the computer over a network connection by identifying and preventing communication over risky ports. Computers communicate over many different recognized ports, and the firewall will tend to permit these without prompting or alerting the user.
A firewall can serve the following functions:
1. Limit Internet access to e-mail only, so that no other types of information can pass between the intranet and the Internet.
2. Control who can telnet into your intranet (a method of logging in remotely).
3. Limit what other kinds of traffic can pass between your intranet and the Internet.
A firewall can be simple or complex, depending on how specifically you want to control your Internet traffic. A simple firewall might require only that you configure the software in the router that connects your intranet to your ISP. A more complex firewall might be a computer running UNIX and specialized software.
Firewall systems fall into two categories: packet filters and proxy-based firewalls.
A firewall can be used as a packet filter. Such firewalls examine only the headers of each packet of information passing to or from the Internet. The firewall accepts or rejects packets based on the packet’s sender, receiver, and port. For example, the firewall might allow e-mail and Web packets to and from any computer on the intranet, but allow telnet (remote login) packets to and from only selected computers.
A packet filter firewall maintains a filtering table that decides which packets are forwarded and which are discarded. It filters at the network or transport layer.
As shown in the figure, packets are filtered according to the following specifications:
- Incoming packets from network 220.127.116.11 are blocked (* means any).
- Incoming packets destined for any internal TELNET server (port 23) are blocked.
- Incoming packets for internal host 18.104.22.168.8 are blocked.
- Outgoing packets destined for an HTTP server (port 80) are blocked; that is, employees of the organization are not allowed to browse the Internet and cannot send any HTTP request.
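The filtering table described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a real firewall: the `Rule` and `Packet` classes, the `forward` function, and the addresses (taken from the reserved documentation ranges, not from the figure) are all assumptions made for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One row of the filtering table; None plays the role of '*' (any)."""
    direction: str                    # "in" or "out"
    src_net: Optional[str] = None     # source network in CIDR form
    dst_host: Optional[str] = None    # single destination address
    dst_port: Optional[int] = None    # destination port


@dataclass(frozen=True)
class Packet:
    """Only the header fields a packet filter looks at."""
    direction: str
    src: str
    dst: str
    dst_port: int


# Hypothetical table mirroring the four specifications in the text.
BLOCK_TABLE = [
    Rule("in", src_net="198.51.100.0/24"),   # block a whole source network
    Rule("in", dst_port=23),                 # block TELNET to any internal host
    Rule("in", dst_host="192.0.2.8"),        # block one internal host
    Rule("out", dst_port=80),                # block outgoing HTTP
]


def matches(rule: Rule, pkt: Packet) -> bool:
    """A rule matches when every field it specifies agrees with the packet."""
    if rule.direction != pkt.direction:
        return False
    if rule.src_net is not None and ip_address(pkt.src) not in ip_network(rule.src_net):
        return False
    if rule.dst_host is not None and pkt.dst != rule.dst_host:
        return False
    if rule.dst_port is not None and pkt.dst_port != rule.dst_port:
        return False
    return True


def forward(pkt: Packet) -> bool:
    """Return True if the packet is forwarded, False if it is discarded."""
    return not any(matches(rule, pkt) for rule in BLOCK_TABLE)
```

With this table, an incoming mail packet (port 25) from an unlisted network is forwarded, while any outgoing packet to port 80 is discarded.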
Proxy-based firewalls handle packets for each Internet service separately, usually by running a program called a proxy server, which accepts e-mail, Web, chat, newsgroup, and other packets from computers on the intranet, strips off the information that identifies the source of the packet, and passes it along to the Internet.
When the replies return, the proxy server passes the replies back to the computer that sent the original message. A proxy server can also log all the packets that pass by, so that you have a record of who has access to your intranet from the Internet, and vice versa.
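The behaviour of a proxy server (hide the internal sender, forward the request, return the reply, keep a log) can be illustrated with a toy class. The class name `ProxyServer`, the dictionary-based packet layout, and the `send_to_internet` callback are hypothetical; a real proxy works on network sockets and service-specific protocols.

```python
from typing import Callable, Dict, List, Tuple


class ProxyServer:
    """Toy application-level proxy: hides the internal sender and logs traffic."""

    def __init__(self, send_to_internet: Callable[[Dict[str, str]], str]) -> None:
        self._send = send_to_internet
        # Each entry: (internal host, service, destination)
        self.log: List[Tuple[str, str, str]] = []

    def handle(self, packet: Dict[str, str]) -> Dict[str, str]:
        internal_host = packet["src"]
        # Strip the field that identifies the internal source before forwarding.
        outbound = {k: v for k, v in packet.items() if k != "src"}
        outbound["src"] = "proxy"
        # Record who talked to whom, and over which service.
        self.log.append((internal_host, packet["service"], packet["dst"]))
        reply = self._send(outbound)
        # Pass the reply back to the computer that sent the original message.
        return {"dst": internal_host, "payload": reply}
```

The outside world only ever sees `"proxy"` as the source, while the log still records which internal computer made each request.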
A DBMS is a highly complex system with hundreds of transactions being executed every second. The durability and robustness of a DBMS depend on its architecture and its underlying hardware and system software. If it fails or crashes in the middle of transactions, the system is expected to follow some algorithm or technique to recover lost data.
To see where the problem has occurred, we classify failures into the following categories: transaction failure, system crash, and disk failure.
A transaction has to abort when it fails to execute or when it reaches a point from which it cannot go any further. This is called transaction failure, where only a few transactions or processes are affected.
Reasons for a transaction failure could be:
- Logical errors: Where a transaction cannot complete because it has some code error or any internal error condition.
- System errors: Where the database system itself terminates an active transaction because the DBMS is not able to execute it, or it has to stop because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an active transaction.
A system crash is caused by problems external to the system that make it stop abruptly. For example, an interruption in the power supply may cause the underlying hardware or software to fail.
Examples may also include operating system errors.
In the early days of computing, disk failure was a common problem because hard-disk drives and other storage drives failed frequently.
Disk failures include the formation of bad sectors, an unreachable disk, a disk head crash, or any other failure that destroys all or part of the disk storage.
We have already described the storage system. In brief, the storage structure can be divided into two categories −
- Volatile storage: As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed very close to the CPU; normally they are embedded onto the chipset itself. Main memory and cache memory are examples of volatile storage. They are fast but can store only a small amount of information.
- Non-volatile storage: These memories are made to survive system crashes. They have a huge storage capacity but are slower to access. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery-backed) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various files opened for them to modify data items. Transactions are made up of various operations, each of which is atomic in nature. According to the ACID properties of a DBMS, however, the atomicity of the transaction as a whole must be maintained: either all of its operations are executed or none are.
When a DBMS recovers from a crash, it should do the following:
- It should check the states of all the transactions that were being executed.
- A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case.
- It should check whether the transaction can be completed now or needs to be rolled back.
- No transaction should be allowed to leave the DBMS in an inconsistent state.
Two types of techniques can help a DBMS recover while maintaining the atomicity of a transaction:
- Maintaining a log of each transaction and writing it to stable storage before actually modifying the database.
- Maintaining shadow paging, where the changes are made in volatile memory and the actual database is updated later.
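As a rough illustration of the shadow-paging idea, the sketch below keeps a committed page table untouched while a transaction works on a copy, and switches over only at commit. It is a simplified in-memory model under stated assumptions: the class `ShadowPagedDB` and its methods are invented for this example, and a real system would keep the page tables and pages on disk.

```python
from typing import Dict, List


class ShadowPagedDB:
    """Toy shadow paging: updates go to new pages; commit swaps the page table."""

    def __init__(self, pages: List[str]) -> None:
        self._store: Dict[int, str] = dict(enumerate(pages))  # page id -> contents
        self._next_id = len(pages)
        self._table: List[int] = list(range(len(pages)))      # committed (shadow) table
        self._current: List[int] = []                         # working table
        self._in_txn = False

    def begin(self) -> None:
        # The working table starts as a copy of the committed one.
        self._current = list(self._table)
        self._in_txn = True

    def write(self, page_no: int, data: str) -> None:
        # Never overwrite a committed page: write a fresh page instead.
        new_id = self._next_id
        self._next_id += 1
        self._store[new_id] = data
        self._current[page_no] = new_id

    def read(self, page_no: int) -> str:
        table = self._current if self._in_txn else self._table
        return self._store[table[page_no]]

    def commit(self) -> None:
        # One switch makes all of the transaction's changes visible.
        self._table = self._current
        self._in_txn = False

    def abort(self) -> None:
        # Dropping the working table leaves the committed state untouched.
        self._current = []
        self._in_txn = False
```

Because the committed table is never modified in place, aborting (or crashing before commit) needs no undo work in this model.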
A log is a sequence of records that tracks the actions performed by a transaction. It is important that log records are written before the actual modification and stored on stable storage media, which is failsafe.
Log-based recovery works as follows:
- The log file is kept on a stable storage media.
- When a transaction enters the system and starts execution, it writes a log record about it:
<Tn, Start>
- When the transaction modifies an item X, it writes a log record as follows:
<Tn, X, V1, V2>
This reads: Tn has changed the value of X from V1 to V2.
- When the transaction finishes, it logs:
<Tn, Commit>
The database can be modified using two approaches:
- Deferred database modification: All log records are written to stable storage, and the database is updated only when the transaction commits.
- Immediate database modification: Each log record is followed by the actual database modification. That is, the database is modified immediately after every operation.
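The difference between the two approaches can be seen in a small simulation. The sketch below models the database as a dictionary and the log as a list of tuples in the `<Tn, X, V1, V2>` layout; the function names and the `crash_before_commit` flag are assumptions introduced only to show what state a crash leaves behind.

```python
from typing import Dict, List, Tuple

Log = List[Tuple]


def run_deferred(db: Dict[str, int], log: Log, txn: str,
                 updates: Dict[str, int], crash_before_commit: bool = False) -> None:
    """Deferred modification: log everything, touch the database only at commit."""
    log.append((txn, "Start"))
    for item, new in updates.items():
        log.append((txn, item, db[item], new))   # <Tn, X, V1, V2>
    if crash_before_commit:
        return                                   # database was never modified
    log.append((txn, "Commit"))
    for item, new in updates.items():            # apply the writes at commit time
        db[item] = new


def run_immediate(db: Dict[str, int], log: Log, txn: str,
                  updates: Dict[str, int], crash_before_commit: bool = False) -> None:
    """Immediate modification: each log record is followed by the write itself."""
    log.append((txn, "Start"))
    for item, new in updates.items():
        log.append((txn, item, db[item], new))   # log first ...
        db[item] = new                           # ... then modify the database
    if crash_before_commit:
        return                                   # database holds uncommitted values
    log.append((txn, "Commit"))
```

After a simulated crash, the deferred version leaves the database unchanged, whereas the immediate version leaves new values in place that recovery must undo using the old value V1 from the log.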
Recovery with Concurrent Transactions
When more than one transaction is being executed in parallel, the logs are interleaved. At recovery time, it would be hard for the recovery system to backtrack through all the logs before it starts recovering. To ease this situation, most modern DBMSs use the concept of ‘checkpoints’.
Keeping and maintaining logs in real time in a real environment may fill up all the storage space available in the system. As time passes, the log file may grow too big to be handled at all. A checkpoint is a mechanism by which all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all the transactions were committed.
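A checkpoint in the simplified sense described here can be sketched as moving the existing log records to an archive and leaving a marker behind. The function `take_checkpoint`, the tuple layout of the records, and the `("Checkpoint",)` marker are assumptions for illustration; the check for unfinished transactions follows this text's description and is stricter than what many real systems require.

```python
from typing import List, Tuple


def take_checkpoint(active_log: List[Tuple], archive: List[Tuple]) -> None:
    """Archive all earlier log records and leave a checkpoint marker."""
    started = {rec[0] for rec in active_log if rec[1:] == ("Start",)}
    finished = {rec[0] for rec in active_log
                if rec[1:] in (("Commit",), ("Abort",))}
    unfinished = started - finished
    if unfinished:
        # Per the description above, everything before a checkpoint is committed.
        raise RuntimeError(f"transactions still active: {sorted(unfinished)}")
    archive.extend(active_log)          # store the old records permanently
    active_log.clear()                  # remove them from the live log
    active_log.append(("Checkpoint",))  # recovery scans back only to here
```

After the call, the live log holds just the marker, so a later recovery pass has far fewer records to read.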
When a system with concurrent transactions crashes and recovers, it behaves in the following manner:
- The recovery system reads the logs backwards from the end to the last checkpoint.
- It maintains two lists, an undo-list and a redo-list.
- If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts the transaction in the redo-list.
- If the recovery system sees a log with <Tn, Start> but finds no commit or abort log, it puts the transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed. For the transactions in the redo-list, their previous logs are removed, and the transactions are then redone before their logs are saved.
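The recovery procedure above can be sketched as follows. This is a minimal model, not a production algorithm: the log is a list of tuples, the database is a dictionary, the `("Checkpoint",)` marker and the function name `recover` are assumptions, and the final clean-up of log records is omitted.

```python
from typing import Dict, List, Set, Tuple


def recover(db: Dict[str, int], log: List[Tuple]) -> Tuple[Set[str], Set[str]]:
    """Build undo/redo lists from the log tail, then repair the database."""
    undo: Set[str] = set()
    redo: Set[str] = set()
    aborted: Set[str] = set()
    start = 0
    # Read the log backwards from the end to the last checkpoint.
    for i in range(len(log) - 1, -1, -1):
        rec = log[i]
        if rec == ("Checkpoint",):
            start = i + 1
            break
        if rec[1:] == ("Commit",):
            redo.add(rec[0])
        elif rec[1:] == ("Abort",):
            aborted.add(rec[0])
        elif rec[1:] == ("Start",) and rec[0] not in redo and rec[0] not in aborted:
            undo.add(rec[0])            # started, but never committed or aborted

    tail = log[start:]
    # Undo: newest to oldest, restoring the old value V1.
    for rec in reversed(tail):
        if len(rec) == 4 and rec[0] in undo:
            _, item, old, _ = rec
            db[item] = old
    # Redo: oldest to newest, reapplying the new value V2.
    for rec in tail:
        if len(rec) == 4 and rec[0] in redo:
            _, item, _, new = rec
            db[item] = new
    return undo, redo
```

For example, if T1 committed after the checkpoint but T2 only started, `recover` places T1 in the redo-list and T2 in the undo-list, reapplies T1's new values, and restores the old values that T2 overwrote.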