Information assurance (IA) is the practice of protecting against and managing risk related to the use, storage and transmission of data and information systems. Information assurance processes typically ensure the following functions for data and associated information systems:
Availability ensures information is ready for use by those that are allowed to access it and at a required level of performance.
Integrity ensures that information and associated systems can be modified only by those authorized to do so, so that data remains accurate and complete.
Authentication ensures that users are who they say they are, using methods such as individual user names, passwords, biometrics, digital certificates and security tokens.
Confidentiality limits access to, or places restrictions on, sensitive information such as personally identifiable information (PII) or classified corporate data.
Non-repudiation ensures that someone cannot deny an action, such as the receipt of a message or the authenticity of a statement or contract, because the system provides proof of the action.
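As a small illustration of the integrity function listed above, the following sketch records a SHA-256 digest when data is written and checks it again later. The helper names `digest` and `is_unmodified` are hypothetical, and the sketch assumes the recorded digest is itself stored somewhere an attacker cannot alter. A plain hash only detects modification; it does not enforce who is authorized to make changes.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_digest: str) -> bool:
    """Recompute the digest and compare it with the recorded one."""
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(digest(data), expected_digest)


original = b"quarterly-report-v1"
recorded = digest(original)  # assumed to be stored safely at write time

print(is_unmodified(original, recorded))                # True
print(is_unmodified(b"quarterly-report-v2", recorded))  # False
```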
Data protection is the process of safeguarding important information from corruption, compromise or loss.
The importance of data protection increases as the amount of data created and stored continues to grow at unprecedented rates. There is also little tolerance for downtime that can make it impossible to access important information.
Consequently, a large part of a data protection strategy is ensuring that data can be restored quickly after any corruption or loss. Protecting data from compromise and ensuring data privacy are other key components of data protection.
The term data protection is used to describe both the operational backup of data and business continuity/disaster recovery (BC/DR). Data protection strategies are evolving along two lines: data availability and data management.
Data availability ensures users have the data they need to conduct business even if the data is damaged or lost.
Two key areas on the data management side are data lifecycle management and information lifecycle management. Data lifecycle management is the process of automating the movement of critical data to online and offline storage. Information lifecycle management is a comprehensive strategy for valuing, cataloging and protecting information assets from application and user errors, malware and virus attacks, machine failure, or facility outages and disruptions. More recently, data management has come to include finding ways to unlock business value from otherwise dormant copies of data for reporting, test/dev enablement, analytics and other purposes.
Storage technologies that can be used to protect data include a disk or tape backup that copies designated information to a disk-based storage array or a tape cartridge device so it can be safely stored. Mirroring can be used to create an exact replica of a website or files so they’re available from more than one place. Storage snapshots can automatically generate a set of pointers to information stored on tape or disk, enabling faster data recovery, while continuous data protection (CDP) backs up all the data in an enterprise whenever a change is made.
Cloud backup is becoming more prevalent. Organizations frequently move their backup data to public clouds or clouds maintained by backup vendors. These backups can replace on-site disk and tape libraries, or they can serve as additional protected copies of data.
Backup has traditionally been the key to an effective data protection strategy. Data was periodically copied, typically each night, to a tape drive or tape library where it would sit until something went wrong with the primary data storage. That’s when the backup data would be accessed and used to restore lost or damaged data.
Backups are no longer a stand-alone function. Instead, they’re being combined with other data protection functions to save storage space and lower costs.
Backup and archiving, for example, have traditionally been treated as two separate functions. Backup’s purpose was to restore data after a failure, while an archive provided a searchable copy of data. However, that led to redundant data sets. Today, there are products that back up, archive and index data in a single pass. This approach saves organizations time and cuts down on the amount of data in long-term storage.
Enterprise data protection strategies
Modern data protection for primary storage involves using a built-in system that supplements or replaces backups and protects against the following potential problems:
Media failure. The goal here is to make data available even if a storage device fails. Synchronous mirroring is one approach in which data is written to a local disk and a remote site at the same time. The write is not considered complete until a confirmation is sent from the remote site, ensuring that the two sites are always identical. Mirroring requires 100% capacity overhead.
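The write path described above can be sketched in a few lines. This is a toy model under stated assumptions, not a storage product's API: `Site` and `SyncMirror` are hypothetical names, each site is an in-memory dictionary, and an acknowledgement is simply a returned `True`.

```python
class Site:
    """Hypothetical storage site that acknowledges each write."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[int, bytes] = {}

    def write(self, block_no: int, data: bytes) -> bool:
        self.blocks[block_no] = data
        return True  # acknowledgement sent back to the caller


class SyncMirror:
    """Reports a write as complete only after both sites acknowledge it."""

    def __init__(self, local: Site, remote: Site):
        self.local = local
        self.remote = remote

    def write(self, block_no: int, data: bytes) -> None:
        local_ack = self.local.write(block_no, data)
        remote_ack = self.remote.write(block_no, data)
        if not (local_ack and remote_ack):
            # A real system would also have to undo or retry the partial write.
            raise IOError("write not acknowledged by both sites")


mirror = SyncMirror(Site("local"), Site("remote"))
mirror.write(0, b"payload")
print(mirror.local.blocks == mirror.remote.blocks)  # True
```

Because the caller waits for the remote acknowledgement, every write pays the round-trip latency to the remote site. That is the price of keeping the two copies identical.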
RAID protection is an alternative that requires less overhead capacity. With RAID, physical drives are combined into a logical unit that’s presented as a single hard drive to the operating system. RAID enables the same data to be stored in different places on multiple disks. As a result, I/O operations overlap in a balanced way, improving performance and increasing protection.
Parity-based RAID levels, such as RAID 5 and RAID 6, must calculate parity, redundant information derived from the data that lets the array detect and reconstruct data that has been lost or overwritten. That calculation consumes compute resources.
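To make the idea of parity concrete, here is a minimal sketch of single-parity protection using XOR, the scheme commonly associated with RAID 5. The block contents and the helper name `xor_blocks` are illustrative assumptions; real controllers work on fixed-size stripes and rotate parity across drives.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if striped across three drives
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)  # stored on a fourth drive

# Simulate losing the drive that held the second block
surviving = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(surviving)

print(rebuilt == data_blocks[1])  # True
```

XOR-ing the surviving blocks with the parity block cancels out everything except the missing block, which is why one parity block can cover the loss of any single drive in the set.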
The cost of recovering from a media failure is the time it takes to return to a protected state. Mirrored systems can return to a protected state quickly. RAID systems take longer because they must recalculate all the parity. Advanced RAID controllers don’t have to process an entire drive’s capacity when doing a drive rebuild; they only need to rebuild the blocks that actually held data on that drive. Given that most drives run at about one-third capacity, intelligent RAID can reduce recovery times significantly.
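A back-of-the-envelope calculation shows why rebuilding only the used portion of a drive matters. The drive size (12 TB) and sustained rebuild rate (150 MB/s) below are assumed figures chosen for illustration, and the function name is hypothetical; only the one-third fill level comes from the text above.

```python
def rebuild_hours(capacity_tb: float, fill_ratio: float, rate_mb_per_s: float) -> float:
    """Estimate rebuild time when only the filled fraction is rebuilt."""
    bytes_to_rebuild = capacity_tb * 1e12 * fill_ratio
    seconds = bytes_to_rebuild / (rate_mb_per_s * 1e6)
    return seconds / 3600


full = rebuild_hours(12, 1.0, 150)    # rebuild the whole drive
used = rebuild_hours(12, 1 / 3, 150)  # rebuild only the used third

print(round(full, 1))  # 22.2
print(round(used, 1))  # 7.4
```

Under these assumptions the window during which the array is unprotected shrinks from roughly 22 hours to about 7.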
Erasure coding is an alternative to advanced RAID that’s often used in scale-out storage environments. Like RAID, erasure coding uses parity-based data protection systems, writing both data and parity across a cluster of storage nodes. With erasure coding, all the nodes in the storage cluster can participate in the replacement of a failed node, so the rebuilding process doesn’t get CPU-constrained and it happens faster than it might in a traditional RAID array.
Replication is another data protection alternative for scale-out storage. Data is mirrored from one node to another or to multiple nodes. Replication is simpler than erasure coding, but it consumes at least twice the capacity of the protected data.
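The capacity trade-off between replication and erasure coding can be worked out directly. In this sketch the function names and the example layouts (two copies, 4+2 and 8+3 fragments) are illustrative assumptions rather than recommendations.

```python
def raw_capacity_replication(usable_tb: float, copies: int) -> float:
    """Raw capacity needed to keep `copies` full copies of the data."""
    return usable_tb * copies


def raw_capacity_erasure(usable_tb: float, data_fragments: int, parity_fragments: int) -> float:
    """Raw capacity needed when data is split into data plus parity fragments."""
    return usable_tb * (data_fragments + parity_fragments) / data_fragments


usable = 100.0  # TB of protected data

print(raw_capacity_replication(usable, 2))  # 200.0
print(raw_capacity_erasure(usable, 4, 2))   # 150.0
print(raw_capacity_erasure(usable, 8, 3))   # 137.5
```

Two-way replication doubles the raw capacity, which matches the 100% overhead cited for mirroring. The erasure-coded layouts tolerate the loss of two or three fragments while using less raw capacity, at the cost of the parity computation.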
Data corruption. When data is corrupted or accidentally deleted, snapshots can be used to set things right. Most storage systems today can track hundreds of snapshots without any significant effect on performance.
Storage systems using snapshots can work with key applications, such as Oracle and Microsoft SQL Server, to capture a clean copy of data while the snapshot is occurring. This approach enables frequent snapshots that can be stored for long periods of time.
When data becomes corrupted or is accidentally deleted, a snapshot can be mounted and the data copied back to the production volume, or the snapshot can replace the existing volume. With this method, minimal data is lost and recovery time is almost instantaneous.
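The reason snapshots are cheap to take and fast to restore is that they record pointers to blocks rather than copies of the blocks themselves. The toy `Volume` class below is a hypothetical in-memory model of that idea, not any vendor's implementation: a snapshot copies the block map, and a rollback swaps the map back.

```python
class Volume:
    """Toy volume whose snapshots share block data with the live volume."""

    def __init__(self):
        self.blocks: dict[int, bytes] = {}                # block number -> data
        self.snapshots: dict[str, dict[int, bytes]] = {}  # name -> block map

    def write(self, block_no: int, data: bytes) -> None:
        # Point the live map at new data; snapshots keep the old reference.
        self.blocks[block_no] = data

    def delete(self, block_no: int) -> None:
        self.blocks.pop(block_no, None)

    def snapshot(self, name: str) -> None:
        # Copies the map of references only, not the underlying data.
        self.snapshots[name] = dict(self.blocks)

    def rollback(self, name: str) -> None:
        # The snapshot's block map replaces the live one.
        self.blocks = dict(self.snapshots[name])


vol = Volume()
vol.write(0, b"orders")
vol.write(1, b"customers")
vol.snapshot("nightly")

vol.write(0, b"\x00garbage")  # corruption
vol.delete(1)                 # accidental deletion

vol.rollback("nightly")
print(vol.blocks)  # {0: b'orders', 1: b'customers'}
```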
Storage system failure. To protect against multiple drive failures or some other major event, data centers rely on replication technology built on top of snapshots.
With snapshot replication, only blocks of data that have changed are copied from the primary storage system to an off-site secondary storage system. Snapshot replication is also used to replicate data to on-site secondary storage that’s available for recovery if the primary storage system fails.
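A simplified view of changed-block replication follows. It compares two snapshots block by block and ships only the differences. The 4-byte block size and the function names are assumptions for illustration; production systems typically track changed blocks at write time instead of comparing whole snapshots, and this sketch does not handle a volume that shrinks.

```python
BLOCK_SIZE = 4  # unrealistically small, for readability


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte string into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(previous: list[bytes], current: list[bytes]) -> dict[int, bytes]:
    """Return the blocks that are new or differ from the previous snapshot."""
    return {
        index: block
        for index, block in enumerate(current)
        if index >= len(previous) or previous[index] != block
    }


def apply_changes(replica: list[bytes], changes: dict[int, bytes]) -> list[bytes]:
    """Apply shipped blocks to the secondary copy."""
    updated = list(replica)
    for index, block in sorted(changes.items()):
        if index < len(updated):
            updated[index] = block
        else:
            updated.append(block)
    return updated


previous = split_blocks(b"AAAABBBBCCCC")
current = split_blocks(b"AAAAXXXXCCCCDDDD")

changes = changed_blocks(previous, current)
print(changes)                                      # {1: b'XXXX', 3: b'DDDD'}
print(apply_changes(previous, changes) == current)  # True
```

Only two of the four blocks cross the link in this example, which is the saving that makes off-site replication practical over limited bandwidth.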
Full-on data center failure. Protection against the loss of a data center requires a full disaster recovery plan. As with the other failure scenarios, there are multiple options. Snapshot replication, where data is replicated to a secondary site, is one option. However, the cost of running a secondary site can be prohibitive.
Cloud services are another alternative. Replication and cloud backup products and services can be used to store the most recent copies of data that is most likely to be needed in the event of a major disaster, and to instantiate application images. The result is a rapid recovery in the event of a data center loss.
Data protection trends
The latest trends in data protection policy and technology include the following:
Hyper-convergence. With the advent of hyper-convergence, vendors have started offering appliances that provide backup and recovery for physical and virtual environments that are hyper-converged, non-hyper-converged and mixed. Data protection capabilities integrated into hyper-converged infrastructure are replacing a range of devices in the data center.
Cohesity, Rubrik and other vendors offer hyper-convergence for secondary storage, providing backup, disaster recovery, archiving, copy data management and other nonprimary storage functions. These products integrate software and hardware, and they can serve as a backup target for existing backup applications in the data center. They can also use the cloud as a target and provide backup for virtual environments.
Ransomware. This type of malware, which holds data hostage for an extortion fee, is a growing problem. Traditional backup methods have been used to protect data from ransomware. However, more sophisticated ransomware is adapting to and circumventing traditional backup processes.
Newer variants of the malware slowly infiltrate an organization’s data over time, so the organization ends up backing up the ransomware along with the data. This situation makes it difficult, if not impossible, to roll back to a clean version of the data.
To counter this problem, vendors are working on adapting backup and recovery products and methodologies to thwart the new ransomware capabilities.
Copy data management. CDM cuts down on the number of copies of data an organization must save, reducing the overhead required to store and manage data and simplifying data protection. CDM can speed up application release cycles, increase productivity and lower administrative costs through automation and centralized control.
The next step with CDM is to add more intelligence. Companies such as Veritas Technologies are combining CDM with their intelligent data management platforms.
Disaster recovery as a service. DRaaS use is expanding as more options are offered and prices come down. It’s being used for critical business systems where an increasing amount of data is being replicated rather than just backed up.