In computer main memory, auxiliary storage and computer buses, data redundancy is the existence of data that is additional to the actual data and permits correction of errors in stored or transmitted data. The additional data can simply be a complete copy of the actual data, or only select pieces of data that allow detection of errors and reconstruction of lost or damaged data up to a certain level.
Data redundancy is a condition created within a database or data storage technology in which the same piece of data is held in two separate places.
This can mean two different fields within a single database, or two different spots across multiple software environments or platforms. Whenever data is repeated, it constitutes data redundancy.
Data redundancy can occur by accident, but it is also introduced deliberately for backup and recovery purposes.
For example, by including additional data checksums, ECC memory is capable of detecting and correcting single-bit errors within each memory word, while RAID 1 combines two hard disk drives (HDDs) into a logical storage unit that allows stored data to survive a complete failure of one drive. Data redundancy can also be used as a measure against silent data corruption; for example, file systems such as Btrfs and ZFS use data and metadata checksumming in combination with copies of stored data to detect silent data corruption and repair its effects.
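The detect-and-repair idea behind mirroring plus checksumming can be sketched as follows. This is a hypothetical illustration, not the actual ECC or ZFS mechanism: a checksum recorded at write time reveals silent corruption, and a redundant copy supplies the repair.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Checksum recorded at write time, used later to detect corruption."""
    return hashlib.sha256(data).hexdigest()

def read_with_repair(primary: bytearray, mirror: bytearray, stored_sum: str) -> bytes:
    """Return the data, repairing the primary copy from the mirror
    if its checksum no longer matches the one recorded at write time."""
    if checksum(bytes(primary)) == stored_sum:
        return bytes(primary)
    if checksum(bytes(mirror)) == stored_sum:
        primary[:] = mirror  # repair the corrupted copy from the mirror
        return bytes(primary)
    raise IOError("both copies corrupted; data unrecoverable")

# Write path: keep two copies plus a checksum of the original data.
data = bytearray(b"important payload")
mirror = bytearray(data)
stored = checksum(bytes(data))

data[3] ^= 0x01  # simulate a silent single-bit flip in the primary copy
recovered = read_with_repair(data, mirror, stored)
assert recovered == b"important payload"   # corruption detected and repaired
```

Note that the checksum alone only detects the error; the redundant copy is what makes repair possible, which is why file systems pair checksumming with stored copies.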
While different in nature, data redundancy also occurs in database systems that have values repeated unnecessarily in one or more records or fields, within a table, or where a field is replicated or repeated across two or more tables. This is often found in unnormalized database designs; it complicates database management, introduces the risk of corrupting the data, and increases the required amount of storage. When done on purpose from a previously normalized database schema, it may be considered a form of database denormalization, used to improve the performance of database queries.
For instance, when customer data are duplicated and attached to each product bought, such redundancy is a known source of inconsistency, since a given customer might appear with different values for one or more of their attributes. Data redundancy leads to data anomalies and corruption and generally should be avoided by design; applying database normalization prevents redundancy and makes efficient use of storage.
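A minimal sketch of the normalized alternative, using Python's built-in sqlite3 module (table and column names here are illustrative, not from the text): customer attributes live in one table and purchases merely reference them, so each customer's details are stored exactly once and cannot disagree between rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        product     TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Alice', 'Berlin')")
conn.executemany(
    "INSERT INTO purchase (customer_id, product) VALUES (?, ?)",
    [(1, 'keyboard'), (1, 'monitor')],
)

# One UPDATE changes the attribute everywhere: no purchase row can
# keep a stale copy, because no purchase row stores the city at all.
conn.execute("UPDATE customer SET city = 'Munich' WHERE id = 1")
rows = conn.execute("""
    SELECT p.product, c.city
    FROM purchase p JOIN customer c ON c.id = p.customer_id
""").fetchall()
print(rows)  # every purchase sees the updated city
```

Had the city been copied into every purchase row instead, the update would have to touch each copy, and missing one would leave the customer with two different cities.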
Redundancy means having multiple copies of the same data in the database. This problem arises when a database is not normalized. Suppose a table of student details has the attributes: student ID, student name, contact, college name, course opted, and college rank.
ID  | Name | Contact | College | Courses | Rank |
200 | ABCD | 123     | AKTU    | MBA     | 1    |
201 | PQRS | 321     | AKTU    | MBA     | 1    |
202 | WXYZ | 456     | AKTU    | MBA     | 1    |
203 | MNOP | 654     | AKTU    | MBA     | 1    |
204 | GHIJ | 789     | AKTU    | MBA     | 1    |
Problems
Update Anomaly: If the rank of the college changes, the change has to be made everywhere in the database where the rank is stored, which is time-consuming and computationally costly.
Deletion Anomaly: If the details of all the students in this table are deleted, then the details of the college are also deleted, which by common sense should not happen.
This anomaly occurs when deleting a data record also loses some unrelated information that was stored as part of the deleted record.
It is not possible to delete some information without losing some other information in the table as well.
Insertion Anomaly: If a student's details have to be inserted but their course has not been decided yet, then the insertion is not possible until a course is chosen for the student.
ID  | Name | Contact | College | Courses | Rank |
200 | ABCD | 123     | AKTU    |         | 1    |
This problem happens when the insertion of a data record is not possible without adding some additional unrelated data to the record.
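A sketch of the normalized version of the student table above (schema details are illustrative): the college name and rank move into their own table, so the rank is stored once, a student can be inserted before a course is decided, and deleting students never deletes college details. All three anomalies disappear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE college (
        name TEXT PRIMARY KEY,
        rank INTEGER NOT NULL
    );
    CREATE TABLE student (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        contact TEXT NOT NULL,
        college TEXT NOT NULL REFERENCES college(name),
        course  TEXT              -- NULL while the course is undecided
    );
""")
conn.execute("INSERT INTO college VALUES ('AKTU', 1)")
conn.execute("INSERT INTO student VALUES (200, 'ABCD', '123', 'AKTU', 'MBA')")

# Insertion anomaly gone: a student with no course yet can be stored.
conn.execute("INSERT INTO student VALUES (205, 'KLMN', '987', 'AKTU', NULL)")

# Update anomaly gone: the rank changes in exactly one row.
conn.execute("UPDATE college SET rank = 2 WHERE name = 'AKTU'")

# Deletion anomaly gone: removing every student leaves the college intact.
conn.execute("DELETE FROM student")
colleges = conn.execute("SELECT * FROM college").fetchall()
print(colleges)  # [('AKTU', 2)]
```

The single `college` row is the only place the rank lives, which is exactly what normalization buys: one fact, stored in one place.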