Normalization is a systematic database design technique used to organize data in a relational database to reduce data redundancy and improve data integrity. The process involves decomposing (breaking down) a large, poorly structured table into smaller, well-structured ones based on a series of normal forms (1NF, 2NF, 3NF, BCNF, etc.). Each normal form addresses a specific type of anomaly or redundancy. The primary goal is to ensure that each data item is stored logically in only one place (except for foreign keys), which eliminates update, insertion, and deletion anomalies. By minimizing duplication and enforcing logical dependencies through relationships, normalization creates a robust, flexible, and efficient database structure that is easier to maintain and less prone to inconsistencies.
1. First Normal Form (1NF)
First Normal Form (1NF) is the basic level of normalization in a database that aims to eliminate repeating groups and ensure atomicity of data. A relation is in 1NF if each attribute contains only indivisible (atomic) values and each record is unique. This means there should be no multiple values or arrays in a single column. Each column should hold a single piece of information.
For example, consider a student table where the “Subjects” field contains “Math, English, Science.” To bring it into 1NF, each subject should be stored in a separate row.
Rules for 1NF:
-
Each column must have unique names.
-
Each cell must contain a single value.
-
The order of rows and columns doesn’t matter.
Purpose: 1NF reduces data redundancy and simplifies the structure for easier manipulation. However, it does not remove all anomalies like partial or transitive dependencies. Hence, higher normal forms are required for a more efficient design.
2. Second Normal Form (2NF)
Second Normal Form (2NF) is achieved when a table is already in First Normal Form (1NF) and all non-key attributes are fully functionally dependent on the entire primary key. It mainly applies to tables that have composite primary keys (keys made up of more than one column).
The goal of 2NF is to eliminate partial dependency, where a non-key attribute depends only on part of a composite key.
For example, consider a table with columns: (StudentID, CourseID, StudentName, CourseName). Here, StudentName depends only on StudentID, and CourseName depends only on CourseID. This violates 2NF. To achieve 2NF, the table should be split into two:
Student(StudentID, StudentName) and Course(CourseID, CourseName).
Purpose: 2NF ensures that each non-key attribute relates directly to the entire key, reducing redundancy and improving data consistency. However, transitive dependencies may still exist, which are addressed in 3NF.
3. Third Normal Form (3NF)
Third Normal Form (3NF) is achieved when a table is already in 2NF and no transitive dependency exists between non-key attributes. This means that a non-key attribute should not depend on another non-key attribute; it should depend only on the primary key.
For example, consider a table: (StudentID, StudentName, DepartmentID, DepartmentName). Here, DepartmentName depends on DepartmentID, not directly on StudentID. To achieve 3NF, we split the table into Student(StudentID, StudentName, DepartmentID) and Department(DepartmentID, DepartmentName).
Rules for 3NF:
-
The table must be in 2NF.
-
Every non-key attribute must depend only on the primary key.
Purpose: 3NF removes transitive dependencies, reduces data duplication, and maintains referential integrity. It results in a cleaner and more efficient database design. However, in some rare cases, further refinement through BCNF may still be required.
4. Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is an advanced version of the Third Normal Form (3NF) that handles certain anomalies not covered by 3NF. A table is in BCNF if it is in 3NF and for every functional dependency (X → Y), X should be a super key. This means no non-trivial dependency should exist where a non-super key determines another attribute.
For example, consider a table with columns (Professor, Subject, Department) where a professor teaches one subject, and each subject is assigned to one department. If a subject can be taught by only one professor, but a professor can teach multiple subjects, the dependency may violate BCNF.
To achieve BCNF, the table must be decomposed into smaller relations that eliminate such dependencies.
Purpose: BCNF ensures the highest level of data consistency, eliminating all redundancy and dependency anomalies. Though it may increase the number of tables, it provides a more logically sound and reliable database structure.
5. Fourth Normal Form (4NF)
Fourth Normal Form (4NF) deals with multi-valued dependencies in a table. A relation is in 4NF if it is already in Boyce-Codd Normal Form (BCNF) and has no multi-valued dependencies other than a candidate key. Multi-valued dependency occurs when one attribute in a table uniquely determines multiple values of another attribute independently of other attributes.
For example, consider a table (StudentID, Course, Hobby). A student can enroll in multiple courses and also have multiple hobbies, but these two attributes are independent of each other. This leads to data redundancy and anomalies. To achieve 4NF, we decompose the table into two: StudentCourse(StudentID, Course) and StudentHobby(StudentID, Hobby).
Purpose: 4NF eliminates unnecessary duplication caused by independent multi-valued facts. It ensures that each table represents one type of relationship. This leads to greater efficiency, easier maintenance, and consistent data storage, especially in complex databases dealing with multiple many-to-many relationships.
6. Fifth Normal Form (5NF)
Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), focuses on eliminating join dependency. A table is in 5NF if it is already in 4NF and cannot be decomposed into smaller tables without losing information or causing redundancy when joined back together. 5NF ensures that every join dependency in the table is a consequence of the candidate keys.
For example, consider a table (Supplier, Product, Project) where a supplier supplies products for different projects. Each combination is stored separately, but decomposing this into three tables — SupplierProduct(Supplier, Product), ProductProject(Product, Project), and SupplierProject(Supplier, Project) — removes redundancy while preserving all information.
Purpose: 5NF is crucial for complex databases that handle interrelated data. It guarantees the database is free from all redundancy and maintains data integrity during reconstruction through joins. While rarely needed in typical systems, 5NF is vital in advanced relational designs for ensuring perfect normalization.
The people who created this page is absolute legends !!
Brilliant initiative !!
A Great shout out for the whole team who’s behind this project
Thank u soo much guysss !!!