A proper database design cannot be thrown together quickly by novices. What is required is a practiced and formal approach to gathering data requirements and modeling data. This modeling effort, in turn, demands a disciplined process for discovering and identifying entities and data elements. Data normalization is a big part of data modeling and database design. A normalized data model reduces data redundancy and inconsistencies by ensuring that each data element is placed in the entity it actually describes, so that each fact is recorded in only one place.
Database design is the process of transforming a logical data model into an actual physical database. Technicians sometimes leap to the physical implementation before producing the model of that implementation. This is unwise. A logical data model is required before you can even begin to design a physical database. The logical data model, in turn, grows out of a conceptual data model, and any type of data model begins with the discipline of data modeling.
The first objective of conceptual data modeling is to understand the requirements. A data model, in and of itself, is of limited value. Of course, a data model delivers value by enhancing communication and understanding, and it can be argued that these are quite valuable. But the primary value of a data model is its ability to be used as a blueprint to build a physical database.
When databases are built from a well-designed data model, the resulting structures provide increased value to the organization. The value derived from the data model exhibits itself in the form of minimized redundancy, maximized data integrity, increased stability, better data sharing, increased consistency, more timely access to data, and better usability. These qualities are achieved because the data model outlines the data resource requirements and relationships in a clear, concise manner. Building databases from a data model will result in a better database implementation because you will have a better understanding of the data to be stored in your databases.
Another benefit of data modeling is the ability to discover new uses for data. A data model can clarify data patterns and potential uses for data that would remain hidden without the data blueprint provided by the data model. Discovery of such patterns can change the way your business operates and can potentially lead to a competitive advantage and increased revenue for your organization.
Data modeling requires a different mindset than requirements gathering for application development and process-oriented tasks. It is important to think about “what” is of interest rather than “how” tasks are accomplished. To transition to this alternate way of thinking, follow these three “rules”:
- Don’t think physical; think conceptual – do not concern yourself with physical storage issues and the constraints of any DBMS you may know. Instead, concern yourself with business issues and terms.
- Don’t think process; think structure – how something is done, although important for application development, is not important for data modeling. What matters to data modeling are the things that processes act upon.
- Don’t think navigation; think relationship – the way that things are related to one another is important because relationships map the data model blueprint. The way in which relationships are traversed is unimportant to conceptual and logical data modeling.
Data models are typically rendered in a graphical format using an entity-relationship diagram, or E/R diagram for short. An E/R diagram graphically depicts the entities and relationships of a data model. There are many popular data modeling tools on the market from a variety of vendors. But do not mistake the tool for being more important than the process. Of what use is a good tool if you do not know how to deploy it?
A data model is built using many different components acting as abstractions of real-world things. The simplest data model will consist of entities and relationships. As work on the data model progresses, additional detail and complexity are added. Let’s examine the many different components of a data model and the terminology used for data modeling.
The first building block of the data model is the entity. An entity, at a very basic level, is something that exists and is capable of being described. It is a person, place, thing, concept, or event about which your organization maintains facts. For example, “STUDENT,” “INSTRUCTOR,” and “COURSE” are specific entities about which a college or university must be knowledgeable to perform its business.
Entities are composed of attributes. An attribute is a characteristic of an entity. Every attribute does one of three things:
- Describe – An attribute is descriptive if it does not identify or relate, but is used to depict or express a characteristic of an entity occurrence.
- Identify – An attribute that identifies is a candidate key. If the value of an identifying attribute changes, it should identify a different entity occurrence. An attribute that identifies should therefore be immutable.
- Relate – An attribute that relates entities is a foreign key; the attribute refers to the primary key attribute of an occurrence of another (or the same) entity.
Each attribute is assigned a domain that defines the type of data, its size, and the valid values that can be assigned to the attribute. As a general rule of thumb, nouns tend to be entities and adjectives tend to be attributes. But, of course, this is not a hard and fast rule: be sure to apply your knowledge of the business to determine which terms are entities and which are attributes. Every attribute must either identify the entity occurrence, describe the entity occurrence, or relate the entity occurrence to another entity occurrence (in the same or another entity).
Relationships define how the different entities are associated with each other. Each relationship is named such that it describes the role played by an entity in its association with another (or perhaps the same) entity. A relationship is defined by the keys of the participating entities: the primary key in the parent entity and the foreign key in the dependent entity. Relationships are not just the “lines” that connect entities; they provide meaning to the data model and must be assigned useful names.
Keep in mind that as you create your data models, you are developing the lexicon of your organization’s business. Much like a dictionary functions as the lexicon of words for a given language, the data model functions as the lexicon of business terms and their usage. Of course, this short introduction only scratches the surface of data modeling.
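To make the idea of a domain concrete, here is a minimal Python sketch of a domain as described above: a base type, an optional maximum size, and an optional set of valid values. The `Domain` class and the `StudentName` and `ClassYear` domains are hypothetical illustrations, not part of any modeling tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Domain:
    """Hypothetical logical domain: base type, optional size, optional valid values."""
    name: str
    base_type: type
    max_length: Optional[int] = None
    valid_values: Optional[FrozenSet] = None

    def accepts(self, value) -> bool:
        # Type: the value must be an instance of the domain's base type.
        if not isinstance(value, self.base_type):
            return False
        # Size: only checked when the domain declares a maximum length.
        if self.max_length is not None and len(value) > self.max_length:
            return False
        # Valid values: only checked when the domain enumerates them.
        if self.valid_values is not None and value not in self.valid_values:
            return False
        return True


# Hypothetical domains for attributes of the STUDENT entity.
STUDENT_NAME = Domain("StudentName", str, max_length=25)
CLASS_YEAR = Domain("ClassYear", str, valid_values=frozenset({"FR", "SO", "JR", "SR"}))

print(STUDENT_NAME.accepts("Ada Lovelace"))  # True
print(CLASS_YEAR.accepts("GR"))              # False: not one of the valid values
```

Nothing here is physical yet; the sketch only records the rules a domain imposes, which later have to be mapped onto whatever the target DBMS supports.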
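As an illustration of where such a relationship eventually lands, the sketch below implements a hypothetical "INSTRUCTOR teaches COURSE" relationship with a primary key in the parent table and a foreign key in the dependent table. It uses SQLite through Python's standard `sqlite3` module purely as an example DBMS; the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this per-connection pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent entity INSTRUCTOR: identified by its primary key.
conn.execute("""
    CREATE TABLE instructor (
        instructor_id INTEGER PRIMARY KEY,
        last_name     TEXT NOT NULL
    )""")

# Dependent entity COURSE: the foreign key instructor_id implements the
# hypothetical relationship "INSTRUCTOR teaches COURSE".
conn.execute("""
    CREATE TABLE course (
        course_id     TEXT PRIMARY KEY,
        title         TEXT NOT NULL,
        instructor_id INTEGER NOT NULL REFERENCES instructor (instructor_id)
    )""")

conn.execute("INSERT INTO instructor VALUES (1, 'Codd')")
conn.execute("INSERT INTO course VALUES ('CS101', 'Intro to Databases', 1)")  # parent exists

orphan_rejected = False
try:
    # Instructor 99 does not exist, so the relationship cannot be satisfied.
    conn.execute("INSERT INTO course VALUES ('CS102', 'Data Modeling', 99)")
except sqlite3.IntegrityError as exc:
    orphan_rejected = True
    print("rejected:", exc)
```

The relationship name ("teaches") does not appear in the DDL at all; it lives in the data model, which is one reason the model remains valuable after the tables exist.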
Assuming that the logical data model is complete, though, what must be done to implement a physical database?
The first step is to create an initial physical data model by transforming the logical data model into a physical implementation based on an understanding of the DBMS to be used for deployment. To successfully create a physical database design you will need to have a good working knowledge of the features of the DBMS including:
- In-depth knowledge of the database objects supported by the DBMS and the physical structures and files required to support those objects.
- Details regarding the manner in which the DBMS supports indexing, referential integrity, constraints, data types, and other features that augment the functionality of database objects.
- Detailed knowledge of new and obsolete features for particular versions or releases of the DBMS to be used.
- Knowledge of the DBMS configuration parameters that are in place.
- Data definition language (DDL) skills to translate the physical design into actual database objects.
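Configuration knowledge matters because identical DDL can behave differently depending on settings. For example, SQLite (used here through Python's `sqlite3` module only as an illustration) enforces foreign keys only when the `foreign_keys` pragma is enabled on the connection. The helper function `orphan_insert_allowed` is a hypothetical name for this sketch.

```python
import sqlite3


def orphan_insert_allowed(foreign_keys_on: bool) -> bool:
    """Return True if a row referencing a missing parent row is accepted."""
    conn = sqlite3.connect(":memory:")
    # Same DDL in both cases; only this per-connection setting differs.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if foreign_keys_on else 'OFF'}")
    conn.execute("CREATE TABLE instructor (instructor_id INTEGER PRIMARY KEY)")
    conn.execute("""
        CREATE TABLE course (
            course_id     TEXT PRIMARY KEY,
            instructor_id INTEGER REFERENCES instructor (instructor_id)
        )""")
    try:
        # Instructor 99 does not exist.
        conn.execute("INSERT INTO course VALUES ('CS101', 99)")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


print(orphan_insert_allowed(foreign_keys_on=False))  # True: orphan row accepted
print(orphan_insert_allowed(foreign_keys_on=True))   # False: constraint enforced
```

A designer who assumed the referential constraint was always enforced would get a database that silently accepts orphan rows.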
Armed with the correct information, you can create an effective and efficient database from a logical data model. The first step in transforming a logical data model into a physical model is to perform a simple translation from logical terms to physical objects. Of course, this simple transformation will not result in a complete and correct physical database design – it is simply the first step. The transformation consists of the following:
- Transforming entities into tables
- Transforming attributes into columns
- Transforming domains into data types and constraints
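The three transformations above can be sketched as a small generator that turns a logical model into `CREATE TABLE` statements. Everything in it is an assumption for illustration: the `LOGICAL_MODEL` and `DOMAIN_MAP` structures, the lower-case-with-underscores naming convention, and the choice of SQLite (via Python's `sqlite3`) to check that the generated DDL is valid.

```python
import sqlite3

# Hypothetical logical model: entity -> list of (attribute, domain).
LOGICAL_MODEL = {
    "Student": [
        ("Student Id", "Identifier"),
        ("Last Name", "PersonName"),
        ("Credit Hours", "SmallCount"),
    ],
}

# Hypothetical domain mapping: domain -> (physical data type, optional CHECK condition).
DOMAIN_MAP = {
    "Identifier": ("INTEGER", None),
    "PersonName": ("VARCHAR(40)", None),
    "SmallCount": ("INTEGER", "BETWEEN 0 AND 200"),
}


def physical_name(logical_name: str) -> str:
    # Assumed naming convention: lower case, spaces become underscores.
    return logical_name.strip().lower().replace(" ", "_")


def to_ddl(entity: str, attributes) -> str:
    table = physical_name(entity)                # entity    -> table
    columns = []
    for attr, domain in attributes:
        col = physical_name(attr)                # attribute -> column
        data_type, check = DOMAIN_MAP[domain]    # domain    -> data type (+ constraint)
        definition = f"{col} {data_type}"
        if check:
            definition += f" CHECK ({col} {check})"
        columns.append(definition)
    return f"CREATE TABLE {table} (\n    " + ",\n    ".join(columns) + "\n)"


conn = sqlite3.connect(":memory:")
for entity, attrs in LOGICAL_MODEL.items():
    ddl = to_ddl(entity, attrs)
    print(ddl)
    conn.execute(ddl)  # confirms the generated DDL is accepted by SQLite
```

The output deliberately lacks a primary key, nullability rules, and indexes. That matches the point above: this translation is only a first step, not a complete physical design.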
To support the mapping of attributes to table columns, you will need to map each logical domain of the attribute to a physical data type and perhaps additional constraints. In a physical database, each column must be assigned a data type. Certain data types require a maximum length to be specified. For example, a character data type could be specified as CHAR(25), indicating that up to 25 characters can be stored for the column. You may need to specify a length or precision for other data types as well, such as graphic, floating point, and decimal types (decimal requires both a precision and a scale).
But no commercial DBMS product fully supports relational domains. Therefore, the domain assigned in the logical data model must be mapped to a data type supported by the DBMS. You may need to adjust the data type based on the DBMS you use. For example, what data type and length will be used for monetary values if no built-in currency data type exists? Many of the major DBMS products support user-defined data types, so if no built-in data type is acceptable, you might consider creating a user-defined data type to support the logical domain.
In addition to a data type and length, you also may need to apply a constraint to the column. Consider a domain of integers between 1 and 10 inclusive. Simply assigning the physical column to an integer data type is insufficient to match the domain. A constraint must be added to restrict the values that can be stored for the column to the specified range, 1 through 10. Without a constraint, negative numbers, zero, and values greater than 10 could be stored. Using check constraints, you can place limits on the data values that can be stored in a column or set of columns.
Specification of a primary key is an integral part of the physical design of entities and attributes. A primary key should be assigned for every entity in the logical data model. As a first course of action, you should try to use the primary key as selected in the logical data model. However, multiple candidate keys often are uncovered during the data modeling process. You may decide to choose a primary key other than the one selected during logical design: either another candidate key or a surrogate key for physical implementation. But even if the DBMS does not mandate a primary key for each table, it is good practice to identify a primary key for each physical table you create. Failure to do so will make processing the data in that table more difficult.
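Whether a declared length is actually enforced is itself DBMS-specific. SQLite, for instance, accepts a declaration such as CHAR(25) but does not enforce the length, so a check constraint is needed if the domain's size must hold. The sketch below shows both cases using Python's `sqlite3`; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instructor (
        instructor_id INTEGER PRIMARY KEY,
        name_loose    CHAR(25),                                   -- length declared only
        name_checked  CHAR(25) CHECK (length(name_checked) <= 25) -- length enforced
    )""")

long_name = "X" * 40

# SQLite stores all 40 characters despite the CHAR(25) declaration.
conn.execute("INSERT INTO instructor (instructor_id, name_loose) VALUES (1, ?)", (long_name,))
stored = conn.execute(
    "SELECT length(name_loose) FROM instructor WHERE instructor_id = 1"
).fetchone()[0]
print(stored)  # 40

# The CHECK constraint rejects the same value.
checked_rejected = False
try:
    conn.execute("INSERT INTO instructor (instructor_id, name_checked) VALUES (2, ?)", (long_name,))
except sqlite3.IntegrityError:
    checked_rejected = True
print(checked_rejected)  # True
```

Many other DBMS products do enforce CHAR and VARCHAR lengths, which is exactly why the mapping must be checked against the DBMS actually in use.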
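One possible answer to the monetary question is sketched below, assuming the logical "Money" domain means exact amounts to the cent. SQLite has neither a currency type nor DBMS-level user-defined types, so this sketch approximates the idea at the driver level: amounts are stored as whole cents in an integer, and Python's `sqlite3` adapter and converter hooks translate them to and from `Decimal` for columns declared with the custom type name `MONEY`. The table, column, and function names are hypothetical.

```python
import sqlite3
from decimal import ROUND_HALF_UP, Decimal


def money_to_cents(amount: Decimal) -> int:
    # Adapter: Decimal -> integer cents (rounding half up is an assumption).
    return int((amount * 100).to_integral_value(rounding=ROUND_HALF_UP))


def cents_to_money(raw: bytes) -> Decimal:
    # Converter: sqlite3 passes the stored value as bytes, e.g. b"123456".
    return Decimal(int(raw)) / 100


sqlite3.register_adapter(Decimal, money_to_cents)
sqlite3.register_converter("MONEY", cents_to_money)

# PARSE_DECLTYPES applies the converter to columns declared as MONEY.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE tuition (student_id INTEGER, amount MONEY)")
conn.execute("INSERT INTO tuition VALUES (?, ?)", (1, Decimal("1234.56")))

(amount,) = conn.execute("SELECT amount FROM tuition").fetchone()
print(repr(amount))  # Decimal('1234.56')
print(conn.execute("SELECT typeof(amount) FROM tuition").fetchone()[0])  # integer
```

Storing integer cents avoids binary floating-point rounding; a DBMS with an exact DECIMAL type or true user-defined types would let you express the same domain directly in DDL.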
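The 1-through-10 domain described above maps to an integer column plus a check constraint. This sketch uses SQLite through Python's `sqlite3` as an example DBMS; the `course_rating` table and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER alone would accept any whole number; the CHECK constraint
# narrows the column to the domain: 1 through 10 inclusive.
conn.execute("""
    CREATE TABLE course_rating (
        course_id TEXT    NOT NULL,
        rating    INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 10)
    )""")


def try_insert(value) -> bool:
    """Return True if the value was accepted, False if the CHECK rejected it."""
    try:
        conn.execute("INSERT INTO course_rating VALUES ('CS101', ?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False


for v in (-3, 0, 1, 10, 11):
    print(v, try_insert(v))
# -3 False, 0 False, 1 True, 10 True, 11 False
```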
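When a surrogate key is chosen as the primary key, the logical candidate key can still be enforced with a unique constraint so that it keeps its identifying role. A minimal sketch using SQLite via Python's `sqlite3`; the `national_id` candidate key and all other names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_key INTEGER PRIMARY KEY,   -- surrogate key chosen for the physical table
        national_id TEXT NOT NULL UNIQUE,  -- hypothetical natural candidate key, still enforced
        last_name   TEXT NOT NULL
    )""")

# student_key is omitted, so SQLite assigns the values.
conn.execute("INSERT INTO student (national_id, last_name) VALUES ('A-100', 'Codd')")
conn.execute("INSERT INTO student (national_id, last_name) VALUES ('A-200', 'Chen')")

duplicate_rejected = False
try:
    # Same candidate key value as an existing row.
    conn.execute("INSERT INTO student (national_id, last_name) VALUES ('A-100', 'Date')")
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute("SELECT student_key, national_id FROM student ORDER BY student_key").fetchall()
print(rows)                # [(1, 'A-100'), (2, 'A-200')]
print(duplicate_rejected)  # True
```

Without the UNIQUE constraint, the surrogate key would happily allow two rows describing the same student, which is the kind of redundancy the data model was meant to prevent.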
Of course, there are many other decisions that must be made during the transition from logical to physical. For example, each of the following must be addressed:
- The nullability of each column in each table
- For character columns, should fixed length or variable length be used?
- Should the DBMS be used to assign values to sequences or identity columns?
- Implementing logical relationships by assigning referential constraints
- Building indexes on columns to improve query performance
- Choosing the type of index to create: B-tree, bitmap, reverse key, hash, partitioning, etc.
- Deciding on the clustering sequence for the data
- Other physical aspects such as column ordering, buffer pool specification, data files, denormalization, and so on.
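A few of these decisions can be seen together in one small table definition: per-column nullability, a DBMS-assigned key value, and an index built for a known query pattern. The sketch uses SQLite via Python's `sqlite3`, where `EXPLAIN QUERY PLAN` shows whether the index is used; the table, columns, and index name are hypothetical, and the exact plan text varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id  INTEGER PRIMARY KEY,  -- value assigned by the DBMS (SQLite rowid alias)
        last_name   TEXT NOT NULL,        -- required
        middle_name TEXT,                 -- nullable: not every student has one
        email       TEXT NOT NULL
    )""")

# Index chosen to support an assumed frequent query: lookups by last name.
conn.execute("CREATE INDEX idx_student_last_name ON student (last_name)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT email FROM student WHERE last_name = ?", ("Codd",)
).fetchall()
for row in plan:
    print(row[-1])  # detail text varies by version but should name idx_student_last_name

null_rejected = False
try:
    conn.execute("INSERT INTO student (last_name, email) VALUES (NULL, 'x@example.edu')")
except sqlite3.IntegrityError:
    null_rejected = True
print(null_rejected)  # True
```

None of these choices appear in the logical model; each depends on how the data will be used and on what the DBMS supports.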
A logical data model should be used as the blueprint for designing and creating a physical database. But the physical database cannot be created properly with a simple logical-to-physical mapping. Many physical design decisions need to be made by the DBA before implementing physical database structures. This may necessitate deviating from the logical data model. But such deviation should occur only based on in-depth knowledge of the DBMS and the physical environment in which the database will exist.