Foundations of Business Intelligence: Databases and Information Management
An effective information system provides users with accurate, timely, and relevant information. Accurate information is free of errors. Information is timely when it is available to decision makers when it is needed.
File Organization Terms and Concepts
A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to fields, records, files, and databases.
Problems with The Traditional File Environment
In most organizations, systems tended to grow independently without a company-wide plan. Accounting, finance, manufacturing, human resources, and sales and marketing all developed their own systems and data files.
Fig. Traditional File Processing
Data Redundancy and Inconsistency
Data redundancy is the presence of duplicate data in multiple data files so that the same data are stored in more than place or location. Data redundancy wastes storage resources and also leads to data inconsistency, where the same attribute may have different values.
Program-data dependence refers to the coupling of data stores in files and the specific programs required to update and maintain those files such that changes in programs require changes to the data.
Lack of Flexibility
A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion.
Because there is little control or management of data, access to and dissemination of information may be out of control. Management may have no way of knowing who is accessing or even making changes to the organization’s data.
Lack of Data Sharing and Availability
Because pieces of information in different files and different parts of the organization cannot be related to one another, it is virtually impossible for information to be shared or accessed in a timely manner.
The Database Approach To Data Management
Database Management Systems
A database management systems (DBMS) is software that permits an organization to centralize data, manage them efficiently, and provide access to the stored data by application programs.
How a DBMS Solves the Problems of the Traditional File Environment
A DBMS reduces data redundancy and inconsistency by minimizing isolated files in which the same data are prepared. The DBMS may not enable the organization to eliminate data redundancy entirely, but it can help control redundancy.
Cotemporary DBMS use different database models to keep track of entities, attributes, and relationships. The most popular type of DBMS today for PCs as well as for larger computers and mainframes is the relational DBMS.
Fig. Relational Database Tables
Operations of a Relational DBMS
Relational database tables can be combined easily to deliver data required by users, provided that any two tables share a common data element.
An object-oriented DBMS stores the data and procedures that act on those data as objects that can be automatically retrieved and shared. Hybrid object-relational DBMS systems are now available to provide capabilities of both object-oriented and relational DBMS.
Databases in the Cloud
Cloud computing providers offer database management services, but these services typically have less functionally than their on-premises counterparts.
Capabilities of Database Management Systems
DBMS have a data definition capability to specify the structure of the content of the database. A data dictionary is an automated or manual file that stores definitions of data elements and their characteristics.
Querying and Reporting
Most DBMS have a specialized language called a data manipulation language that is used to add, change, delete, and retrieve the data in the database.
To create a database, you must understand the relationships among the data, the type of data that will be maintained in the database, how the data will be used, and how the organization will need to change to manage data from a company-wide perspective. The database requires both a conceptual design and a physical design.
Normalization and Entity-Relationship Diagrams
The process of creating small, stable, yet flexible and adaptive data structures from complex groups of data is called normalization. The relationship between the entities SUPPLIER, PART, LINE_ITEM, AND ORDER is called entity-relationship diagram.
FIG. Normalized Tables Created From Order
FIG. An Entity-Relationship Diagram
Using Databases to Improve Business Performance And Decision Making
Businesses use their databases to keep track of basic transactions, such as paying suppliers, processing orders, keeping track of customers, and paying employees. But they also need databases to provide information that will help the company run the business more efficiently, and help managers and employees make better decisions.
A data warehouse is a database that stores current and historical data of potential interest to decision makers throughout the company.
Fig. Component Of A Data Warehouse
A data mart is a subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific population of users.
Tools For Business Intelligence: Multidimensional Data Analysis and Data Mining
Online Analytical Processing (OLAP)
Online Analytical Processing (OLAP) supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions. OLAP enables users to obtain online answers to ad hoc questions such as these in a fairly rapid amount of time, even when the data are stored in very large databases, such as sales figures for multiple years.
Data mining is more discovery-driven. Data mining provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior.
Text Mining and Web Mining
Text mining tools are now available to help businesses analyze these data. These tools are able to extract key elements from large unstructured data sets, discover patterns and relationships, and summarize the information. Web mining is the discovery and analysis of useful patterns and information form the World Wide Web. Businesses might turn to Web mining to help them understand customer behavior, evaluate the effectiveness of a particular Web site, or quantify the success of a marketing campaign.
Managing Data Resources
Setting up a database is only a start. In order to make sure that the data for your business remain accurate, reliable, and readily available to those who need it, your business will need special policies and procedures for data management.
Establishing An Information Policy
An information policy specifies the organization’s rules for sharing disseminating, acquiring, standardizing, classifying, and inventorying information. Data administration is responsible for the specific policies and procedures through which data can be managed as an organizational resource. Data governance used to describe many of these activities. Promoted by IBM, data governance deals with the policies and processes for managing the availability, usability, integrity, and security of the data employed in an enterprise, with special emphasis on promoting privacy, security, data quality, and compliance with government regulations.
Ensuring Data Quality
Analysis of data quality often begins with a data quality audit, which is a structured survey of the accuracy and level of completeness of the data in an information system. Data cleaning, also known as data scrubbing, consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant.