Multi-Dimensional Data Model
A multidimensional model views data in the form of a data-cube. A data cube enables data to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts.
The dimensions are the perspectives or entities concerning which an organization keeps records. For example, a shop may create a sales data warehouse to keep records of the store’s sales for the dimension time, item, and location. These dimensions allow the save to keep track of things, for example, monthly sales of items and the locations at which the items were sold. Each dimension has a table related to it, called a dimensional table, which describes the dimension further. For example, a dimensional table for an item may contain the attributes item_name, brand, and type.
A multidimensional data model is organized around a central theme, for example, sales. This theme is represented by a fact table. Facts are numerical measures. The fact table contains the names of the facts or measures of the related dimensional tables.
Working on a Multidimensional Data Model
On the basis of the pre-decided steps, the Multidimensional Data Model works.
The following stages should be followed by every project for building a Multi Dimensional Data Model :
Stage 1: Assembling data from the client: In first stage, a Multi Dimensional Data Model collects correct data from the client. Mostly, software professionals provide simplicity to the client about the range of data which can be gained with the selected technology and collect the complete data in detail.
Stage 2: Grouping different segments of the system: In the second stage, the Multi Dimensional Data Model recognizes and classifies all the data to the respective section they belong to and also builds it problem-free to apply step by step.
Stage 3: Noticing the different proportions: In the third stage, it is the basis on which the design of the system is based. In this stage, the main factors are recognized according to the user’s point of view. These factors are also known as “Dimensions”.
Stage 4: Preparing the actual-time factors and their respective qualities: In the fourth stage, the factors which are recognized in the previous step are used further for identifying the related qualities. These qualities are also known as “attributes” in the database.
Stage 5: Finding the actuality of factors which are listed previously and their qualities: In the fifth stage, A Multi Dimensional Data Model separates and differentiates the actuality from the factors which are collected by it. These actually play a significant role in the arrangement of a Multi Dimensional Data Model.
Stage 6: Building the Schema to place the data, with respect to the information collected from the steps above: In the sixth stage, on the basis of the data which was collected previously, a Schema is built.
Data Cubes
In computer programming contexts, a data cube (or datacube) is a multi-dimensional (“n-D”) array of values. Typically, the term datacube is applied in contexts where these arrays are massively larger than the hosting computer’s main memory; examples include multi-terabyte/petabyte data warehouses and time series of image data.
The data cube is used to represent data (sometimes called facts) along some dimensions of interest. For example, in OLAP such dimensions could be the subsidiaries a company has, the products the company offers, and time; in this setup, a fact would be a sales event where a particular product has been sold in a particular subsidiary at a particular time. In satellite image timeseries dimensions would be Latitude and Longitude coordinates and time; a fact (sometimes called measure) would be a pixel at a given space and time as taken by the satellite (following some processing that is not of concern here). Even though it is called a cube (and the examples provided above happen to be 3-dimensional for brevity), a data cube generally is a multi-dimensional concept which can be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional. In any case, every dimension divides data into groups of cells whereas each cell in the cube represents a single measure of interest. Sometimes cubes hold only few values with the rest being empty, i.e. undefined, sometimes most or all cube coordinates hold a cell value. In the first case such data are called sparse, in the second case they are called dense, although there is no hard delineation between both.
Applications
Multi-dimensional arrays can meaningfully represent spatio-temporal sensor, image, and simulation data, but also statistics data where the semantics of dimensions is not necessarily of spatial or temporal nature. Generally, any kind of axis can be combined with any other into a datacube.
Stars
Star schema is the fundamental schema among the data mart schema and it is simplest. This schema is widely used to develop or build a data warehouse and dimensional data marts. It includes one or more fact tables indexing any number of dimensional tables. The star schema is a necessary cause of the snowflake schema. It is also efficient for handling basic queries.
It is said to be star as its physical model resembles to the star shape having a fact table at its center and the dimension tables at its peripheral representing the star’s points.
Advantages of Star Schema:
Simpler Queries
Join logic of star schema is quite cinch in comparison to other join logic which are needed to fetch data from a transactional schema that is highly normalized.
Simplified Business Reporting Logic
In comparison to a transactional schema that is highly normalized, the star schema makes simpler common business reporting logic, such as as-of reporting and period-over-period.
Feeding Cubes
Star schema is widely used by all OLAP systems to design OLAP cubes efficiently. In fact, major OLAP systems deliver a ROLAP mode of operation which can use a star schema as a source without designing a cube structure.
Disadvantages of Star Schema:
- Data integrity is not enforced well since in a highly de-normalized schema state.
- Not flexible in terms if analytical needs as a normalized data model.
- Star schemas don’t reinforce many-to-many relationships within business entities at least not frequently.
Snow Flakes
Snowflake Schema in data warehouse is a logical arrangement of tables in a multidimensional database such that the ER diagram resembles a snowflake shape. A Snowflake Schema is an extension of a Star Schema, and it adds additional dimensions. The dimension tables are normalized which splits data into additional tables.
Characteristics of Snowflake:
- The main benefit of the snowflake schema it uses smaller disk space.
- Easier to implement a dimension is added to the Schema
- Due to multiple tables query performance is reduced
- The primary challenge that you will face while using the snowflake Schema is that you need to perform more maintenance efforts because of the more lookup tables.
Fact Constellations
Fact constellation is a measure of online analytical processing, which is a collection of multiple fact tables sharing dimension tables, viewed as a collection of stars. It can be seen as an extension of the star schema.
A fact constellation schema has multiple fact tables. It is also known as galaxy schema. It is widely used schema and more complex than star schema and snowflake schema. It is possible to create fact constellation schema by splitting original star schema into more star schema. It has many fact tables and some common dimension table.
Advantage:
Provides a flexible schema.
Disadvantage:
It is much more complex and hence, hard to implement and maintain.