Tuning Data Warehouse, Testing Data Warehouse

22 Feb 202230 Sep 2025

Tuning Data Warehouse

A data warehouse keeps evolving and it is unpredictable what query the user is going to post in the future. Therefore, it becomes more difficult to tune a data warehouse system.

Difficulties in Data Warehouse Tuning

Tuning a data warehouse is a difficult procedure due to following reasons:

Data warehouse is dynamic; it never remains constant.
It is very difficult to predict what query the user is going to post in the future.
Business requirements change with time.
Users and their profiles keep changing.
The user can switch from one group to another.
The data load on the warehouse also changes with time.

Data Load Tuning

Data load is a critical part of overnight processing. Nothing else can run until data load is complete. This is the entry point into the system.

Note: If there is a delay in transferring the data, or in arrival of data then the entire system is affected badly. Therefore, it is very important to tune the data load first.

There are various approaches of tuning data load:

The very common approach is to insert data using the SQL Layer. In this approach, normal checks and constraints need to be performed. When the data is inserted into the table, the code will run to check for enough space to insert the data. If sufficient space is not available, then more space may have to be allocated to these tables. These checks take time to perform and are costly to CPU.

The second approach is to bypass all these checks and constraints and place the data directly into the preformatted blocks. These blocks are later written to the database. It is faster than the first approach, but it can work only with whole blocks of data. This can lead to some space wastage.

The third approach is that while loading the data into the table that already contains the table, we can maintain indexes.

The fourth approach says that to load the data in tables that already contain data, drop the indexes & recreate them when the data load is complete. The choice between the third and the fourth approach depends on how much data is already loaded and how many indexes need to be rebuilt.

Tuning Queries

Fixed queries
Ad hoc queries

Fixed Queries

Fixed queries are well defined. Following are the examples of fixed queries −

Regular reports
Canned queries
Common aggregations

Tuning the fixed queries in a data warehouse is same as in a relational database system. The only difference is that the amount of data to be queried may be different. It is good to store the most successful execution plan while testing fixed queries. Storing these executing plan will allow us to spot changing data size and data skew, as it will cause the execution plan to change.

Ad hoc Queries

To understand ad hoc queries, it is important to know the ad hoc users of the data warehouse. For each user or group of users, you need to know the following:

The number of users in the group
Whether they use ad hoc queries at regular intervals of time
Whether they use ad hoc queries frequently
Whether they use ad hoc queries occasionally at unknown intervals.
The maximum size of query they tend to run
The average size of query they tend to run
Whether they require drill-down access to the base data
The elapsed login time per day
The peak time of daily usage
The number of queries they run per peak hour

Testing Data Warehouse

Testing is very important for data warehouse systems for data validation and to make them work correctly and efficiently.

There are three basic levels of testing performed on data warehouse which are as follows:

Unit Testing:

This type of testing is being performed at the developer’s end. In unit testing, each unit/component of modules is separately tested. Each modules of the whole data warehouse, i.e. program, SQL Script, procedure, Unix shell is validated and tested.

Integration Testing:

In this type of testing the various individual units/ modules of the application are brought together or combined and then tested against the number of inputs. It is performed to detect the fault in integrated modules and to test whether the various components are performing well after integration.

System Testing:

System testing is the form of testing that validates and tests the whole data warehouse application. This type of testing is being performed by technical testing team. This test is conducted after developer’s team performs unit testing and the main purpose of this testing is to check whether the entire system is working altogether or not.

Challenges of data warehouse testing are:

Data selection from multiple source and analysis that follows pose great challenge.
Volume and complexity of the data, certain testing strategies are time consuming.
ETL testing requires hive SQL skills, thus it pose challenges for tester who have limited SQL skills.
Redundant data in a data warehouse.
Inconsistent and inaccurate reports.

ETL testing is performed in five stages:

Identifying data sources and requirements.
Data acquisition.
Implement business logic’s and dimensional modelling.
Build and populate data.
Build reports.

Tuning Data Warehouse, Testing Data Warehouse

Tuning Data Warehouse

Testing Data Warehouse

Like this:

Related

Leave a ReplyCancel reply

Guru Gobind Singh Indraprastha University (BBA) Notes

Chaudhary Charan Singh University BBA Notes (Old and New Syllabus)

Key difference between Memorandum and Articleas of Association, Prospectus

KMBN106 Design Thinking

CCSU(BBA) 401 Consumer Behavior

Advertising for Rural Markets, Objectives, Media, Strategies, Challenges

Rural Communication, Importance, Types, Traditional Media, Modern Media and Digital, Challenges

SHG (Self Help Groups) in Product Promotion, Objectives, Role, Marketing Strategies, Government Support

Agri-Tech Start-ups, Importance, Types, Technologies, Role, Challenges, Government Support and Funding

Importance of Branding, Packaging and Labelling in Rural Marketing

Tuning Data Warehouse

Testing Data Warehouse

Share this:

Like this:

Related

You might also like

Leave a ReplyCancel reply