A data warehouse is a system used to store large amounts of business data in one central place. It collects data from different sources such as sales systems, finance, marketing and customer databases. The data is cleaned, organized and stored in a structured format for easy analysis and reporting. A data warehouse supports decision making by providing historical and summarized information. It is mainly used for business intelligence and data analysis. Unlike operational databases, it focuses on analysis rather than daily transactions. In India, banks, retail companies and telecom firms use data warehouses for better planning and strategic decisions.
Functions of Data Warehousing
1. Data Extraction
The first function of a data warehouse is to pull data from various heterogeneous source systems. These sources can include operational databases (OLTP systems like ERP, CRM), flat files (Excel, CSV), external data sources (market research data, social media feeds), and even legacy systems. In an Indian banking context, this means extracting data from savings accounts, credit card systems, loan processing systems, and ATM transaction logs. The extraction process must handle different data formats, platforms, and structures. This function is critical because it brings all relevant business data into one place, breaking down the organizational silos that typically exist in companies.
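As a minimal sketch of extraction, assume two hypothetical sources: a CSV export and a SQLite table standing in for an operational database. The function names, the `atm_log` table, and the sample values are illustrative assumptions, not part of any real banking system.

```python
import csv
import io
import sqlite3

def extract_csv(text):
    """Read rows from CSV text (stands in for a flat-file source)."""
    return [dict(row, source="csv") for row in csv.DictReader(io.StringIO(text))]

def extract_db(conn):
    """Read rows from a table (stands in for an operational database)."""
    cur = conn.execute("SELECT account_id, amount FROM atm_log")
    # Convert values to strings so both sources share one staging format.
    return [{"account_id": str(a), "amount": str(m), "source": "db"} for a, m in cur]

# Hypothetical sample data for both sources.
csv_text = "account_id,amount\n101,2500\n102,400\n"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atm_log (account_id INTEGER, amount INTEGER)")
conn.execute("INSERT INTO atm_log VALUES (101, 1000)")

# All extracted rows land in one staging list, tagged with their origin.
staging = extract_csv(csv_text) + extract_db(conn)
```

Tagging each row with its source keeps the origin visible for the later cleansing and integration steps.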
2. Data Cleansing and Transformation
Raw data extracted from source systems is rarely analysis-ready. It often contains errors, inconsistencies, missing values, and duplicate records. The data warehouse performs the crucial function of cleaning and transforming this data. For example, one system might store dates as “DD-MM-YYYY” while another uses “MM/DD/YYYY.” One system might record gender as “M/F” while another uses “Male/Female.” The transformation process standardizes these formats, handles missing values, corrects errors, and converts data into a consistent, unified structure. This function ensures that the data stored in the warehouse is reliable, accurate, and trustworthy—the foundation for meaningful business analysis.
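The date and gender examples above can be sketched as follows. This is a simplified illustration: the source names (`crm`, `billing`), the field names, and the choice of ISO dates and `M`/`F` codes as the target standard are assumptions made for the example.

```python
from datetime import datetime

# Assumed date format per source; a real pipeline would record this as metadata.
DATE_FORMATS = {"crm": "%d-%m-%Y", "billing": "%m/%d/%Y"}
GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}

def clean_record(record, source):
    """Return a copy of the record with a standardized date and gender code."""
    parsed = datetime.strptime(record["date"], DATE_FORMATS[source])
    raw_gender = (record.get("gender") or "").strip().lower()
    gender = GENDER_MAP.get(raw_gender)  # None marks a missing or unknown value
    return {**record, "date": parsed.strftime("%Y-%m-%d"), "gender": gender}

def deduplicate(records, key):
    """Keep the first record seen for each value of the key field."""
    seen = {}
    for r in records:
        seen.setdefault(r[key], r)
    return list(seen.values())
```

For instance, `clean_record({"id": 1, "date": "25-12-2023", "gender": "Male"}, "crm")` and the same customer from `billing` with `"12/25/2023"` and `"M"` both come out with identical date and gender values.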
3. Data Integration
A core function of the data warehouse is to integrate data from multiple disparate sources into a cohesive, unified view. Different departments within an organization often have their own systems that do not communicate with each other. The warehouse acts as the central meeting point. For a retail chain like Reliance Retail, the warehouse integrates sales data from stores, inventory data from warehouses, supplier data from procurement, and customer data from loyalty programs. This integration creates a single version of the truth—a holistic view of the business where relationships between different functions become visible. Without integration, the organization remains fragmented and unable to see the complete picture.
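One way to picture integration is as a join on shared keys. The sketch below assumes three hypothetical extracts (sales, products, customers) that share `product_id` and `customer_id`; the field names are illustrative.

```python
def integrate(sales, products, customers):
    """Attach product and customer attributes to each sale via shared keys."""
    product_by_id = {p["product_id"]: p for p in products}
    customer_by_id = {c["customer_id"]: c for c in customers}
    unified = []
    for s in sales:
        product = product_by_id.get(s["product_id"], {})
        customer = customer_by_id.get(s["customer_id"], {})
        unified.append({
            **s,
            "category": product.get("category"),     # None if no matching product
            "loyalty_tier": customer.get("tier"),    # None if no matching customer
        })
    return unified
```

Rows without a match keep `None` in the added fields rather than being dropped, so gaps between systems stay visible instead of silently disappearing.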
4. Data Storage and Organization
The warehouse provides a dedicated repository for storing large volumes of historical and current data in a structured manner. Unlike operational databases optimized for fast transaction processing, the warehouse is optimized for fast querying and analysis. It organizes data using specific schemas like Star Schema or Snowflake Schema, which separate quantitative data (facts like sales amount, quantity sold) from descriptive data (dimensions like product, customer, time). This dimensional modeling makes it intuitive for business users to explore data. The warehouse also maintains historical data, often spanning many years, allowing for trend analysis and time-series comparisons that are impractical on transactional systems, which often purge or archive old data.
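A small star schema can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are assumptions for illustration; a real warehouse would have many more dimensions and far more data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
-- The fact table holds measures plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Rice 5kg', 'Grocery'), (2, 'LED TV', 'Electronics');
INSERT INTO dim_date VALUES (20230115, 2023, 1), (20240420, 2024, 2);
INSERT INTO fact_sales VALUES (1, 20230115, 10, 4000.0), (2, 20230115, 1, 30000.0),
                              (1, 20240420, 5, 2100.0);
""")

def sales_by_category_and_year(conn):
    """Join the fact table to its dimensions and total the sales amount."""
    return conn.execute("""
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_date d ON d.date_key = f.date_key
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """).fetchall()
```

The query reads naturally as "amount by category and year", which is the kind of question dimensional modeling is meant to make easy.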
5. Data Summarization
To support fast query performance, data warehouses perform summarization or aggregation of data. Instead of forcing users to query millions of individual transaction records every time, the warehouse pre-calculates summary data at various levels of granularity. For example, sales data might be summarized by day, week, month, quarter, and year; by product category and sub-category; by region, state, and city. These pre-computed summaries (aggregates) allow business users to get answers almost instantly when they ask questions like “What were the total sales in Maharashtra last quarter?” This function dramatically improves query performance and user experience, making interactive analysis feasible.
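The idea of pre-computed aggregates can be sketched in a few lines. The function below assumes each transaction is a dictionary with hypothetical `state`, `year`, `month`, and `amount` fields, and builds a summary keyed by state and quarter.

```python
from collections import defaultdict

def build_summary(transactions):
    """Pre-aggregate sales amount by (state, year, quarter)."""
    summary = defaultdict(float)
    for t in transactions:
        quarter = (t["month"] - 1) // 3 + 1   # months 1-3 -> Q1, 4-6 -> Q2, ...
        summary[(t["state"], t["year"], quarter)] += t["amount"]
    return dict(summary)
```

Once the summary exists, a question such as total sales in Maharashtra for a given quarter becomes a single dictionary lookup instead of a scan over every transaction.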
6. Data Distribution and Metadata Management
A data warehouse does not just store data; it also distributes relevant data to different downstream systems and users. This includes feeding data to Data Marts (departmental subsets of the warehouse), sending summarized reports to executives, and providing data to analytical tools for data mining. Crucially, the warehouse also manages metadata—data about the data. Metadata includes information about data sources, transformation rules, data definitions, data lineage (where data came from and how it was changed), and access history. This metadata is essential for IT governance, audit trails, and helping business users understand the context and meaning of the data they are analyzing.
7. Query and Performance Management
The ultimate function of a data warehouse is to serve business users by providing fast and reliable access to information. It includes sophisticated mechanisms for managing user queries. The query optimizer determines the most efficient way to execute a query, often using indexes, materialized views (pre-computed results), and partitioning (dividing large tables into smaller, manageable pieces). The warehouse also manages concurrent users—ensuring that when dozens or hundreds of business analysts run reports simultaneously, the system remains responsive and stable. This performance management function is what makes the warehouse usable in practice, turning it from a theoretical repository into a practical business tool.
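To illustrate how an index changes the optimizer's choice, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` on a tiny, hypothetical `sales` table. The exact plan wording varies between SQLite versions, so the code only captures the plan text rather than assuming a fixed string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Maharashtra", 100.0), ("Kerala", 50.0), ("Maharashtra", 25.0)],
)

def plan(conn, sql, params=()):
    """Return the detail column of SQLite's query plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT SUM(amount) FROM sales WHERE region = ?"
before = plan(conn, query, ("Maharashtra",))   # typically a full table scan
conn.execute("CREATE INDEX idx_sales_region ON sales(region)")
after = plan(conn, query, ("Maharashtra",))    # typically a search via the index
```

Printing `before` and `after` shows the optimizer switching from scanning the whole table to looking up matching rows through `idx_sales_region`.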
8. Security and Access Control
Given that a data warehouse contains the organization’s most valuable strategic data, security is a paramount function. The warehouse implements robust security and access control mechanisms. This includes authentication (verifying user identity), authorization (controlling what data each user can see), and encryption (protecting data during transmission and storage). For example, in a hospital data warehouse, a doctor might see patient medical history, but a billing clerk might only see payment-related data. In Indian financial institutions, compliance with RBI guidelines requires strict audit trails and data protection. This function ensures that sensitive business and customer data remains confidential and is accessed only by authorized personnel.
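The hospital example can be sketched as a simple column-level authorization check. The roles, column names, and policy table here are hypothetical; production systems usually enforce this inside the database or BI layer rather than in application code.

```python
# Hypothetical role-to-column policy for a hospital warehouse.
ROLE_COLUMNS = {
    "doctor": {"patient_id", "diagnosis", "medical_history"},
    "billing_clerk": {"patient_id", "amount_due", "payment_status"},
}

def authorized_view(row, role):
    """Return only the columns of the row that the role is allowed to see."""
    allowed = ROLE_COLUMNS.get(role)
    if allowed is None:
        raise PermissionError(f"unknown role: {role}")
    return {k: v for k, v in row.items() if k in allowed}
```

Unknown roles are rejected outright, so a misconfigured account sees nothing rather than everything.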
OLTP Systems
OLTP stands for Online Transaction Processing. It is a system used to manage daily business transactions quickly and accurately. OLTP systems handle large numbers of short and simple transactions such as sales, purchases, deposits and bookings. These systems are commonly used in banks, retail stores, railway reservations and online shopping websites. The main purpose of OLTP is to ensure fast processing, data accuracy and data integrity. It supports real-time operations and allows multiple users to access data at the same time. In India, ATMs, UPI payments and e-commerce platforms depend heavily on OLTP systems for smooth functioning.
Functions of OLTP Systems
1. High-Speed Transaction Processing
The core function of an OLTP system is to process a large volume of simple, routine transactions in real time or near real time. These are the day-to-day operations of a business: recording a sale, processing a bank withdrawal, booking a railway ticket, or updating inventory. Speed is critical; depending on its scale, a system may need to handle hundreds or many thousands of transactions per second with minimal delay. When you swipe your debit card at a store in India, the OLTP system at your bank must verify your PIN, check your balance, deduct the amount, and confirm the transaction, all within a few seconds. This high-speed processing keeps business operations running smoothly and efficiently.
2. Data Entry and Capture
OLTP systems are designed primarily for data entry and capture. They are the front-line systems where business events are first recorded. They provide user-friendly interfaces (like forms on a website, screens at a cash register, or mobile app interfaces) that allow users—whether customers, clerks, or operators—to enter data accurately and efficiently. For example, when an IRCTC ticket agent enters your journey details, the OLTP system captures that data and stores it as a transaction record. This function ensures that all business activities are digitized and preserved in the operational database, creating the raw data that will later feed into data warehouses for analysis.
3. Concurrency Control
A critical function of OLTP systems is managing concurrent access to data. This means handling situations where multiple users try to access or modify the same data at the exact same time. Without proper control, this could lead to data corruption or inconsistency. For example, if two customers try to book the last seat on a Mumbai-Delhi flight simultaneously, the OLTP system must ensure that only one gets the seat and the other is informed it is sold out. OLTP systems use mechanisms like locking and transaction isolation to prevent conflicts, ensuring data integrity even under heavy concurrent load.
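The last-seat scenario can be sketched with a single conditional update, so that the availability check and the decrement happen in one statement. The `flights` table and flight number are hypothetical, and the two customers are simulated by two sequential calls; in a real multi-user database the engine's locking would serialize the competing updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (flight_no TEXT PRIMARY KEY, seats_left INTEGER)")
conn.execute("INSERT INTO flights VALUES ('BOM-DEL-101', 1)")
conn.commit()

def book_seat(conn, flight_no):
    """Try to take one seat; return True on success, False if sold out."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE flights SET seats_left = seats_left - 1 "
            "WHERE flight_no = ? AND seats_left > 0",
            (flight_no,),
        )
        # rowcount is 1 only if a seat was actually available and taken.
        return cur.rowcount == 1
```

Because the `WHERE` clause refuses to decrement below zero, the second request cannot succeed once the first has taken the seat.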
4. Data Integrity and Consistency
OLTP systems enforce strict data integrity rules to ensure that the database remains accurate and reliable at all times. This is achieved through the ACID properties (Atomicity, Consistency, Isolation, Durability). For example, when you transfer money from one bank account to another using UPI, the OLTP system ensures that:
- Atomicity: Either both the debit and credit happen, or neither happens.
- Consistency: The total money in the system remains the same.
- Isolation: Other transactions don’t see the transfer until it is complete.
- Durability: Once completed, the transfer is permanently recorded, even if the system crashes immediately after.
This function guarantees that the database always reflects the true state of the business.
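Atomicity in particular is easy to demonstrate. The sketch below uses an in-memory SQLite database with a hypothetical `accounts` table and a `CHECK` constraint that forbids negative balances; it is an illustration of the principle, not a model of how UPI is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 100)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # one transaction: commit on success, rollback on exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was rolled back too.
        return False
```

The credit is deliberately applied first so that a failed debit shows the rollback undoing work that had already been done inside the transaction.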
5. Data Validation
Before accepting and processing a transaction, OLTP systems perform rigorous data validation to ensure that the incoming data meets all business rules and constraints. This includes checking for data type correctness (e.g., ensuring a date field actually contains a date), range checks (e.g., verifying that a person’s age is not negative), mandatory field checks (e.g., ensuring a customer’s phone number is provided), and referential integrity (e.g., verifying that a product ID entered in a sales transaction actually exists in the product master table). If validation fails, the transaction is rejected with an appropriate error message. This function prevents bad data from entering the system in the first place.
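The four kinds of checks listed above can be sketched as one validation function. The field names and the set of known product IDs are hypothetical, and in practice many of these rules would also be enforced by database constraints.

```python
from datetime import date

def validate_order(order, known_product_ids):
    """Return a list of validation errors; an empty list means the order is accepted."""
    errors = []
    if not order.get("phone"):                          # mandatory field check
        errors.append("phone is mandatory")
    if not isinstance(order.get("order_date"), date):   # data type check
        errors.append("order_date must be a date")
    age = order.get("age")
    if not isinstance(age, int) or age < 0:             # range check
        errors.append("age must be a non-negative integer")
    if order.get("product_id") not in known_product_ids:  # referential integrity
        errors.append("unknown product_id")
    return errors
```

Collecting every error before rejecting lets the user fix all problems in one pass instead of resubmitting repeatedly.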
6. Short and Simple Query Processing
OLTP systems are optimized for handling a large number of short, simple, and repetitive queries. These queries typically involve inserting, updating, or retrieving a small amount of data (often just a single record at a time). For example: “Show me the balance of account number 12345,” or “Insert a new order for customer ABC.” The database schema is highly normalized to minimize redundancy, which keeps each insert or update small and prevents conflicting copies of the same data. OLTP systems are not designed for complex analytical queries that scan millions of records; such queries would slow down transaction processing and are the job of a data warehouse.
7. Security and Access Control
Since OLTP systems handle sensitive operational data—including customer personal information, financial details, and confidential business records—robust security is an essential function. These systems implement multiple layers of security, including:
- Authentication: Verifying user identity through passwords, biometrics, or OTPs (common in Indian banking apps).
- Authorization: Ensuring users can only perform actions they are permitted to (e.g., a clerk can enter sales but cannot modify salary data).
- Audit Trails: Logging all transactions to track who did what and when.
This function protects the business from fraud, data breaches, and unauthorized access.
8. Availability and Reliability
OLTP systems must be highly available and reliable because they support mission-critical business operations. An airline cannot afford its booking system to go down; a stock exchange cannot tolerate its trading platform crashing; an e-commerce site during Diwali sales cannot be unavailable. These systems are designed with redundancy, failover mechanisms, and disaster recovery capabilities to ensure 24/7 operation. Many aim for “five nines” availability (99.999% uptime), which translates to roughly 5.26 minutes of downtime per year. This function ensures that business operations continue uninterrupted, maintaining customer trust and revenue generation.
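The downtime implied by an availability target is simple arithmetic, sketched below. The function name is illustrative, and a 365-day year is assumed.

```python
def downtime_minutes_per_year(availability_percent, days_per_year=365):
    """Convert an availability percentage into allowed downtime per year, in minutes."""
    minutes_per_year = days_per_year * 24 * 60   # 525,600 minutes in a 365-day year
    return minutes_per_year * (1 - availability_percent / 100)
```

For 99.999% availability this gives about 5.26 minutes per year, while 99.9% ("three nines") allows about 525.6 minutes, or nearly nine hours.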
Key differences between Data Warehousing and OLTP Systems
| Basis of Comparison | Data Warehousing | OLTP Systems |
|---|---|---|
| Purpose | Analysis | Transactions |
| Data Type | Historical | Current |
| Usage | Decision Making | Daily Operations |
| Users | Managers | Clerks |
| Query Type | Complex | Simple |
| Data Volume | Large | Moderate |
| Update Frequency | Periodic | Continuous |
| Processing | Analytical | Operational |
| Design | Subject Oriented | Application Oriented |
| Structure | Denormalized | Normalized |
| Performance Focus | Query Speed | Transaction Speed |
| Time Orientation | Long Term | Short Term |
| Data Source | Multiple Sources | Single Application |
| Integrity Focus | Reporting Accuracy | Data Consistency |
| Example | BI Reports | ATM System |