Types of Warehousing Applications
Virtual Data Warehouses
A virtual data warehouse is created in the following stages:
- Installing a set of data access, data dictionary, and process management facilities.
- Training end users.
- Monitoring how the data warehouse facilities are used.
- Based on actual usage, creating a physical data warehouse to serve the most frequently requested results.
With this strategy, end users are allowed to access operational databases directly, using whatever tools are enabled on the data access network. This method provides ultimate flexibility as well as the minimum amount of redundant data that must be loaded and maintained. The data warehouse is a great idea, but it is difficult to build and requires investment. Why not use a cheap and fast method instead, eliminating the transformation phase, the metadata repositories, and the additional database? This method is termed the ‘Virtual Data Warehouse.’
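As a minimal sketch of the idea, the example below defines an analytical view directly over an operational table, so nothing is copied or loaded. SQLite stands in for the operational database, and the `orders` table, the `sales_by_region` view, and the helper function are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical operational table; all names here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders (region, amount) VALUES
        ('north', 120.0), ('north', 80.0), ('south', 50.0);
    -- The "virtual warehouse": a view computed on demand, no data is copied.
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total FROM orders GROUP BY region;
""")

def sales_by_region(connection):
    """Run the analytical query directly against the operational data."""
    rows = connection.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()
    return dict(rows)

before = sales_by_region(conn)
# A new operational row is visible immediately, since nothing was loaded.
conn.execute("INSERT INTO orders (region, amount) VALUES ('south', 25.0)")
after = sales_by_region(conn)
print(before, after)
```

The trade-off is visible even in this toy case: every analytical query runs against the same tables the operational workload uses.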
Distributed Data Warehouses
The concept of a distributed data warehouse involves two types of warehouses: local warehouses, which are distributed throughout the enterprise, and a global warehouse.
Characteristics of Local Data Warehouses
- Activity occurs at the local level
- The bulk of the operational processing is done at the local site
- The local site is autonomous
- Each local data warehouse has its own unique architecture and data contents
- The data is unique and of prime importance to that locality only
- The majority of the data is local and not replicated
- Any intersection of data between local data warehouses is coincidental
- Each local warehouse serves a different technical community
- The scope of a local data warehouse is limited to the local site
- Local warehouses also include historical data and are integrated only within the local site.
Stationary Data Warehouses
In a stationary data warehouse, the customer is given direct access to the data. For many organizations, infrequent access, volume issues, or corporate necessities dictate such an approach. This scheme does create several problems for the customer, such as:
- Identifying the location of the information for the users.
- Providing clients the ability to query different DBMSs as if they were all a single DBMS with a single API.
- Impacting performance since the customer will be competing with the production data stores.
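The first two problems above can be illustrated with a small catalog that records where each table lives and exposes one query call for all stores. This is only a sketch under stated assumptions: two in-memory SQLite databases stand in for different production DBMSs, and the `Federation` class and its methods are hypothetical, not part of any real product. Table names are interpolated into SQL only because they are trusted constants in this example.

```python
import sqlite3

def make_store(table, rows):
    """Create an in-memory database standing in for one production DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (key TEXT, value REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn

class Federation:
    """Hypothetical facade: one query call, many underlying stores."""

    def __init__(self):
        self._catalog = {}  # table name -> connection that holds it

    def register(self, table, conn):
        self._catalog[table] = conn

    def locate(self, table):
        # Addresses the first problem: knowing where the data lives.
        return self._catalog[table]

    def fetch(self, table):
        # Addresses the second: callers use one API for every store.
        conn = self.locate(table)
        query = f"SELECT key, value FROM {table} ORDER BY key"
        return conn.execute(query).fetchall()

fed = Federation()
fed.register("billing", make_store("billing", [("a", 10.0), ("b", 5.0)]))
fed.register("inventory", make_store("inventory", [("a", 3.0)]))
combined = {name: fed.fetch(name) for name in ("billing", "inventory")}
print(combined)
```

The third problem remains: each `fetch` still runs on the production store, so analytical queries compete with operational work.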
Multi-Stage Data Warehouses
This type refers to multiple stages of transforming data for analysis through aggregation. In other words, the data is staged multiple times before being loaded into the data warehouse: data is first extracted from the source systems into a staging area, then loaded into the data warehouse after transformation, and finally loaded into departmentalized data marts.
This configuration is well suited to environments where end users in numerous capacities require access to both detailed information for up-to-the-minute tactical decisions and summarized, cumulative information for long-term strategic decisions. Both the Operational Data Store (ODS) and the data warehouse may reside on host-based or LAN-based databases, depending on volume and custom requirements. These include DB2, Oracle, Informix, IMS, flat files, and Sybase.
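The flow from source systems to staging area to warehouse to data marts can be sketched in a few functions. The records, field names, and function names below are assumptions made for illustration; a real pipeline would read from and write to actual databases.

```python
from collections import defaultdict

# Hypothetical source records; field names are illustrative.
source_rows = [
    {"dept": "sales", "day": 1, "amount": "100"},
    {"dept": "sales", "day": 2, "amount": "50"},
    {"dept": "hr", "day": 1, "amount": "20"},
]

def extract_to_staging(rows):
    """Stage 1: copy source rows unchanged into a staging area."""
    return [dict(row) for row in rows]

def load_warehouse(staging):
    """Stage 2: transform staged rows (here, cast amounts) and load them."""
    return [{**row, "amount": float(row["amount"])} for row in staging]

def build_data_marts(warehouse):
    """Stage 3: summarize detailed warehouse rows per department."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["dept"]] += row["amount"]
    return dict(totals)

staging = extract_to_staging(source_rows)
warehouse = load_warehouse(staging)
marts = build_data_marts(warehouse)
print(marts)
```

The warehouse keeps the detailed rows while the marts hold only summaries, which mirrors the split between tactical and strategic access described above.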
Host-Based Single Stage (LAN) Data Warehouses
Within a LAN-based data warehouse, data delivery can be handled either centrally or from the workgroup environment, so business groups can process their data as needed without burdening centralized IT resources. They enjoy the autonomy of their own data mart without compromising overall data integrity and security in the enterprise.
LAN-Based Workgroup Data Warehouses
A LAN-based workgroup warehouse is an integrated structure for building and maintaining a data warehouse in a LAN environment. With this kind of warehouse, we can extract information from a variety of sources and support multiple LAN-based warehouses. Commonly chosen warehouse databases include the DB2 family, Oracle, Sybase, and Informix. Other sources that can be included, though infrequently, are IMS, VSAM, flat files, MVS, and VH.
Host-Based (UNIX) Data Warehouses
Oracle and Informix RDBMSs provide the facilities for such data warehouses. Both of these databases can extract information from MVS-based databases as well as from a large number of other UNIX-based databases. These types of warehouses follow the same stages as the host-based MVS data warehouses. Warehouses can also be created from data on different network servers, since file attribute consistency is common across the inter-network.
Host-Based (MVS) Data Warehouses
Data warehouses that reside on large-volume databases on MVS are host-based data warehouses. Often the DBMS is DB2, with a wide variety of original sources for legacy information, including VSAM, DB2, flat files, and Information Management System (IMS).
To build such a data warehouse successfully, the following phases are generally followed:
- Unload Phase: Selecting and scrubbing the operational data.
- Transform Phase: Translating the data into an appropriate form and defining the rules for accessing and storing it.
- Load Phase: Moving the records directly into DB2 tables, or into a particular file for transfer to another database or a non-MVS warehouse.
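The three phases above can be sketched as a chain of small functions. This is an illustration under assumptions, not a description of any DB2 tooling: the pipe-delimited legacy records are made up, SQLite stands in for the target database, and the `customers` table is hypothetical.

```python
import sqlite3

# Hypothetical legacy records, as they might come from a flat file.
legacy = ["1001| Alice |42.50", "1002|Bob|", "bad line", "1003| Carol |7"]

def unload(lines):
    """Unload: select well-formed records and scrub stray whitespace."""
    for line in lines:
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3 and all(parts):
            yield parts

def transform(records):
    """Transform: convert each record into typed columns."""
    for cust_id, name, balance in records:
        yield int(cust_id), name, float(balance)

def load(rows, conn):
    """Load: move the rows into a warehouse table."""
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, balance REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(unload(legacy)), conn)
loaded = conn.execute("SELECT * FROM customers ORDER BY id").fetchall()
print(loaded)
```

Records with a missing field or the wrong shape are dropped during the unload phase, so only scrubbed data reaches the transform and load steps.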
Host-Based Data Warehouses
There are two types of host-based data warehouses that can be implemented:
- Host-based mainframe warehouses, which reside on a high-volume database. They are supported by robust and reliable high-capacity platforms such as IBM System/390, UNISYS, and Data General sequent systems, and by databases such as Sybase, Oracle, Informix, and DB2.
- Host-based LAN data warehouses, where data delivery can be handled either centrally or from the workgroup environment. The size of the data warehouse database depends on the platform.
Web Mining
Web mining is the application of data mining techniques to discover patterns from the World Wide Web. It uses automated methods to extract both structured and unstructured data from web pages, server logs and link structures. There are three main sub-categories of web mining. Web content mining extracts information from within a page. Web structure mining discovers the structure of the hyperlinks between documents, categorizing sets of web pages and measuring the similarity and relationship between different sites. Web usage mining finds patterns of usage of web pages.
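As a small illustration of web structure mining, the sketch below extracts hyperlinks from a handful of pages and counts how many distinct pages link to each target. The page contents and paths are invented for the example, and counting in-links is only one simple structural measure among many.

```python
from collections import Counter
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags from one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical pages keyed by URL path.
pages = {
    "/home": '<a href="/products">P</a> <a href="/about">A</a>',
    "/about": '<a href="/products">P</a>',
    "/products": "<p>No outgoing links</p>",
}

def in_link_counts(site):
    """Count, for each target, how many distinct pages link to it."""
    counts = Counter()
    for html in site.values():
        parser = LinkCollector()
        parser.feed(html)
        counts.update(set(parser.links))  # count each source page once
    return counts

counts = in_link_counts(pages)
print(counts.most_common())
```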
Web usage mining is the application of data mining techniques to discover interesting usage patterns from Web data in order to understand and better serve the needs of Web-based applications. Usage data captures the identity or origin of Web users along with their browsing behavior at a Web site.
Web usage mining has many advantages, which make this technology attractive to corporations and government agencies. This technology has enabled e-commerce to do personalized marketing, which eventually results in higher trade volumes. Government agencies are using this technology to classify threats and fight against terrorism. The predictive capability of mining applications can benefit society by identifying criminal activities. Companies can establish better customer relationships by understanding customer needs better and reacting to them faster. Companies can find, attract and retain customers; they can save on production costs by utilizing the acquired insight into customer requirements. They can increase profitability by target pricing based on the profiles created. They can even find customers who might defect to a competitor; the company can then try to retain those customers by providing targeted promotional offers, thus reducing the risk of losing them.
More benefits of web usage mining, particularly in the area of personalization, are outlined in specific frameworks such as the probabilistic latent semantic analysis model, which offer additional features to the user behavior and access pattern. This is because the process provides the user with more relevant content through collaborative recommendation. These models also demonstrate a capability in web usage mining technology to address problems associated with traditional techniques such as biases and questions regarding validity since the data and patterns obtained are not subjective and do not degrade over time. There are also elements unique to web usage mining that can show the technology’s benefits and these include the way semantic knowledge is applied when interpreting, analyzing, and reasoning about usage patterns during the mining phase.
Web usage mining itself can be classified further depending on the kind of usage data considered:
- Web server data: The user logs are collected by the Web server. Typical data includes IP address, page reference and access time.
- Application server data: Commercial application servers have significant features to enable e-commerce applications to be built on top of them with little effort. A key feature is the ability to track various kinds of business events and log them in application server logs.
- Application level data: New kinds of events can be defined in an application, and logging can be turned on for them thus generating histories of these specially defined events. Many end applications require a combination of one or more of the techniques applied in the categories above.
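For the web server data category, a minimal sketch of usage mining is to parse access-log lines into IP address, page reference, and access time, then aggregate them. The log lines below are made up and assumed to follow the Common Log Format; the regular expression covers only the fields needed here.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical log lines, assumed to be in Common Log Format.
log_lines = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.1 - - [10/Oct/2023:13:56:01 +0000] "GET /cart HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Oct/2023:14:02:10 +0000] "GET /index.html HTTP/1.1" 200 2326',
    "not a log line",
]

LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "\S+ (?P<page>\S+) [^"]*"'
)

def parse(line):
    """Return (ip, page, access time) or None for unparseable lines."""
    match = LINE.match(line)
    if match is None:
        return None
    when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return match["ip"], match["page"], when

records = [rec for rec in map(parse, log_lines) if rec is not None]

# Simple usage patterns: hits per page and the page sequence per visitor.
page_hits = Counter(page for _, page, _ in records)
pages_by_ip = {}
for ip, page, _ in records:
    pages_by_ip.setdefault(ip, []).append(page)

print(page_hits.most_common(), pages_by_ip)
```

Per-visitor page sequences like `pages_by_ip` are a common starting point for session analysis, although an IP address is only a rough proxy for a user.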