Phases of Knowledge Discovery in Databases (KDD)
Some people treat data mining and knowledge discovery as the same thing, while others view data mining as one essential step in the knowledge discovery process. Here is the list of steps involved in the knowledge discovery process −
- Data Cleaning − In this step, noise and inconsistent data are removed.
- Data Integration − In this step, multiple data sources are combined.
- Data Selection − In this step, data relevant to the analysis task are retrieved from the database.
- Data Transformation − In this step, data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
- Data Mining − In this step, intelligent methods are applied in order to extract data patterns.
- Pattern Evaluation − In this step, the extracted patterns are evaluated to identify the ones that are actually interesting for the task.
- Knowledge Presentation − In this step, the discovered knowledge is presented to the user in an understandable form.
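The seven steps above can be sketched as a small pipeline. This is only an illustration, not a reference implementation: the two data sources (STORE_A, STORE_B), the record fields, the function names and the choice of counting co-occurring item pairs as the "intelligent method" are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw sources: purchase records from two stores.
STORE_A = [
    {"id": 1, "region": "north", "items": ["bread", "milk"]},
    {"id": 2, "region": "north", "items": ["bread", "butter", "milk"]},
    {"id": 3, "region": "south", "items": None},                # noisy record
    {"id": 4, "region": "north", "items": ["Milk ", "bread"]},  # inconsistent naming
]
STORE_B = [
    {"id": 5, "region": "north", "items": ["butter", "bread"]},
    {"id": 6, "region": "south", "items": ["milk", "bread"]},
]


def clean(records):
    """Data cleaning: drop records without items and normalise item names."""
    cleaned = []
    for record in records:
        if not record["items"]:
            continue
        items = [item.strip().lower() for item in record["items"]]
        cleaned.append({**record, "items": items})
    return cleaned


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [record for source in sources for record in source]


def select(records, region):
    """Data selection: keep only the records relevant to the task."""
    return [record for record in records if record["region"] == region]


def transform(records):
    """Data transformation: reduce each record to a set of items."""
    return [frozenset(record["items"]) for record in records]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def evaluate(pair_counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }


def present(patterns):
    """Knowledge presentation: render patterns as readable lines."""
    ordered = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in ordered]


def run_kdd(region="north", min_support=0.5):
    """Run the steps in the order listed above."""
    records = integrate(clean(STORE_A), clean(STORE_B))
    baskets = transform(select(records, region))
    patterns = evaluate(mine(baskets), len(baskets), min_support)
    return present(patterns)


if __name__ == "__main__":
    for line in run_kdd():
        print(line)
```

Each function maps to one step, so a step can be replaced (for example, a different mining method) without touching the others.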
The following diagram shows the process of knowledge discovery −
Steps involved in the entire KDD process are:
- Identify the goal of the KDD process from the customer’s perspective.
- Understand the application domains involved and the knowledge that’s required.
- Select a target data set or subset of data samples on which discovery is to be performed.
- Cleanse and preprocess data by deciding strategies to handle missing fields and alter the data as per the requirements.
- Simplify the data sets by removing unwanted variables. Then, analyze useful features that can be used to represent the data, depending on the goal or task.
- Match KDD goals with data mining methods to suggest hidden patterns.
- Choose data mining algorithms to discover hidden patterns. This process includes deciding which models and parameters might be appropriate for the overall KDD process.
- Search for patterns of interest in a particular representational form, such as classification rules or trees, regression, and clustering.
- Interpret essential knowledge from the mined patterns.
- Use the knowledge and incorporate it into another system for further action.
- Document the results and report them to interested parties.
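Several of the steps above (handling missing fields, removing unwanted variables, choosing a method and searching for patterns in a given representational form) can be illustrated with a short sketch. Everything here is assumed for the example: the toy customer table ROWS, the field names, mean imputation as the missing-value strategy, and a single threshold rule as the pattern representation. A real project would choose each of these according to its own goal.

```python
from statistics import mean

# Hypothetical target data set; "churned" is the outcome we want to explain.
ROWS = [
    {"age": 25, "visits": 2, "site": "A", "churned": True},
    {"age": 40, "visits": 9, "site": "A", "churned": False},
    {"age": None, "visits": 1, "site": "A", "churned": True},   # missing field
    {"age": 35, "visits": 8, "site": "A", "churned": False},
    {"age": 52, "visits": 3, "site": "A", "churned": True},
    {"age": 31, "visits": 7, "site": "A", "churned": False},
]
TARGET = "churned"


def impute_missing(rows, field):
    """Preprocessing: replace missing values of a numeric field with its mean."""
    fill = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def drop_constant_fields(rows, target):
    """Simplification: remove variables that take only one value."""
    keep = [f for f in rows[0] if f == target or len({r[f] for r in rows}) > 1]
    return [{f: r[f] for f in keep} for r in rows]


def best_threshold_rule(rows, target):
    """Pattern search: try every 'field <= threshold' rule, keep the most accurate.

    Returns (field, threshold, label_if_low, accuracy).
    """
    best = None
    for field in rows[0]:
        if field == target:
            continue
        for threshold in sorted({r[field] for r in rows}):
            for label_if_low in (True, False):
                hits = 0
                for r in rows:
                    predicted = label_if_low if r[field] <= threshold else not label_if_low
                    hits += predicted == r[target]
                accuracy = hits / len(rows)
                if best is None or accuracy > best[3]:
                    best = (field, threshold, label_if_low, accuracy)
    return best


def interpret(rule, target):
    """Interpretation: turn the mined rule into a readable statement."""
    field, threshold, label, accuracy = rule
    return f"if {field} <= {threshold} then {target}={label} (accuracy {accuracy:.2f})"


def prepare_and_mine():
    rows = drop_constant_fields(impute_missing(ROWS, "age"), TARGET)
    return best_threshold_rule(rows, TARGET)


if __name__ == "__main__":
    print(interpret(prepare_and_mine(), TARGET))
```

The accuracy reported here is measured on the same rows used to find the rule, so it only shows how the search works; it says nothing about how the rule would perform on new data.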