DATA WAREHOUSING AND DATA MINING (R18A0524)
Digital Notes on Data Warehousing and Data Mining
Page 1
Malla Reddy College of Engineering & Technology
Department of Information Technology
B.Tech III Year - II Sem (2020-21)
Autonomous Institution – UGC, Govt. of India
Contact Number: 040-23792146/64634237
E-Mail ID: mrcet2004@gmail.com
Website: www.mrcet.ac.in
Page 2
Course Syllabus Overview
Course Code: R18A0524
Credits: 3
Objectives:
Study data warehouse principles and how a data warehouse works.
Learn data mining concepts and understand association rule mining.
Study Classification Algorithms.
Gain knowledge of clustering techniques in data grouping.
Unit Structure
Data Warehouse: Introduction, Architecture, ETL, OLAP, Modeling.
Data Mining Fundamentals: Functionalities, Data Preprocessing, Major Issues.
Association Rules: Problem definition, algorithms such as Apriori.
Classification: Concepts, Decision Trees, Naive Bayes.
Clustering: Overview, Methods, K-Means, Hierarchical Clustering.
Page 3
Data Warehouse - Unit I Key Concepts
Data Warehouse: A subject-oriented, integrated, time-variant, non-volatile collection of data that supports management decision-making.
Characteristics:
Subject-oriented: Data focused on specific areas (e.g., sales).
Integrated: Combines data from different sources.
Time-variant: Historical data is maintained, so records are tied to a point or period in time.
Non-volatile: Data is not updated or deleted once loaded; it is only added to and read.
Design Approaches: Top-down, Bottom-up, or Combined.
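The four characteristics above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name sales_fact, the two source extracts, and all values are hypothetical and chosen only for illustration; a real warehouse would use a dedicated ETL process and a dimensional model.

```python
import sqlite3

# Hypothetical source extracts: two systems that record sales differently.
store_rows = [("2021-01-05", "pen", 3, 1.50), ("2021-02-10", "ink", 1, 4.00)]
web_rows = [{"day": "2021-01-20", "sku": "PEN", "amount_cents": 300}]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (sale_date TEXT, product TEXT, amount REAL, source TEXT)"
)

# Integrated: both sources are converted to one schema and one unit.
unified = [(d, p.lower(), qty * price, "store") for d, p, qty, price in store_rows]
unified += [(r["day"], r["sku"].lower(), r["amount_cents"] / 100, "web") for r in web_rows]

# Non-volatile: rows are only inserted; no UPDATE or DELETE is issued here.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", unified)

# Subject-oriented and time-variant: analyze the sales subject month by month.
monthly_sales = conn.execute(
    "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) "
    "FROM sales_fact GROUP BY month ORDER BY month"
).fetchall()
print(monthly_sales)
```

The query groups by month because the stored history, not just the current state, is what the warehouse is meant to expose.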
Page 4
Data Mining Fundamentals - Unit II Overview
Data Mining: Extracting patterns from large datasets.
Functionalities: Descriptive tasks (e.g., association rule mining, clustering) and predictive tasks (e.g., classification).
Data Preprocessing Steps:
Cleaning, Integration, Transformation, Reduction, Discretization.
Need for Data Preprocessing: Improves quality and usefulness of data.
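Three of the preprocessing steps listed above (cleaning, transformation, discretization) can be sketched in plain Python. The age values, the mean-fill strategy, and the bin labels are assumptions made for illustration; other strategies, such as median filling or equal-frequency binning, are equally valid.

```python
# Hypothetical raw records; None marks a missing age.
ages = [23, None, 35, 51, 23, 64]

# Cleaning: fill missing values with the mean of the known values.
known = [a for a in ages if a is not None]
mean_age = sum(known) / len(known)
cleaned = [a if a is not None else mean_age for a in ages]

# Transformation: min-max normalization to the range [0, 1].
low, high = min(cleaned), max(cleaned)
normalized = [(a - low) / (high - low) for a in cleaned]

# Discretization: equal-width binning into labeled intervals.
def equal_width_bin(value, low, high, labels):
    width = (high - low) / len(labels)
    # Clamp the index so the maximum value falls into the last bin.
    index = min(int((value - low) / width), len(labels) - 1)
    return labels[index]

binned = [equal_width_bin(a, low, high, ["young", "middle", "senior"]) for a in cleaned]
print(binned)
```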
Page 5
Association Rule Mining - Unit III Key Concepts
Association Rules: Identify interesting relations between variables.
Key Elements:
Support: Proportion of transactions containing a particular itemset.
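Confidence: Conditional probability that a transaction contains the RHS given that it contains the LHS, i.e. support(LHS and RHS) / support(LHS).
Lift: Ratio of the observed support of the rule to the support expected if LHS and RHS were independent, i.e. support(LHS and RHS) / (support(LHS) x support(RHS)). A lift above 1 indicates positive correlation.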
Methods: Apriori Algorithm, FP-Growth for mining frequent itemsets.
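The three measures and the level-wise idea behind Apriori can be sketched as follows. The transactions and the function names are hypothetical, and this is a small illustrative version rather than an optimized implementation; FP-Growth is not shown.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Estimate of P(rhs | lhs): support of both sides over support of lhs."""
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    """Observed joint support over the support expected under independence."""
    return support(lhs | rhs) / (support(lhs) * support(rhs))

def apriori(min_support):
    """Level-wise search: size-k candidates come only from frequent (k-1)-itemsets."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = list(level)
    k = 2
    while level:
        prev = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates with an infrequent (k-1)-subset (the Apriori property).
        level = [
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
            and support(c) >= min_support
        ]
        frequent.extend(level)
        k += 1
    return frequent

print(confidence({"butter"}, {"bread"}), lift({"bread"}, {"milk"}))
print(apriori(min_support=0.5))
```

In this toy data the itemset {bread, milk, butter} is never counted at min_support 0.5, because its subset {milk, butter} is already infrequent; that pruning is what keeps the candidate space small.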
Page 6
Classification - Unit IV Concepts
Classification Objective: Assigning categorical labels.
Techniques:
Decision Trees: Tree structure in which internal nodes test attributes and leaves assign class labels.
Naive Bayes Classifier: Applies Bayes' theorem, assuming attributes are conditionally independent given the class.
K-Nearest Neighbor: Classifies based on proximity to training data.
Evaluation: Accuracy, confusion matrix, and ROC curves.
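As a minimal sketch of one technique and its evaluation, the following classifies points with k-nearest neighbor and tallies a confusion matrix. The training points, test points, and the choice of k = 3 are assumptions for illustration; math.dist requires Python 3.8 or later.

```python
from collections import Counter
import math

# Hypothetical 2-D training points with class labels.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B")]

def knn_predict(point, k=3):
    """Label a point by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(point, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical held-out test set: (point, true label).
test_set = [((1.2, 1.1), "A"), ((6.2, 6.1), "B"), ((4.5, 4.5), "A")]

# Confusion matrix stored as counts keyed by (actual, predicted).
confusion = Counter((actual, knn_predict(p)) for p, actual in test_set)

# Accuracy: correctly classified test points over all test points.
correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
accuracy = correct / len(test_set)
print(dict(confusion), accuracy)
```

The third test point lies closer to the "B" group, so it is counted as an actual "A" predicted as "B"; the confusion matrix shows which kind of error was made, which accuracy alone does not.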
Page 7
Clustering - Unit V Overview
Clustering Purpose: Grouping similar data objects.
Types of Methods:
Partitioning: e.g., K-Means, K-Medoids.
Hierarchical: Agglomerative and Divisive clustering.
Density-based: Identify clusters as densely populated regions separated by sparse regions (e.g., DBSCAN).
Key Challenges: High dimensions, noise, and scalability.
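The partitioning approach can be sketched with a basic K-Means loop. The points and the hand-picked initial centroids are assumptions for illustration; practical implementations usually choose initial centroids randomly or with a seeding heuristic, and results depend on that choice.

```python
import math

def k_means(points, centroids, max_iters=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (9.0, 8.0)])
print(centroids)
print(clusters)
```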
Page 8
Textbooks and References
Primary Textbooks:
"Data Mining: Concepts and Techniques" by Jiawei Han et al.
"Introduction to Data Mining" by Pang-Ning Tan et al.
Reference Books:
"Data Mining Techniques" by Arun K. Pujari.
"Data Warehousing Fundamentals" by Paulraj Ponniah.
"The Data Warehouse Lifecycle Toolkit" by Ralph Kimball.
"Data Mining" by Vikram Pudi and P. Radha Krishna.
Page 9
Outcomes of the Course
Compare data warehouses with operational databases.
Ability to preprocess data and apply mining techniques.
Identify associations, classifications, and clusters in datasets.
Apply data mining skills to real-world problems.
Page 10
Index Overview
Unit I: Introduction and Design of Data Warehouses.
Unit II: Data Mining Fundamentals and Preparation.
Unit III: Association Rule Mining and Algorithm details.
Unit IV: Classification methods and Algorithms.
Unit V: Clustering methods and applications.