Relational
Stores data in tables with relationships between them.
Object-oriented
Stores data as objects, similar to object-oriented programming languages.
Network
Builds on the traditional hierarchical model. Data is represented as a graph of records and links. In a network database, child records can have multiple parent records.
Spatial
A type of database optimized for storing and querying data that represents objects defined in a geometric space.
Multi-dimensional
Data is organized into dimensions. Often used in data warehousing and OLAP (Online Analytical Processing).
Data Definition
The process of specifying the structure of a database — including the tables, fields (columns), data types, and relationships.
Data Manipulation
The process of adding, changing, retrieving, or removing data in a database.
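Data definition and data manipulation correspond to SQL's DDL and DML statements. The sketch below uses Python's built-in `sqlite3` module. The `customer` and `purchase` tables and their columns are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Data definition: describe the structure (tables, columns, types, relationships).
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE purchase ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " amount REAL NOT NULL)"
)

# Data manipulation: add, change, remove, and retrieve rows.
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (1, 20.0)")
conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (1, 5.0)")
conn.execute("UPDATE purchase SET amount = 25.0 WHERE amount = 20.0")
conn.execute("DELETE FROM purchase WHERE amount = 5.0")

# Retrieval follows the relationship between the two tables with a join.
rows = conn.execute(
    "SELECT c.name, p.amount FROM customer c JOIN purchase p ON p.customer_id = c.id"
).fetchall()
print(rows)
```

The `CREATE TABLE` statements are data definition. The `INSERT`, `UPDATE`, `DELETE`, and `SELECT` statements are data manipulation.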
Data Integrity
The accuracy, consistency, and reliability of data in the database.
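One common way to protect data integrity is to declare constraints so the database itself rejects bad rows. The sketch below uses `sqlite3` with hypothetical `customer` and `purchase` tables. The helper `try_insert` is an illustrative name, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE purchase ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " amount REAL NOT NULL CHECK (amount >= 0))"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")

def try_insert(customer_id, amount):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False

valid = try_insert(1, 10.0)      # accepted
orphan = try_insert(99, 10.0)    # rejected: no customer with id 99
negative = try_insert(1, -5.0)   # rejected: violates the CHECK constraint
print(valid, orphan, negative)
```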
Data Warehouse
A subject-oriented, integrated, time-variant, and non-volatile collection of data used to support decision-making.
A large, centralized repository for storing integrated data from multiple sources.
Subject-oriented
Organized around major subjects such as sales or products.
Integrated
Data is collected from various sources and merged into a coherent whole.
Time-variant
Data is kept for historical analysis and has a time dimension.
Non-volatile
Once entered into the warehouse, data is not updated.
Strategic Planning
Using data to make long-term decisions.
Business Modelling
Creating data models to simulate different business scenarios.
Time Dependency
Data in a warehouse is valid for a specific range of time.
Real-time Updates
Data is constantly refreshed from operational systems.
Advantages of a Data Warehouse
Centralizes data management.
Supports decision-making with complex queries.
Enables historical analysis and trend identification over time.
Allows for faster query performance due to an optimized structure (e.g., a de-normalized schema).
Does not affect daily operations, since analysis runs separately from the operational systems.
Extract
Pulling data from various sources.
Such as:
Operational Databases
CRM (Customer Relationship Management) System
ERP (Enterprise Resource Planning) system
Spreadsheets & CSV files
Transform
Extracted data is cleaned, restructured, and converted to match the format and rules of the target data warehouse.
Deduplicate
Clean (Remove Invalid or Missing Data)
Standardize (Make sure all data follows the same rules)
Load
Inserting data into the final target database or data warehouse.
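The three ETL steps above can be sketched end to end with the standard library. This is a minimal illustration, not a production pipeline. The in-memory CSV stands in for a real source file, and the `sales` table is a hypothetical warehouse table.

```python
import csv
import io
import sqlite3

# Extract: read rows from a source (an in-memory CSV stands in for a real file).
source = io.StringIO(
    "customer,amount\n"
    " Ada ,20\n"
    "ada,20\n"
    "Bob,\n"
    "Cy,7.5\n"
)
extracted = list(csv.DictReader(source))

# Transform: standardize, clean, and deduplicate.
seen = set()
transformed = []
for row in extracted:
    name = row["customer"].strip().title()  # standardize name formatting
    if not row["amount"].strip():            # clean: drop rows with a missing amount
        continue
    record = (name, float(row["amount"]))
    if record in seen:                       # deduplicate
        continue
    seen.add(record)
    transformed.append(record)

# Load: insert the cleaned rows into the target table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)", transformed)
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Of the four source rows, one is dropped as a duplicate after standardization and one is dropped for a missing amount, so two rows reach the warehouse.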
Data Cleaning
Enhancing data quality by rectifying inconsistent, incomplete, or inaccurate data.
Pattern Discovery
Identifying patterns or correlations in large datasets using methods like cluster analysis, associations, classifications, sequential patterns, and forecasting.
Benefits of Pattern Analysis
Fraud detection in banking
Optimizing retail marketing strategies
Disease Detection in Healthcare
Predictive Modelling
Using statistical techniques such as decision tree induction and neural networks to make predictions.
Database Segmentation
Dividing a database into distinct segments based on certain criteria.
Link Analysis
Analyzing connections between nodes in a network to identify relationships and patterns.
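One of the simplest forms of link analysis is counting how many distinct nodes each node is connected to (its degree). The sketch below assumes a hypothetical list of transfers between accounts named A to E. Real link analysis typically uses richer measures than degree alone.

```python
from collections import defaultdict

# Hypothetical links: (sender, receiver) pairs between accounts.
transfers = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("E", "C")]

# Build an undirected adjacency map: node -> set of connected nodes.
neighbors = defaultdict(set)
for src, dst in transfers:
    neighbors[src].add(dst)
    neighbors[dst].add(src)

# Degree = number of distinct nodes a node is linked to.
degree = {node: len(links) for node, links in neighbors.items()}
hub = max(degree, key=degree.get)  # the most connected node
print(degree, hub)
```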
Deviation Detection
Identifying unexpected or rare items, events, or observations that raise suspicion because they differ significantly from the rest of the data.
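A simple way to flag deviations is the z-score: how many standard deviations a value lies from the mean. The sketch below is one possible approach, not the only one. The function name `find_deviations`, the threshold of 2.0, and the sample amounts are assumptions for illustration.

```python
from statistics import mean, pstdev

def find_deviations(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []           # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts with one unusual value.
amounts = [20, 22, 19, 21, 20, 23, 18, 500]
outliers = find_deviations(amounts)
print(outliers)
```

A single extreme value also inflates the mean and standard deviation, so on small samples a median-based measure may behave better.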
Decision tree induction
A predictive modelling technique where a system learns from historical data by building a tree-like structure of decisions and outcomes.
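To make the idea concrete, the sketch below induces the smallest possible tree, a single decision (often called a decision stump), from labeled history. A full decision tree would apply the same split search recursively to each branch. The function name `induce_stump` and the fraud data are hypothetical.

```python
def induce_stump(rows):
    """Learn a one-level decision tree on a single numeric feature.

    rows is a list of (value, label) pairs with boolean labels. The chosen
    threshold is the one that misclassifies the fewest training rows.
    """
    values = sorted({x for x, _ in rows})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        # Try both ways of labeling the two sides of the split.
        for below, above in ((False, True), (True, False)):
            errors = sum((above if x > t else below) != label for x, label in rows)
            if best is None or errors < best[0]:
                best = (errors, t, below, above)
    _, threshold, below, above = best
    return lambda x: above if x > threshold else below

# Hypothetical history: (transaction amount, was it fraud?)
history = [(10, False), (25, False), (40, False), (900, True), (1200, True)]
predict = induce_stump(history)
print(predict(30), predict(1000))
```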
Cluster Analysis
A method of grouping similar data points together based on shared characteristics, without pre-labeled categories.
Uses Unsupervised learning
Example: Segmenting customers into groups based on shopping habits
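k-means is one widely used clustering method (the card does not name a specific algorithm, so this choice is an assumption). The sketch below runs it on one-dimensional data with no labels supplied. The function name `kmeans_1d`, the deterministic starting centers, and the spend figures are illustrative.

```python
def kmeans_1d(points, k, iterations=10):
    """Group numbers into k clusters by repeatedly moving centers to cluster means."""
    ordered = sorted(points)
    # Deterministic start: take the middle value of each of k equal slices.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center.
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical weekly spend per customer: two natural groups, no labels given.
spend = [5, 95, 7, 105, 6, 100]
centers, clusters = kmeans_1d(spend, k=2)
print(centers, clusters)
```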
Associations
Association rule learning: finds rules of the form "if X, then Y".
Identifies relationships between variables by finding items that frequently occur together in a dataset.
Often used in market basket analysis
Example: “If a customer buys bread and butter, they are likely to buy jam”
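Association rules are usually scored with support (how often the items appear together) and confidence (how often the "then" part holds when the "if" part does). The sketch below computes both for the bread, butter, and jam rule on a made-up set of baskets.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
]

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"bread", "butter", "jam"})
rule_confidence = confidence({"bread", "butter"}, {"jam"})
print(rule_support, rule_confidence)
```

Here two of five baskets contain all three items, and two of the three bread-and-butter baskets also contain jam.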
Classifications
Assigns data into predefined categories or classes based on learned patterns from labeled training data.
Uses Supervised learning
Example: Predicting if an email is spam or not based on its contents
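The spam example can be sketched with a tiny naive Bayes classifier, one of several supervised methods that could be used here. The four training messages and the labels "spam" and "ham" are invented for illustration, and a real filter would need far more data.

```python
import math
from collections import Counter

# Labeled training data (hypothetical): the labels are the predefined classes.
training = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]

# Learn word frequencies per class from the labeled examples.
word_counts = {"spam": Counter(), "ham": Counter()}
label_counts = Counter()
for text, label in training:
    label_counts[label] += 1
    word_counts[label].update(text.split())
vocab = {word for counts in word_counts.values() for word in counts}

def classify(text):
    """Return the class with the highest naive Bayes log-probability."""
    scores = {}
    for label in label_counts:
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / len(training))  # class prior
        for word in text.split():
            # Add-one (Laplace) smoothing so unseen words do not zero out a class.
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("win a free prize"), classify("lunch meeting at noon"))
```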
Sequential patterns
Identifies frequently occurring ordered sequences of events or items in a dataset.
Focuses on time-based patterns
Example: A user who buys a phone is likely to buy a case a week later, then a charger
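A minimal way to find sequential patterns is to count how many histories contain item A somewhere before item B. The sketch below does this for ordered pairs only. The purchase histories and the `min_support` value of 3 are assumptions, and dedicated sequence-mining algorithms handle longer patterns and time gaps.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, each in chronological order.
histories = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "charger", "case"],
]

pair_counts = Counter()
for sequence in histories:
    # combinations() preserves input order, so (a, b) means a came before b.
    # set() counts each ordered pair at most once per history.
    pair_counts.update(set(combinations(sequence, 2)))

min_support = 3  # keep pairs seen in at least this many histories
frequent = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent)
```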
Forecasting
Predicts future values based on patterns found in historical data, using techniques like regression or time series analysis.
Example: Predicting stock market prices
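Regression-based forecasting can be sketched by fitting a straight line to past values with least squares and extending it one step. The function name `fit_line` and the price series are made up, and real price data rarely follows a clean linear trend.

```python
def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns (slope, intercept)."""
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical daily closing prices.
prices = [100.0, 102.0, 104.0, 106.0, 108.0]
slope, intercept = fit_line(prices)

# Forecast the next period by extending the fitted line one step.
next_price = slope * len(prices) + intercept
print(next_price)
```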