Business Intelligence and Machine Learning Exam Notes
Course Overview
- Course Name: Business Intelligence (BI)
- Instructor: Dr. Christos Bormpotsis
- Key Topics:
- Data-Driven Decision Making
- Machine Learning
Key Concepts of Business Intelligence
- Definition: BI encompasses processes, technologies, and tools for analyzing and presenting data. Its purpose is to support data-driven decision-making and derive insights from market trends.
- Activities: BI includes data visualization and data mining to analyze historical data. It serves as a foundation for predictive techniques such as Machine Learning.
Machine Learning Basics
- Definition: A subset of Artificial Intelligence that enables machines to learn patterns from data and make predictions or decisions without being explicitly programmed for each task.
- Types:
- Supervised Learning: Utilizes labeled data (input-output pairs) to train models.
- Goal: Learn a mapping from inputs to outputs based on training data.
- Key Tasks:
- Classification: Predicts discrete classes (e.g., whether a client belongs to class A or B).
- Regression: Forecasts continuous variables (e.g., predicting house prices).
Details of Supervised Learning
- Classification Problem: Focuses on predicting categories (e.g., good vs bad clients).
- Regression Problem: Aims to predict continuous outcomes (e.g., sales revenue based on advertising).
- Graphs: Both kinds of outcome can be visualized, e.g., points colored by predicted class for classification and a fitted line over a scatter plot for regression.
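The difference between the two tasks can be sketched in a few lines, assuming scikit-learn is installed. The toy numbers, the feature names, and the choice of LogisticRegression and LinearRegression are illustrative assumptions, not the course's data or prescribed models.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: predict a category (1 = good client, 0 = bad client).
# Hypothetical features: [income, current debts], both in thousands.
X_clients = np.array([[60, 5], [25, 30], [80, 10], [20, 40], [55, 8], [30, 35]])
y_clients = np.array([1, 0, 1, 0, 1, 0])

clf = LogisticRegression(max_iter=1000).fit(X_clients, y_clients)
client_class = clf.predict(np.array([[70, 6]]))[0]  # a discrete label: 0 or 1

# Regression: predict a continuous value (sales) from advertising spend.
# Hypothetical data constructed so that sales = 3 * advertising + 20.
X_ads = np.array([[10], [20], [30], [40], [50]])
y_sales = np.array([50, 80, 110, 140, 170])

reg = LinearRegression().fit(X_ads, y_sales)
predicted_sales = reg.predict(np.array([[60]]))[0]  # a real number

print("Predicted class:", client_class)
print("Predicted sales:", round(float(predicted_sales), 2))
```

The classifier returns one of a fixed set of labels, while the regressor returns a number on a continuous scale. That is the practical distinction between the two problem types.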
BI + ML Workflow
- Data Collection: Gather data from various sources.
- Data Preprocessing: Clean and format data for analysis.
- Feature Engineering: Create useful features based on historical data.
- Model Training: Train models using labeled data.
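- Model Evaluation: Assess performance using metrics suited to the task (e.g., accuracy for classification, error measures such as mean squared error for regression).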
- Deployment: Integrate model predictions into BI reports for decision-making.
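The workflow steps above can be sketched end to end as follows, assuming pandas and scikit-learn are available. The table, the column names (`income`, `debts`, `good`), the `debt_ratio` feature, and the choice of model are hypothetical. A real BI pipeline would read from actual data sources and write to a reporting layer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data collection: a hypothetical in-memory table stands in for real sources.
raw = pd.DataFrame({
    "income": [60, 25, 80, 20, 55, 30, 75, 22, 65, 28, np.nan, 35],
    "debts":  [5, 30, 10, 40, 8, 35, 12, 38, 6, 33, 9, 31],
    "good":   [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# 2. Data preprocessing: drop rows with missing values.
clean = raw.dropna().copy()

# 3. Feature engineering: derive a debt-to-income ratio (illustrative feature).
clean["debt_ratio"] = clean["debts"] / clean["income"]

# 4. Model training on labeled data, holding out a test set.
X = clean[["income", "debts", "debt_ratio"]]
y = clean["good"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# 5. Model evaluation: accuracy on the held-out rows.
accuracy = accuracy_score(y_test, model.predict(X_test))

# 6. Deployment (sketched): attach predictions to the table a BI report would read.
report = clean.assign(predicted_good=model.predict(X))

print(f"Test accuracy: {accuracy:.2f}")
print(report.head())
```

With so few rows the accuracy figure is not meaningful. The point is only the order of the steps and where each one sits in the code.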
Data-Driven Decision-Making Process
- Key Phases:
- Problem/Opportunity Identification
- Collection of Relevant Data
- Data Preprocessing
- Model Building
- Communication and Deployment of BI Systems
Examples of Classification and Regression
- Classification Example:
- Variables: Current debts, Income, Marital status, etc.
- Outcome: Characterization as Good or Bad.
- Regression Example:
- A scatter plot is used to identify the correlation between advertising spend and sales.
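The regression example can be made concrete with a small sketch using NumPy. The advertising and sales figures are invented for illustration and are not data from the course.

```python
import numpy as np

# Hypothetical monthly figures (in thousands); not data from the course.
advertising = np.array([10, 15, 20, 25, 30, 35, 40])
sales = np.array([48, 60, 79, 92, 108, 121, 139])

# Pearson correlation coefficient: values near +1 indicate a strong
# positive linear relationship between the two variables.
r = np.corrcoef(advertising, sales)[0, 1]

# Least-squares straight line through the points (degree-1 polynomial fit).
slope, intercept = np.polyfit(advertising, sales, deg=1)

print(f"Correlation r = {r:.3f}")
print(f"Fitted line: sales = {slope:.2f} * advertising + {intercept:.2f}")
```

If matplotlib is installed, `matplotlib.pyplot.scatter(advertising, sales)` would draw the corresponding scatter plot. The correlation coefficient summarizes numerically what the plot shows visually.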
Anaconda Installation
- For Windows: Download the installer, run it, select "Just Me", and set the destination folder.
- For macOS: Download the graphical installer and follow the installation prompts.
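After installation, a quick way to confirm the environment works is to import the libraries used in these notes and print their versions. This is a minimal check, assuming the default Anaconda distribution, which normally bundles NumPy, pandas, and scikit-learn.

```python
import sys

import numpy as np
import pandas as pd
import sklearn

# If any import above fails, the package is missing from the active environment.
versions = {
    "python": sys.version.split()[0],
    "numpy": np.__version__,
    "pandas": pd.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```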
Data Types and Their Importance
- Data Types in Python:
- Integers: Whole numbers
- Floats: Numbers with a fractional part (e.g., 3.14)
- Booleans: True/False values
- Strings: Textual data
- Statistical Types of Data:
- Continuous Data: Can take any value within a range (e.g., height, weight)
- Nominal Data: Categories without order (e.g., gender)
- Ordinal Data: Categories with order (e.g., education levels)
- Discrete Data: Countable values, typically integers (e.g., number of people)
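The distinctions above can be illustrated with a short sketch. The variable names and values are made up. Representing nominal and ordinal data with the pandas `category` dtype is one common choice, not the only one.

```python
import pandas as pd

# Built-in Python types.
household_size = 4      # int: discrete data (a count)
height_m = 1.78         # float: continuous data (a measurement)
is_active = True        # bool
name = "Alice"          # str

type_names = [type(v).__name__ for v in (household_size, height_m, is_active, name)]

# Nominal data: categories with no inherent order.
region = pd.Series(["north", "south", "north"], dtype="category")

# Ordinal data: categories with a meaningful order (listed lowest to highest).
levels = ["high school", "bachelor", "master"]
education = pd.Series(
    pd.Categorical(["bachelor", "high school", "master"], categories=levels, ordered=True)
)

print(type_names)
print("region is ordered:", region.cat.ordered)
print("lowest level:", education.min(), "| highest level:", education.max())
```

Because `education` is ordered, comparisons and `min()`/`max()` are meaningful for it. For the unordered `region` column they are not.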
Key Python Libraries for Data Analysis
- Pandas: Open-source library for data manipulation; provides DataFrame and Series for structured data.
- NumPy: Supports large arrays and matrices; foundational for numerical computing.
- Scikit-learn: Contains tools for data mining and analysis including classification and regression methods.
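As a brief illustration of the core objects these libraries provide, here is a minimal sketch with made-up values: a NumPy array, a pandas Series, and a pandas DataFrame. Scikit-learn models typically consume such arrays or DataFrames as input.

```python
import numpy as np
import pandas as pd

# NumPy: an n-dimensional array with fast element-wise and axis-wise operations.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
column_means = matrix.mean(axis=0)  # mean of each column

# pandas Series: a one-dimensional labeled array.
sales = pd.Series([120, 95, 143], index=["Jan", "Feb", "Mar"], name="sales")

# pandas DataFrame: a two-dimensional table whose columns are Series.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "sales": [120, 95, 143],
    "advertising": [30, 22, 41],
})

print(matrix.shape, column_means)
print(sales["Feb"])
print(df)
```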
Important Methods and Attributes in Pandas
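- DataFrame Methods and Attributes:
- DataFrame.info(): Prints a summary of the DataFrame’s structure (column names, non-null counts, and dtypes).
- DataFrame.shape: Attribute (no parentheses) that returns a tuple of dimensions (rows, columns).
- DataFrame.isnull(): Returns a Boolean mask marking null values.
- DataFrame.notnull(): Returns a Boolean mask marking non-null values.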
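A short sketch of these four members on a small, made-up DataFrame with some missing entries:

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing value in each column.
df = pd.DataFrame({
    "income": [60.0, np.nan, 80.0, 20.0],
    "marital_status": ["single", "married", None, "married"],
})

df.info()                 # prints column names, non-null counts, and dtypes
rows, cols = df.shape     # attribute, not a method: no parentheses

# isnull()/notnull() return Boolean DataFrames of the same shape;
# summing them counts the True values in each column.
missing_per_column = df.isnull().sum()
present_per_column = df.notnull().sum()

print(rows, cols)
print(missing_per_column)
print(present_per_column)
```

Chaining `.sum()` after `isnull()` is a common idiom for counting missing values per column during preprocessing.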
Conclusion
- This course emphasizes the integration of BI and ML for effective data-driven decision-making, focusing on the process from data collection to model deployment.