Business Intelligence and Machine Learning Exam Notes

Course Overview

  • Course Name: Business Intelligence (BI)
  • Instructor: Dr. Christos Bormpotsis
  • Key Topics:
    • Data-Driven Decision Making
    • Machine Learning

Key Concepts of Business Intelligence

  • Definition: BI encompasses processes, technologies, and tools for analyzing and presenting data. Its purpose is to support data-driven decision-making and derive insights from market trends.
  • Activities: Includes data visualization and data mining to analyze historical data. BI serves as a foundation for predictive techniques like Machine Learning.

Machine Learning Basics

  • Definition: A subset of Artificial Intelligence that enables machines to learn from data and make decisions without explicit programming.
  • Types:
    • Supervised Learning: Utilizes labeled data (input-output pairs) to train models.
      • Goal: Learn a mapping from inputs to outputs based on training data.
  • Key Tasks:
    • Classification: Predicts discrete class labels (e.g., assigning a client to class A or class B).
    • Regression: Forecasts continuous variables (e.g., predicting house prices).
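
The difference between the two tasks shows up in the type of the prediction. The following is a minimal sketch using scikit-learn; the toy numbers and the choice of LogisticRegression and LinearRegression are illustrative assumptions, not taken from the course material.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: one input feature, six observations.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Classification: the target is a discrete class label.
y_class = np.array(["A", "A", "A", "B", "B", "B"])
clf = LogisticRegression().fit(X, y_class)
predicted_class = clf.predict(np.array([[5.5]]))[0]

# Regression: the target is a continuous number (here generated as y = 2x + 1).
y_value = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])
reg = LinearRegression().fit(X, y_value)
predicted_value = reg.predict(np.array([[7.0]]))[0]

print(predicted_class)            # a class label
print(round(predicted_value, 2))  # a continuous value
```

The classifier returns one of the labels it saw during training, while the regressor returns a number that need not appear in the training targets.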

Details of Supervised Learning

  • Classification Problem: Focuses on predicting categories (e.g., good vs bad clients).
  • Regression Problem: Aims to predict continuous outcomes (e.g., sales revenue based on advertising).
  • Graphs: Plot the outcomes to inspect them, e.g., data points marked by class for classification, or a fitted line over a scatter plot for regression.

BI + ML Workflow

  1. Data Collection: Gather data from various sources.
  2. Data Preprocessing: Clean and format data for analysis.
  3. Feature Engineering: Create useful features based on historical data.
  4. Model Training: Train models using labeled data.
  5. Model Evaluation: Assess performance using evaluation metrics (e.g., accuracy for classification, error measures such as mean squared error for regression).
  6. Deployment: Integrate model predictions into BI reports for decision-making.
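
The six steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the data is synthetic, the column names (income, debt, good_client, debt_to_income) and the labeling rule are hypothetical, and the scaler plus logistic regression is just one possible model choice.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data collection: a synthetic stand-in for data gathered from real sources.
rng = np.random.default_rng(seed=0)
n = 200
raw = pd.DataFrame({
    "income": rng.normal(3000, 800, n),
    "debt": rng.normal(1000, 400, n),
})
# Hypothetical labeling rule, used only to generate labels for this sketch.
raw["good_client"] = (raw["income"] - 2 * raw["debt"] > 1000).astype(int)
raw.loc[raw.index[::25], "income"] = np.nan  # simulate some missing values

# 2. Data preprocessing: drop rows with missing values.
clean = raw.dropna().copy()

# 3. Feature engineering: derive a debt-to-income ratio.
clean["debt_to_income"] = clean["debt"] / clean["income"]

# 4. Model training on labeled data.
features = ["income", "debt", "debt_to_income"]
X_train, X_test, y_train, y_test = train_test_split(
    clean[features], clean["good_client"], test_size=0.25, random_state=0
)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# 5. Model evaluation on held-out data.
accuracy = accuracy_score(y_test, model.predict(X_test))

# 6. Deployment (sketched): a table of predictions that a BI report could read.
report = X_test.copy()
report["predicted_good_client"] = model.predict(X_test)

print(f"Test accuracy: {accuracy:.2f}")
print(report.head())
```

Evaluating on rows that were held out of training gives a fairer estimate of how the model would behave on new clients than scoring it on the training rows.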

Data-Driven Decision-Making Process

  • Key Phases:
    • Problem/Opportunity Identification
    • Collection of Relevant Data
    • Data Preprocessing
    • Model Building
    • Communication and Deployment of BI Systems

Examples of Classification and Regression

  • Classification Example:
    • Variables: Current debts, Income, Marital status, etc.
    • Outcome: Characterization as Good or Bad.
  • Regression Example:
    • A scatter plot is used to identify the correlation between advertising spend and sales.
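
Both examples can be reproduced on a small scale. In the sketch below all records and figures are invented for illustration, the column names are hypothetical, and a decision tree is used only as one possible classifier. The correlation and fitted line are computed numerically; the same two arrays could be passed to a plotting library to draw the scatter plot.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Classification: hypothetical client records (values invented for illustration).
clients = pd.DataFrame({
    "current_debts": [200, 5000, 300, 7000, 100, 6500],
    "income": [4000, 1500, 3500, 1200, 5000, 1800],
    "marital_status": ["married", "single", "single", "married", "married", "single"],
    "outcome": ["Good", "Bad", "Good", "Bad", "Good", "Bad"],
})
# One-hot encode the nominal variable so the model receives numeric inputs.
X = pd.get_dummies(clients[["current_debts", "income", "marital_status"]])
clf = DecisionTreeClassifier(random_state=0).fit(X, clients["outcome"])

new_client = pd.DataFrame({
    "current_debts": [400], "income": [4200], "marital_status": ["single"],
})
# Align the new row with the training columns (missing dummy columns become 0).
new_X = pd.get_dummies(new_client).reindex(columns=X.columns, fill_value=0)
predicted_outcome = clf.predict(new_X)[0]

# Regression: hypothetical advertising spend and sales figures.
advertising = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
sales = np.array([25.0, 41.0, 62.0, 79.0, 103.0])
correlation = np.corrcoef(advertising, sales)[0, 1]   # Pearson correlation
slope, intercept = np.polyfit(advertising, sales, deg=1)  # least-squares line

print(predicted_outcome)
print(round(correlation, 3), round(slope, 2), round(intercept, 2))
```

A correlation close to 1 indicates a strong positive linear relationship, which is what makes a straight-line regression a reasonable first model for such data.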

Installation and Setup of Tools

  • Anaconda Installation:
    • For Windows: Download the installer, run it, select "Just Me" as the installation type, and choose the destination folder.
    • For macOS: Download graphical installer, follow prompts for installation.
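
After installation, one way to confirm that the environment is usable is to check that the libraries used in these notes can be found. This is a small sketch, assuming the Anaconda distribution was installed with its default package set, which typically includes all three.

```python
import importlib.util
import sys

# Libraries named in these notes; "sklearn" is the import name of scikit-learn.
required = ["pandas", "numpy", "sklearn"]

print("Python version:", sys.version.split()[0])

# find_spec returns None when a package cannot be located.
status = {name: importlib.util.find_spec(name) is not None for name in required}
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```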

Data Types and Their Importance

  • Data Types:
    • Integers: Whole numbers
    • Floats: Numbers with a decimal point
    • Boolean: True/False values
    • Strings: Textual data
  • Types of Data:
    • Continuous Data: Can take any value within a range (e.g., height, weight)
    • Nominal Data: Categories without order (e.g., gender)
    • Ordinal Data: Categories with order (e.g., education levels)
    • Discrete Data: Countable, separate values, usually integers (e.g., number of people)
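
The sketch below shows how these types look in Python and pandas. The records and column names are hypothetical, and representing nominal and ordinal data with pandas Categorical is one common choice rather than the only one.

```python
import pandas as pd

# Basic Python data types.
age = 42            # int: whole number
height = 1.78       # float: decimal number
is_customer = True  # bool: True/False
name = "Alice"      # str: text

# Hypothetical records illustrating the four kinds of data.
df = pd.DataFrame({
    "height_m": [1.62, 1.78, 1.85],                       # continuous
    "gender": ["female", "male", "female"],               # nominal
    "education": ["bachelor", "high school", "master"],   # ordinal
    "household_size": [3, 1, 4],                          # discrete
})

# Nominal: categories with no inherent order.
df["gender"] = pd.Categorical(df["gender"])

# Ordinal: categories with an explicit order, which enables comparisons.
levels = ["high school", "bachelor", "master"]
df["education"] = pd.Categorical(df["education"], categories=levels, ordered=True)

print(type(age).__name__, type(height).__name__,
      type(is_customer).__name__, type(name).__name__)
print(df.dtypes)
print(df["education"] >= "bachelor")
```

Declaring the order explicitly matters: comparisons such as "at least a bachelor's degree" are meaningful for ordinal data, but not for nominal data.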

Key Python Libraries for Data Analysis

  • Pandas: Open-source library for data manipulation; provides DataFrame and Series for structured data.
  • NumPy: Supports large arrays and matrices; foundational for numerical computing.
  • Scikit-learn: Contains tools for data mining and analysis including classification and regression methods.
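
A short sketch of how NumPy and pandas fit together; the product labels and figures are invented for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: element-wise arithmetic on arrays, no explicit loop needed.
prices = np.array([10.0, 12.5, 8.0])
quantities = np.array([3, 2, 5])
revenue = prices * quantities

# pandas: a Series is one labeled column; a DataFrame is a table of columns.
revenue_series = pd.Series(revenue, index=["A", "B", "C"], name="revenue")
sales = pd.DataFrame({"price": prices, "quantity": quantities},
                     index=["A", "B", "C"])
sales["revenue"] = sales["price"] * sales["quantity"]

print(revenue)
print(revenue_series)
print(sales)
print(sales["revenue"].sum())
```

pandas stores its columns on top of NumPy arrays, which is why the same element-wise expression works in both places.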

Important Methods and Attributes in Pandas

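  • DataFrame Methods and Attributes:
    • DataFrame.info(): Prints a concise summary of the DataFrame: column names, data types, non-null counts, and memory usage.
    • DataFrame.shape: Attribute (no parentheses) that returns a tuple of dimensions (rows, columns).
    • DataFrame.isnull(): Returns a Boolean DataFrame that is True where a value is missing.
    • DataFrame.notnull(): Returns a Boolean DataFrame that is True where a value is present.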
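
These four members are typically used together when first inspecting a dataset. A minimal sketch on a hypothetical table with two missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical table with two missing values.
df = pd.DataFrame({
    "client": ["Ann", "Bob", "Cara", "Dan"],
    "income": [3200.0, np.nan, 4100.0, 2800.0],
    "debt": [500.0, 1200.0, np.nan, 300.0],
})

df.info()                   # prints column names, dtypes and non-null counts
rows, columns = df.shape    # attribute, so no parentheses
null_mask = df.isnull()     # Boolean DataFrame: True where a value is missing
missing_per_column = null_mask.sum()          # count of missing values per column
complete_income = df[df["income"].notnull()]  # keep rows with a known income

print(rows, columns)
print(missing_per_column)
print(complete_income)
```

Summing the Boolean mask works because True counts as 1, so the result is the number of missing values in each column.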

Conclusion

  • This course emphasizes the integration of BI and ML for effective data-driven decision-making, focusing on the process from data collection to model deployment.