Business Intelligence and Machine Learning Exam Notes

Definition: Business Intelligence (BI) encompasses the processes, technologies, and tools used to analyze and present business data. Its purpose is to support data-driven decision-making and to derive insights, for example from market trends.
Activities: Includes data visualization and data mining of historical data. BI provides the foundation for predictive techniques such as Machine Learning (ML).
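A typical BI activity is summarizing historical records into a report table and reading trends from it. The pandas sketch below uses hypothetical toy sales figures (not real data). It pivots revenue by year and region and computes year-over-year growth, the kind of table a BI dashboard might display.

```python
import pandas as pd

# Hypothetical historical sales records (toy values for illustration only)
sales = pd.DataFrame({
    "year": [2021, 2021, 2022, 2022, 2023, 2023],
    "region": ["North", "South", "North", "South", "North", "South"],
    "revenue": [120.0, 95.0, 135.0, 101.0, 150.0, 98.0],
})

# Aggregate revenue per year (rows) and region (columns), as a BI report would
report = sales.pivot_table(index="year", columns="region", values="revenue", aggfunc="sum")

# Year-over-year growth: current year divided by previous year, minus 1
growth = report / report.shift(1) - 1

print(report)
print(growth.round(3))
```

The first growth row is empty (NaN) because there is no earlier year to compare against.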

Definition: Machine Learning (ML) is a subset of Artificial Intelligence (AI) that enables machines to learn patterns from data and make decisions or predictions without being explicitly programmed for each task.
Types:
- Supervised Learning: Uses labeled data (input-output pairs) to train models.
  - Goal: Learn a mapping from inputs to outputs based on the training data.
Key Tasks:
- Classification: Predicts discrete classes (e.g., client A or client B).
- Regression: Predicts continuous values (e.g., house prices).
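Both supervised tasks follow the same pattern: fit a model on labeled input-output pairs, then predict outputs for new inputs. The sketch below uses scikit-learn with tiny made-up datasets. `LogisticRegression` stands in for a classifier and `LinearRegression` for a regressor; other estimators would work the same way.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Labeled training data: each input row in X is paired with a known output
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]

# Classification: outputs are discrete class labels
y_class = ["A", "A", "A", "B", "B", "B"]
clf = LogisticRegression().fit(X, y_class)
print(clf.predict([[1.5], [5.5]]))  # ['A' 'B']

# Regression: outputs are continuous values (here exactly y = 2x + 1)
y_reg = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
reg = LinearRegression().fit(X, y_reg)
print(reg.predict([[7.0]]))  # approximately [15.]
```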

Classification Problem: Focuses on predicting categories (e.g., good vs. bad clients).
Regression Problem: Aims to predict continuous outcomes (e.g., sales revenue based on advertising).
Graphs: Classification outcomes are typically plotted as points colored by class, often separated by a decision boundary; regression outcomes as a fitted line through a scatter plot.
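Whether a problem is classification or regression depends on the kind of target being predicted. scikit-learn's `type_of_target` utility inspects target values and reports their type. The `problem_type` wrapper below is a hypothetical helper written for these notes, not part of scikit-learn.

```python
from sklearn.utils.multiclass import type_of_target

def problem_type(y):
    """Hypothetical helper: map the target's type to the ML task it implies."""
    kind = type_of_target(y)
    if kind in ("binary", "multiclass"):
        return "classification"
    if kind.startswith("continuous"):
        return "regression"
    return "other"  # e.g. multilabel or unknown targets

clients = ["good", "bad", "good", "good"]  # categorical target
revenue = [10.5, 12.25, 9.75, 14.1]        # continuous target

print(type_of_target(clients), problem_type(clients))  # binary classification
print(type_of_target(revenue), problem_type(revenue))  # continuous regression
```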

Data Collection: Gather data from various sources.
Data Preprocessing: Clean and format data for analysis.
Feature Engineering: Create useful features based on historical data.
Model Training: Train models using labeled data.
Model Evaluation: Assess performance on held-out data using metrics such as accuracy (classification) or mean squared error (regression).
Deployment: Integrate model predictions into BI reports for decision-making.
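The workflow steps above can be sketched end to end with scikit-learn. Several assumptions apply here: the data is synthetic, the `debt_ratio` feature and the "good client" labeling rule are invented for illustration, and "deployment" is reduced to attaching predictions to a table that a BI report could read.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data collection: synthetic records stand in for data gathered from real sources
rng = np.random.default_rng(0)
df = pd.DataFrame({"income": rng.normal(50, 15, 200), "debt": rng.normal(20, 8, 200)})
df.loc[::17, "income"] = np.nan  # simulate missing values that preprocessing must handle

# Feature engineering: hypothetical debt-to-income ratio
df["debt_ratio"] = df["debt"] / df["income"]

# Labels (1 = good client) from a made-up rule, for illustration only
df["good"] = (df["income"].fillna(50) > 2.5 * df["debt"]).astype(int)

X = df[["income", "debt", "debt_ratio"]]
y = df["good"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing (impute missing values, scale) and model training in one pipeline
model = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Model evaluation on data the model has not seen
acc = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {acc:.2f}")

# Deployment (simplified): predictions attached to a table a BI report could consume
report = X_test.assign(predicted_good=model.predict(X_test))
print(report.head())
```

Wrapping preprocessing and the model in one pipeline ensures the imputer and scaler are fitted only on training data, and the same transformations are reapplied when predicting.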

Key Phases of a BI Project:
- Problem/Opportunity Identification
- Collection of Relevant Data
- Data Preprocessing
- Model Building
- Communication and Deployment of BI Systems

Classification Example:
- Variables: Current debts, Income, Marital status, etc.
- Outcome: Client classified as Good or Bad.
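The client example can be sketched as follows. The records are hypothetical toy values, and a decision tree is used only as one possible classifier. Because marital status is a nominal variable, it is one-hot encoded with `pd.get_dummies` before training.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical client records (toy values, not real data)
clients = pd.DataFrame({
    "current_debts": [5000, 20000, 1000, 15000, 3000, 25000],
    "income": [60000, 30000, 45000, 28000, 52000, 35000],
    "marital_status": ["married", "single", "single", "divorced", "married", "single"],
    "outcome": ["Good", "Bad", "Good", "Bad", "Good", "Bad"],
})

# One-hot encode the nominal variable so the model receives numeric inputs
X = pd.get_dummies(clients.drop(columns="outcome"), columns=["marital_status"])
y = clients["outcome"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Score a new applicant; reindex aligns its dummy columns with the training columns
new_client = pd.DataFrame({"current_debts": [2000], "income": [58000], "marital_status": ["married"]})
new_X = pd.get_dummies(new_client, columns=["marital_status"]).reindex(columns=X.columns, fill_value=0)
print(model.predict(new_X))  # ['Good']
```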
Regression Example:
- A scatter plot is used to identify the correlation between advertising spend and sales.
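The numbers behind such a scatter plot can be computed directly. Below, hypothetical advertising and sales figures are used to compute the Pearson correlation with NumPy and to fit the regression line with scikit-learn. The line is what would be drawn through the scatter plot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical advertising spend and sales figures (toy data)
advertising = np.array([10, 20, 30, 40, 50], dtype=float)
sales = np.array([25, 41, 62, 79, 101], dtype=float)

# Pearson correlation coefficient: close to +1 means a strong positive linear relation
r = np.corrcoef(advertising, sales)[0, 1]
print(f"Pearson correlation: {r:.3f}")

# Fit sales = slope * advertising + intercept (scikit-learn expects a 2-D feature array)
model = LinearRegression().fit(advertising.reshape(-1, 1), sales)
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
print(model.predict([[60.0]]))  # predicted sales for a spend of 60
```

For this toy data, the least-squares line is sales = 1.9 * advertising + 4.6.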

Anaconda Installation:
- For Windows: Download the installer, run it, select "Just Me", and set the destination folder.
- For macOS: Download the graphical installer and follow the prompts.
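After installation, one quick check is to run a short Python script (for example in Jupyter Notebook or the Anaconda Prompt) that confirms the libraries used in these notes can be imported. The full Anaconda Distribution normally includes them. `check_packages` is a hypothetical helper. Note that scikit-learn is imported as `sklearn`.

```python
import importlib
import sys

def check_packages(names):
    """Hypothetical helper: return {name: version, or None if not importable}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            found[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            found[name] = None
    return found

print("Python", sys.version.split()[0])
for pkg, version in check_packages(["pandas", "numpy", "sklearn"]).items():
    print(f"{pkg}: {version or 'NOT installed'}")
```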

Data Types:
- Integers (int): Whole numbers
- Floats (float): Numbers with a decimal (fractional) part
- Booleans (bool): True/False values
- Strings (str): Textual data
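In Python, these four types correspond to the built-in types `int`, `float`, `bool`, and `str`. The variable names below are arbitrary examples.

```python
count = 42        # int: whole number
price = 19.99     # float: number with a fractional part
is_active = True  # bool: True or False
name = "Alice"    # str: text

for value in (count, price, is_active, name):
    print(repr(value), type(value).__name__)

# bool is a subclass of int in Python, so True behaves like 1 in arithmetic
print(isinstance(True, int))  # True
```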
Types of Data:
- Continuous Data: Can take any value within a range (e.g., height, weight)
- Nominal Data: Categories without a natural order (e.g., gender)
- Ordinal Data: Categories with a meaningful order (e.g., education levels)
- Discrete Data: Countable values, typically integers (e.g., number of people)
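One common way to represent these kinds of data in pandas is to use float columns for continuous data and integer columns for discrete data. Nominal and ordinal data can be stored as `Categorical` columns, with `ordered=True` for ordinal data. The column names and values below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "height_cm": [172.5, 160.2, 181.0],                  # continuous
    "gender": ["female", "male", "female"],              # nominal
    "education": ["Bachelor", "High School", "Master"],  # ordinal
    "num_children": [0, 2, 1],                           # discrete
})

# Nominal: unordered categories
df["gender"] = pd.Categorical(df["gender"])

# Ordinal: categories with an explicit order
df["education"] = pd.Categorical(
    df["education"], categories=["High School", "Bachelor", "Master"], ordered=True
)

print(df.dtypes)
print(df["education"].max())             # Master (the highest level in the order)
print(df["education"] > "High School")   # comparisons follow the category order
```

Declaring the order matters: without `ordered=True`, pandas would not allow `max()` or `>` comparisons on the education column.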

Pandas: Open-source library for data manipulation; provides DataFrame and Series for structured data.
NumPy: Supports large arrays and matrices; foundational for numerical computing.
Scikit-learn: Contains tools for data mining and analysis including classification and regression methods.
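The three libraries are usually used together. In the sketch below, a NumPy array holds the raw numbers, a pandas DataFrame adds column labels (each column is a Series), and a scikit-learn estimator is trained directly on the DataFrame. The data is a toy example in which x2 = x1 + 1.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# NumPy: n-dimensional arrays with vectorized math
arr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(arr.mean(axis=0))  # column means: [3. 4.]

# pandas: a labeled table (DataFrame) built on top of the NumPy array
df = pd.DataFrame(arr, columns=["x1", "x2"])
s = df["x1"]             # a single column is a Series
print(type(s).__name__)  # Series

# scikit-learn: estimators accept DataFrames (or NumPy arrays) as input
model = LinearRegression().fit(df[["x1"]], df["x2"])
print(model.coef_)       # approximately [1.]
```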

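DataFrame Methods and Attributes:
- DataFrame.info(): Prints a concise summary of the DataFrame's structure (column names, data types, non-null counts, memory usage).
- DataFrame.shape: An attribute (not a method, so no parentheses) that returns a tuple of the dimensions (rows, columns).
- DataFrame.isnull(): Returns a boolean DataFrame of the same shape that is True where values are missing (NaN/None).
- DataFrame.notnull(): The inverse of isnull(); True where values are present.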
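A short sketch of these calls on a small hypothetical DataFrame with a few missing values. Chaining `.sum()` or `.all()` onto `isnull()`/`notnull()` is a common way to count or check missing values per column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "client_id": [1, 2, 3, 4],
    "income": [52000, np.nan, 61000, 45000],
    "marital_status": ["married", "single", None, "single"],
})

df.info()                  # prints dtypes, non-null counts, and memory usage
print(df.shape)            # (4, 3): attribute, no parentheses
print(df.isnull())         # True where a value is missing
print(df.isnull().sum())   # number of missing values per column
print(df.notnull().all())  # True only for columns with no missing values
```

In current pandas, `isnull()` and `notnull()` are aliases of `isna()` and `notna()`.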

This course emphasizes the integration of BI and ML for effective data-driven decision-making, focusing on the process from data collection to model deployment.