07 - Analytics
Introduction to Analytics and Data Mining
Teacher: Ivika Jäger, Mittuniversitetet, November 2024
Types of Analytics
Descriptive Analytics: Analyzes historical data to understand patterns and trends.
Predictive Analytics: Uses historical data and statistical algorithms to anticipate future outcomes.
Prescriptive Analytics: Provides recommendations for actions based on data analysis.
Predictive Analytics
Purpose: To predict future trends and behaviors.
Methods Used:
Regression analysis
Time series analysis
Machine learning algorithms
Classification models
Data mining
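As a minimal sketch of the first method in the list, regression analysis, the example below fits a straight line to a short series of past values and extends it one step ahead. The monthly sales figures and the helper name fit_line are made up for illustration; they do not come from the lecture.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales (units sold) for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(months, sales)

# Predict the next, unseen month from the fitted trend.
forecast_month_7 = slope * 7 + intercept
print(f"Forecast for month 7: {forecast_month_7:.1f} units")
```

Real forecasting would usually also account for seasonality and noise, which a single straight line cannot capture.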
Examples of Analytics in Practice
Retail Company: Uses descriptive analytics to evaluate past sales data and customer behavior to inform new product strategies.
E-commerce Company: Applies predictive analytics to forecast product demand for holiday seasons based on historical sales data.
Logistics Company: Utilizes prescriptive analytics to determine optimal delivery routes by analyzing historical and real-time data.
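The logistics example can be illustrated with a small sketch of prescriptive analytics: instead of describing or predicting, the code recommends an action (a delivery order). The depot, the stop coordinates, and the brute-force search are assumptions made for illustration; a real router would use road distances, live traffic data, and a scalable optimization method.

```python
import math
from itertools import permutations

# Hypothetical depot and delivery stops on a flat grid (units are arbitrary).
depot = (0, 0)
stops = {"A": (0, 3), "B": (4, 3), "C": (4, 0)}


def route_length(order):
    """Total straight-line distance: depot -> stops in the given order -> depot."""
    points = [depot] + [stops[name] for name in order] + [depot]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Try every possible visiting order and recommend the shortest one.
# This is only feasible for a handful of stops.
best_route = min(permutations(stops), key=route_length)
best_length = route_length(best_route)

print("Recommended order:", " -> ".join(best_route))
print("Total distance:", best_length)
```

Brute force was chosen here only because it is easy to read; the number of possible orders grows factorially with the number of stops.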
Data Mining Overview
Definition: The process of extracting knowledge from large datasets.
Key Techniques:
Prediction: Forecasting future events, for example, sales forecasting.
Classification: Assigning data to predefined categories.
Clustering: Grouping data points without predefined labels.
Association: Identifying relationships between variables, e.g., co-purchase behavior.
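The association technique can be sketched with two standard measures, support and confidence, computed over shopping baskets. The baskets below are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set is one customer purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))


def support(a, b):
    """Share of all baskets that contain both items."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)


def confidence(a, b):
    """Share of baskets containing a that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]


print("support(bread, butter):", support("bread", "butter"))
print("confidence(bread -> butter):", confidence("bread", "butter"))
```

High support and high confidence together suggest a co-purchase relationship worth acting on, for example by placing the two products near each other.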
Data Mining vs. Statistics
Statistics: Starts with a hypothesis and tests it on sample data (e.g., testing whether ice cream sales rise with temperature).
Data Mining: Explores all available data to find patterns without a prior hypothesis (e.g., discovering that sales are higher on weekends).
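The weekend pattern mentioned above is the kind of result that comes out of simply grouping the data and looking, without a prior hypothesis. A minimal sketch, assuming a small set of invented daily sales figures:

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales for one week (4 November 2024 is a Monday).
daily_sales = {
    date(2024, 11, 4): 200,
    date(2024, 11, 5): 210,
    date(2024, 11, 6): 190,
    date(2024, 11, 7): 205,
    date(2024, 11, 8): 220,
    date(2024, 11, 9): 320,   # Saturday
    date(2024, 11, 10): 300,  # Sunday
}

# Group every observation by day type; weekday() returns 5 and 6 for Sat/Sun.
groups = defaultdict(list)
for day, amount in daily_sales.items():
    label = "weekend" if day.weekday() >= 5 else "weekday"
    groups[label].append(amount)

averages = {label: sum(values) / len(values) for label, values in groups.items()}
print(averages)
```

A statistician would typically follow up by testing whether the difference is larger than chance variation would explain.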
Common Myths about Data Mining
Myth: Data mining provides immediate clear predictions.
Reality: Requires domain knowledge and time.
Myth: An advanced degree is necessary.
Reality: There are accessible tools available for anyone.
Myth: Only large firms can utilize data mining.
Reality: Companies of any size can use data mining if their data accurately reflects the business.
Data Mining Process
Business Understanding: Define objectives and problems.
Data Understanding: Evaluate data quality.
Data Preparation: Clean and preprocess data.
Model Building: Utilize algorithms to find patterns.
Testing and Evaluation: Check model performance.
Deployment: Put the model and its insights to use in day-to-day business operations.
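The middle steps of the process (data preparation, model building, testing and evaluation) can be sketched in a few lines. Everything here is an assumption for illustration: the records, the business question (does a customer with many monthly visits make a purchase?), and the very simple threshold model.

```python
# Hypothetical records: (monthly_visits, made_purchase). None marks a missing value.
raw = [(2, 0), (9, 1), (None, 1), (1, 0), (8, 1),
       (3, 0), (7, 1), (10, 1), (2, 0), (6, 1)]

# Data preparation: drop rows with missing values.
clean = [(x, y) for x, y in raw if x is not None]

# Hold out the last third for testing (no shuffling, to keep the sketch deterministic).
split = len(clean) * 2 // 3
train, test = clean[:split], clean[split:]


def accuracy(threshold, rows):
    """Share of rows where 'visits >= threshold' matches the purchase label."""
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)


# Model building: choose the threshold that best separates the training labels.
best_threshold = max(range(1, 11), key=lambda t: accuracy(t, train))

# Testing and evaluation: measure performance on data the model has not seen.
test_accuracy = accuracy(best_threshold, test)
print("Chosen threshold:", best_threshold)
print("Test accuracy:", test_accuracy)
```

Evaluating on held-out data is the point of the testing step: accuracy on the training rows alone would overstate how well the model generalizes.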
Text Mining
Purpose: Converts unstructured text into structured data.
Challenges:
Correctly tagging words (nouns vs. verbs).
Language ambiguity.
Solution: Modern AI tools like LLMs enhance contextual understanding.
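One common way to turn unstructured text into structured data is a bag-of-words table, where each row is a document and each column is a word count. The sketch below assumes two invented review sentences and a very simple tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


# Hypothetical customer reviews (unstructured text).
docs = [
    "The coffee was great, great service.",
    "Service was slow but the coffee was good.",
]

# Columns of the structured table: every distinct word, in alphabetical order.
vocabulary = sorted({token for doc in docs for token in tokenize(doc)})

# Rows of the structured table: word counts per document.
matrix = [[Counter(tokenize(doc))[term] for term in vocabulary] for doc in docs]

print(vocabulary)
for row in matrix:
    print(row)
```

This representation ignores word order and word role, so it cannot tell a noun from a verb; that is the tagging and ambiguity problem listed under Challenges.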
Web Mining
Purpose: Analyzes web content, structure, and usage.
Components:
Web Content: Extracting information from web pages.
Web Structure: Understanding website links.
Web Usage: User behavior analysis.
SEO: Strategies to increase website visibility through keywords, tags, backlinks.
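Web structure mining starts from the links on a page. As a rough sketch, the code below collects the link targets from a small HTML snippet using Python's standard html.parser module. The page content and the class name LinkExtractor are made up for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content; a real crawler would download this from a URL.
page = (
    "<html><body>"
    '<a href="/products">Products</a>'
    "<p>Some text</p>"
    '<a href="https://example.com/about">About</a>'
    "</body></html>"
)

parser = LinkExtractor()
parser.feed(page)
print(parser.links)
```

Repeating this for many pages yields a graph of which pages link to which, the raw material for analyzing site structure and backlinks.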
Deep Learning vs. Machine Learning
Machine Learning (ML): Traditional methods require features to be defined manually.
Deep Learning (DL): Automatically learns features, reducing manual work.
Comparison: DL typically handles complex data representations (e.g., images, text) better than traditional ML, at the cost of more data and computation.
Features in Deep Learning
Definition: Features (columns in datasets) are explanatory variables.
Role of Neurons: Input neurons correspond to features (columns), not to individual observations (rows).
Challenges in Deep Learning
Requires advanced hardware (e.g., GPUs).
Needs large, high-quality datasets.
Manual data labeling is time- and cost-intensive.
Overcoming these challenges enhances predictive capabilities.
Deep Neural Networks
Evolution: Early networks had only a few layers and neurons; modern networks can have many layers and millions of neurons.
Input Data: Processes multidimensional inputs (e.g., image pixels).
Types of Neural Networks
Multilayer Perceptron (MLP): Simple feedforward network for basic tasks.
Recurrent Neural Network (RNN): Uses feedback connections to keep a memory of earlier inputs, which helps it learn from context (e.g., sequences).
Long Short-Term Memory (LSTM): Type of RNN designed to retain information over long sequences.
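To make the simplest of these architectures concrete, here is a minimal sketch of a forward pass through a tiny multilayer perceptron: three input features, two hidden neurons, one output. The weights are arbitrary numbers chosen for illustration (a trained network would learn them), and the choice of ReLU and sigmoid activations is an assumption.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron applies an activation
    to a weighted sum of all inputs plus its bias."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hypothetical weights: 3 input features -> 2 hidden neurons -> 1 output neuron.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
output_w = [[1.0, -1.0]]
output_b = [0.0]


def mlp_forward(features):
    """Data flows strictly forward: input -> hidden layer -> output."""
    hidden = dense(features, hidden_w, hidden_b, relu)
    return dense(hidden, output_w, output_b, sigmoid)[0]


# One observation with three feature values.
prediction = mlp_forward([1.0, 2.0, 3.0])
print(f"Predicted probability: {prediction:.4f}")
```

An RNN differs in that part of its output is fed back in at the next step, which is what gives it memory of earlier inputs.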
Iterative Optimization in Deep Learning
Models fine-tune weights iteratively to reduce prediction errors.
Requires repeated evaluations of input-output relations to improve model performance.
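The iterative weight updates described above can be sketched with gradient descent on a single weight. The training pairs, the learning rate, and the number of iterations are assumptions chosen so that the example converges quickly.

```python
# Hypothetical training pairs generated from the relation y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0               # initial guess for the weight
learning_rate = 0.05
losses = []

for epoch in range(100):
    # Mean squared error of the current predictions w * x.
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    losses.append(loss)

    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)

    # Step against the gradient to reduce the error.
    w -= learning_rate * grad

print(f"Learned weight: {w:.4f}")
print(f"First loss: {losses[0]:.4f}, last loss: {losses[-1]:.8f}")
```

Deep networks apply the same idea to millions of weights at once, which is one reason they need specialized hardware.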
AI as a Service (AIaaS)
Offers pre-configured AI solutions via cloud providers.
Streamlines routine tasks, allowing businesses to focus on innovation.
ChatGPT Data Analysis Capabilities
Ability to upload and analyze data files (e.g., CSV).
Can interpret images, take real-time screenshots, and engage in interactive tasks.
Practical Analysis of Coffee Sales Data
Dataset Overview: Comprises 2,341 entries; includes date, time, payment type, amount, and coffee type.
Analysis Focus: Identifying trends in coffee purchases, payment methods, and customer preferences.
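The same kind of analysis can be sketched in plain Python. The six rows below are invented stand-ins, not the actual 2,341-entry dataset, and the column names (date, time, payment_type, amount, coffee_type) are assumptions based on the field list above.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample in the assumed layout of the coffee sales file.
sample_csv = """date,time,payment_type,amount,coffee_type
2024-03-01,10:15,card,38.70,Latte
2024-03-01,12:19,card,28.90,Americano
2024-03-01,13:46,cash,38.70,Latte
2024-03-02,10:22,card,33.80,Cappuccino
2024-03-02,11:59,cash,28.90,Americano
2024-03-02,14:38,card,38.70,Latte
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Customer preferences: how often each coffee type was bought.
sales_by_coffee = Counter(row["coffee_type"] for row in rows)

# Payment methods: how often each payment type was used.
payments = Counter(row["payment_type"] for row in rows)

# Trend over time: total revenue per day.
revenue_by_day = defaultdict(float)
for row in rows:
    revenue_by_day[row["date"]] += float(row["amount"])

print("Most popular coffee:", sales_by_coffee.most_common(1))
print("Payment methods:", dict(payments))
print("Revenue per day:", {day: round(total, 2) for day, total in revenue_by_day.items()})
```

To run this on a real file, the io.StringIO object would be replaced with an opened CSV file, and the column names adjusted to match its header.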
Conclusion
These notes summarize the main types of analytics, the data mining process and its techniques, and how AI tools support business decision-making and data interpretation.