Data Analysis
Learning Objective
Explain how data can be analysed to provide business intelligence.
Introduction
Reflection on previous topic: qualities of good information and generation of business intelligence from data.
Focus on:
Data collection and preparation for analysis.
Development of AI analytics tools and their impact.
Role of financial professionals in processing data and collaboration with ICT professionals.
The Data Analysis Process
The specific processes for analyzing data vary among businesses and shift with technological advancements.
Despite this variation, analysis follows five basic stages, set out under Knowledge Discovery in Databases (KDD) below; each stage presents its own challenges and demands continuously updated knowledge.
Knowledge Discovery in Databases (KDD)
Definition: Process of analyzing data to generate knowledge.
Origin: coined in the late 1980s, before big data and the widespread use of AI algorithms.
Five basic stages remain unchanged despite technological developments:
Selection of the data
Pre-processing of the dataset for quality improvement
Transformation of the dataset for analysis readiness
Data mining for pattern recognition
Evaluation of findings for insights
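The five KDD stages can be illustrated as a simple pipeline. This is a hypothetical sketch only: the function names, toy sales records, and aggregation rule are all invented for illustration, not a prescribed implementation.

```python
# Hypothetical sketch: the five KDD stages chained into a pipeline.
# The toy data and the "amount by region" question are invented.

def select(records, fields):
    """Selection: keep only the fields relevant to the question."""
    return [{f: r[f] for f in fields} for r in records]

def preprocess(records):
    """Pre-processing: drop rows with missing values (one cleaning rule)."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records, key):
    """Transformation: aggregate amounts by a key to reduce size."""
    totals = {}
    for r in records:
        totals[r[key]] = totals.get(r[key], 0) + r["amount"]
    return totals

def mine(totals):
    """Data mining: surface the best- and worst-performing group."""
    return max(totals, key=totals.get), min(totals, key=totals.get)

raw = [
    {"region": "North", "amount": 120, "rep": "A"},
    {"region": "South", "amount": None, "rep": "B"},
    {"region": "South", "amount": 80, "rep": "C"},
    {"region": "North", "amount": 50, "rep": "A"},
]

selected = select(raw, ["region", "amount"])
clean = preprocess(selected)
totals = transform(clean, "region")
best, worst = mine(totals)
# Evaluation: a human interprets the pattern, e.g. "North outsells South".
print(totals, best, worst)
```

The final evaluation stage is deliberately left to a person: the pipeline surfaces a pattern, but deriving an insight from it is a business judgement.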
Preparing the Data
Selection
Big data combined with AI allows businesses to answer almost any relevant question; the main constraint is cost.
Data selection contingent upon specific questions that reflect strategic plans, goals, and objectives.
Identifying goals and formulating questions is discussed further in this competency area.
Pre-processing
Purpose
Improve data quality prior to analysis through cleaning.
Specific Problems Addressed
Noise: Corrupted or unwanted data, includes:
Faulty data (e.g., erroneous product codes)
Irrelevant data (e.g., customer addresses for age analysis)
Meaningless conversions (e.g., numbers reformatted as dates)
Note: Leaving some noise may prevent overfitting in algorithms.
Outliers: Significant deviations in data that may indicate errors or natural variations. Evaluation before removal is crucial to avoid losing valid data.
Duplicates: Arise when the same data is recorded multiple times or in different formats. Identify each instance, correct discrepancies, and retain a single version.
Omissions: Gaps in a dataset, which can be handled by:
Data regeneration from other sources.
Re-attempting data collection.
Adjusting algorithms to accommodate omissions.
Cleaning Data
Cleaning is a specialised task, often performed by algorithms, but finance professionals are involved in:
Correcting data errors.
Reviewing suggested outliers.
Regenerating missing data.
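The three involvement points above can be sketched in miniature. This is an illustrative example, not any firm's actual process: the sales figures, the outlier threshold, and the choice to fill gaps with the median are all assumptions.

```python
# Illustrative cleaning sketch on invented monthly sales figures.
# The outlier threshold and the median-fill rule are assumptions.

data = [100.0, 102.0, 98.0, 102.0, 5000.0, None, 101.0]

# 1. Correct data errors / duplicates: keep the first occurrence of each value.
seen, deduped = set(), []
for v in data:
    if v not in seen:
        deduped.append(v)
        seen.add(v)

# 2. Review suggested outliers: flag values far from the median for a
#    human to confirm, rather than deleting them automatically.
known = sorted(v for v in deduped if v is not None)
median = known[len(known) // 2]
flagged = [v for v in deduped if v is not None and abs(v - median) > 10 * median]

# 3. Regenerate missing data: fill gaps with the median as a placeholder
#    until the true figure can be re-collected from another source.
cleaned = [median if v is None else v for v in deduped]

print(flagged, cleaned)
```

Note that the 5000.0 value is only flagged, not removed: as the notes above stress, an outlier may be valid data, so a finance professional should review it before anything is discarded.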
Transformation
Prepares data for analysis through size reduction and format conversion for analytics.
Techniques for Dataset Size Reduction
Sampling:
Selecting a representative sample versus full dataset.
Commonly used in:
Quality control (checking products)
Drug testing (conducting trials)
Auditing (inspecting item samples)
Statistical models help determine necessary sample sizes.
Example: in a factory, how frequently temperature readings are sampled depends on the complexity of the production process.
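Sampling can be sketched as follows: estimate an average from a random subset rather than the full dataset. The readings, sample size, and seed here are invented for illustration.

```python
import random

# Minimal sampling sketch: estimate the mean of 10,000 temperature
# readings from a random sample of 100. All figures are invented.

random.seed(42)
readings = [20 + (i % 7) * 0.5 for i in range(10_000)]  # full dataset

sample = random.sample(readings, 100)  # representative subset
estimate = sum(sample) / len(sample)
actual = sum(readings) / len(readings)
print(round(estimate, 2), round(actual, 2))
```

In practice a statistical model, not guesswork, determines how large the sample must be for the estimate to be reliable.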
Aggregation:
Combining features based on analysis purpose.
Example:
Exam data aggregated by school for institution performance (e.g., summing results).
Caution: generic aggregation in drug trials has been criticised where follow-on decisions proved detrimental to specific patient groups.
Analysts need guidance from business on aggregation strategy to avoid misinterpretations.
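A minimal sketch of aggregation, using the exam-results example above. The school names and scores are invented; reporting the count alongside the mean is one illustration of the business guiding the aggregation strategy so that a summary figure does not mislead.

```python
# Sketch: individual exam results aggregated per school.
# School names and scores are invented for illustration.

results = [
    ("Northside", 72), ("Northside", 85), ("Westgate", 64),
    ("Westgate", 58), ("Westgate", 91),
]

by_school = {}
for school, score in results:
    by_school.setdefault(school, []).append(score)

# The business decides the aggregation strategy: a mean alone hides
# spread and sample size, so report count and mean together.
summary = {s: (len(v), sum(v) / len(v)) for s, v in by_school.items()}
print(summary)
```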
The Order of Stages
Data analysis processes are in constant flux, determined largely by storage methods:
ETL (Extract, Transform, Load): data is transformed before being loaded into a data warehouse.
ELT (Extract, Load, Transform): raw data is loaded into a data lake and transformed later.
The Power of Artificial Intelligence (AI)
Definition of AI
Describes computer use for tasks traditionally performed by human cognition. This includes processing large data volumes rapidly.
Automated decision-making raises issues; however, algorithms function within human-defined parameters.
Key Capabilities Transforming Data Analytics
Object Recognition:
AI classifies images and video based on visual cues (e.g., photos, videos).
Natural Language Processing (NLP):
AI analyzes and interprets language, adapting to various accents and contexts.
Human-AI Interaction:
Used in customer service through chatbots and emotion detection in calls.
Machine Learning:
Algorithms learn from vast datasets rather than following explicit instructions, improving over time.
Types of Machine Learning
Supervised Learning
Data is tagged for classification into categories.
Classification types (with examples):
Binary (e.g., spam vs. legitimate emails)
Multi-class (e.g., customer categories)
Multi-label (e.g., books with multiple genres)
Steps involved in the supervised learning process:
Labeling of data.
Training the machine with subsets of data to identify features.
Learning with feedback for accuracy.
Application of rules to classify new data.
Metrics: precision and recall are used to assess how effective the classification is.
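Precision and recall can be computed directly from a set of predictions. This sketch uses the spam/ham example from the classification types above; the labels are invented for illustration.

```python
# Sketch: precision and recall for a binary spam classifier.
# The actual/predicted labels below are invented.

actual    = ["spam", "spam", "ham", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham",  "ham", "spam", "spam", "ham", "spam", "ham"]

tp = sum(a == "spam" and p == "spam" for a, p in zip(actual, predicted))
fp = sum(a == "ham" and p == "spam" for a, p in zip(actual, predicted))
fn = sum(a == "spam" and p == "ham" for a, p in zip(actual, predicted))

precision = tp / (tp + fp)  # of emails flagged as spam, how many really were
recall = tp / (tp + fn)     # of the actual spam, how much was caught
print(precision, recall)
```

The two metrics pull in different directions: a filter that flags everything has perfect recall but poor precision, which is why both are reported.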
Regression Analysis:
Used for predictive analysis considering numerous variables.
Business example: predicting the effect of advertising strategies while accounting for demographic traffic, weather patterns, and other variables.
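A minimal regression sketch with one variable: fitting ad spend against sales by ordinary least squares and using the fitted line to predict. The figures are invented; real business regressions involve many more variables.

```python
# Minimal sketch of regression for prediction: fit ad spend (x)
# against sales (y) by ordinary least squares. Figures are invented.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # ad spend
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # sales

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

def predict(x):
    """Forecast sales at a new spend level from the fitted line."""
    return intercept + slope * x

print(round(predict(6.0), 2))
```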
Challenges: Overfitting and Underfitting:
Overfitting: Creating perfect rules for dataset specifics, lacking generalization.
Underfitting: Inadequate training results in inability to discern relationships.
Input from business professionals can help resolve these issues.
Unsupervised Learning
Definition
Machine analyzes unlabelled datasets to discover patterns without specific prompts.
Techniques
Cluster Analysis:
Groups data based on shared characteristics (e.g., customer queries).
Anomaly Detection:
Identifies outliers which may indicate fraud or other concerns.
Association Rules Mining:
Discovers relationships using logical inference for predictive insights (e.g., market basket analysis, health symptom identification).
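Association rules mining can be sketched with the classic market basket example: measuring how often one purchase implies another via support and confidence. The baskets and the "bread → butter" rule are invented for illustration.

```python
# Sketch of association rules mining (market basket analysis):
# how strong is the rule "bread -> butter"? Baskets are invented.

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

n = len(baskets)
both = sum("bread" in b and "butter" in b for b in baskets)
bread = sum("bread" in b for b in baskets)

support = both / n          # how common the pair is overall
confidence = both / bread   # of baskets with bread, share that add butter
print(support, confidence)
```

Full algorithms such as Apriori generate and score many candidate rules this way, keeping only those above chosen support and confidence thresholds.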
Reinforcement Learning
Mimics behavioral training through positive reinforcement for achieving set objectives.
Algorithms adjust based on performance outcomes, improving through repeated trial and feedback.
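A toy sketch of the reinforcement idea: an agent tries two actions, receives rewards, and shifts its value estimates toward whichever performs better. The rewards, learning rate, and explore/exploit schedule are all invented for illustration and far simpler than real reinforcement learning.

```python
# Toy reinforcement learning sketch: positive reinforcement nudges the
# agent toward the better of two actions. All numbers are invented.

rewards = {"A": 1.0, "B": 5.0}    # environment: B is the better action
estimates = {"A": 0.0, "B": 0.0}  # agent's learned value of each action
alpha = 0.5                       # learning rate

for step in range(20):
    # Explore both actions early, then exploit the best-known one.
    action = ["A", "B"][step % 2] if step < 6 else max(estimates, key=estimates.get)
    reward = rewards[action]
    # Reinforcement: move the estimate toward the observed reward.
    estimates[action] += alpha * (reward - estimates[action])

best = max(estimates, key=estimates.get)
print(best, estimates)
```

After a few repetitions the agent settles on the higher-reward action, mirroring how repeated feedback shapes behaviour.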
Setting Parameters
Importance of defining operational bounds for AI systems to prevent unintended consequences.
Businesses need to anticipate potential paths AI might take to ensure outcomes align with ethical and operational standards.
Key Roles in Data Analytics
Data Scientists:
Specialize in data mining, modeling processes, and developing predictive algorithms.
Data Analysts:
Focus on answering business questions through detailed data evaluation and interpretations.
Business Analysts:
Bridge between business needs and technical data solutions to ensure project alignment.
Finance Professionals:
Integral in ensuring data accuracy, ethical considerations, and effective communication of findings.
Conclusion
This topic covered data collection, preparation, analytical methodology, and AI's role in data-driven decision-making.
Upcoming sections of the course discuss data's influence on business decision-making.