Data Analysis

Learning Objective

  • Explain how data can be analyzed to provide business intelligence.

Introduction

  • Reflection on previous topic: qualities of good information and generation of business intelligence from data.

  • Focus on:

    • Data collection and preparation for analysis.

    • Development of AI analytics tools and their impact.

    • Role of financial professionals in processing data and collaboration with ICT professionals.

The Data Analysis Process

  • The specific processes for analyzing data vary among businesses and shift with technological advancements, so professionals must continually update their knowledge.

  • Basic stages:

    • Selection of the data

    • Pre-processing of the data for quality improvement

    • Transformation of the dataset for analysis readiness

    • Data mining for pattern and relationship recognition

    • Evaluation of findings to derive insights
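The five stages above can be sketched as a simple pipeline. This is a minimal illustration in Python, not a standard implementation: the sales records, field names, and evaluation threshold are all invented for the example.

```python
# A minimal sketch of the five stages as a pipeline of functions.
# The stage names come from the notes; the data and rules are illustrative.

def select(records, fields):
    """Selection: keep only the fields relevant to the business question."""
    return [{f: r[f] for f in fields} for r in records]

def preprocess(records):
    """Pre-processing: drop rows with missing values (one simple cleaning rule)."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: standardize amounts to two decimal places (illustrative)."""
    return [{**r, "amount": round(r["amount"], 2)} for r in records]

def mine(records):
    """Data mining: a trivial 'pattern' - average amount per region."""
    totals = {}
    for r in records:
        totals.setdefault(r["region"], []).append(r["amount"])
    return {region: sum(v) / len(v) for region, v in totals.items()}

def evaluate(patterns, threshold):
    """Evaluation: flag regions whose average exceeds a threshold."""
    return [region for region, avg in patterns.items() if avg > threshold]

sales = [
    {"region": "North", "amount": 120.0, "rep": "A"},
    {"region": "North", "amount": None,  "rep": "B"},   # incomplete record
    {"region": "South", "amount": 80.0,  "rep": "C"},
]
data = select(sales, ["region", "amount"])
data = preprocess(data)
data = transform(data)
patterns = mine(data)
print(evaluate(patterns, 100.0))   # → ['North']
```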

Knowledge Discovery in Databases (KDD)

  • Definition: Process of analyzing data to generate knowledge.

  • Origin: the late 1980s, before big data and the use of AI algorithms.

  • The five basic stages listed above remain unchanged despite technological developments.

Preparing the Data

Selection
  • Big data and AI allow businesses to answer almost any relevant question, though at a cost.

  • Data selection contingent upon specific questions that reflect strategic plans, goals, and objectives.

  • Goal identification and question formulation are discussed further in this competency area.

Pre-processing
Purpose
  • Improve data quality prior to analysis through cleaning.

Specific Problems Addressed
  • Noise: corrupted or unwanted data, including:

    • Faulty data (e.g., erroneous product codes)

    • Irrelevant data (e.g., customer addresses for age analysis)

    • Meaningless conversions (e.g., numbers reformatted as dates)

    • Note: Leaving some noise may prevent overfitting in algorithms.

  • Outliers: Significant deviations in data that may indicate errors or natural variations. Evaluation before removal is crucial to avoid losing valid data.

  • Duplicates: occur when the same data is recorded multiple times or in different formats. Identify the duplicate instances, correct discrepancies, and retain a single version.

  • Omissions: incomplete datasets, which can be addressed by:

    • Regenerating data from other sources.

    • Re-attempting data collection.

    • Adjusting algorithms to accommodate the omissions.

Cleaning Data
  • Cleaning tasks are specialized and often performed by algorithms, but finance professionals are involved in:

    • Correcting data errors.

    • Reviewing suggested outliers.

    • Regenerating missing data.
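The cleaning steps above can be sketched in plain Python. This is an illustrative example only: the transaction values are invented, and the median-absolute-deviation rule used to flag outliers is just one simple choice of method.

```python
def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def clean(amounts, k=10):
    # Duplicates: keep a single copy of each value, preserving order.
    deduped = list(dict.fromkeys(amounts))
    # Omissions: drop missing entries here; regeneration or re-collection
    # would happen upstream if the business needs those records.
    present = [a for a in deduped if a is not None]
    # Outliers: flag (not delete) values far from the median, using a
    # median-absolute-deviation rule, so a human can review them first.
    med = median(present)
    mad = median([abs(a - med) for a in present]) or 1.0
    flagged = [a for a in present if abs(a - med) > k * mad]
    kept = [a for a in present if a not in flagged]
    return kept, flagged

kept, flagged = clean([10, 12, 12, None, 11, 500])
print(kept, flagged)   # → [10, 12, 11] [500]
```

Flagging rather than deleting outliers reflects the point above: the value 500 may be an error, or a genuine (and important) natural variation.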

Transformation
  • Prepares data for analysis by reducing dataset size and converting formats to suit the analytics tools.

Techniques for Dataset Size Reduction
  1. Sampling:

    • Selecting a representative sample versus full dataset.

    • Commonly used in:

      • Quality control (checking products)

      • Drug testing (conducting trials)

      • Auditing (inspecting item samples)

    • Statistical models help determine necessary sample sizes.

    • Example: sampling temperature readings in a factory, with sample size depending on production complexity.

  2. Aggregation:

    • Combining features based on analysis purpose.

    • Example:

      • Exam data aggregated by school for institution performance (e.g., summing results).

      • Caution: generic aggregation in drug trials has led to follow-on decisions detrimental to specific groups.

    • Analysts need guidance from business on aggregation strategy to avoid misinterpretations.
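The two size-reduction techniques above can be sketched together: draw a random sample, then aggregate it. The exam-style data, the 10% sampling rate, and the choice of averaging are all illustrative assumptions.

```python
import random

random.seed(1)   # fixed seed so the example is repeatable

# 1. Sampling: analyse a 10% random sample instead of the full dataset.
results = [("School A" if i % 2 else "School B", 50 + i % 50)
           for i in range(1_000)]                  # (school, mark) records
sample = random.sample(results, k=len(results) // 10)

# 2. Aggregation: combine marks per school. Whether to sum or average is a
# business decision; the two can tell very different stories.
by_school = {}
for school, mark in sample:
    by_school.setdefault(school, []).append(mark)
averages = {s: round(sum(m) / len(m), 1) for s, m in by_school.items()}

print(len(sample))   # → 100
```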

The Order of Stages

  • Data analysis processes continue to evolve; the order of the stages is determined largely by how the data is stored:

    • ETL (Extract, Transform, Load): data is transformed before being loaded into a data warehouse.

    • ELT (Extract, Load, Transform): raw data is loaded into a data lake and transformed later.
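The difference between the two orderings can be sketched with placeholder functions; the functions themselves are invented, and only the position of the transform step matters here.

```python
# A sketch contrasting ETL and ELT. extract/transform/load are placeholders.

def extract():            return [" 42 ", "17", None]     # raw source rows
def transform(rows):      return [int(r) for r in rows if r is not None]
def load(rows, store):    store.extend(rows); return store

warehouse, lake = [], []

# ETL: transform before loading, so the warehouse holds only clean data.
load(transform(extract()), warehouse)

# ELT: load raw data first, transform later inside the data lake.
load(extract(), lake)
lake_transformed = transform(lake)

print(warehouse)   # → [42, 17]
print(lake)        # → [' 42 ', '17', None]
```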

The Power of Artificial Intelligence (AI)

Definition of AI
  • The use of computers to perform tasks traditionally requiring human cognition, including processing large volumes of data rapidly.

  • Automated decision-making raises issues; however, algorithms function within human-defined parameters.

Key Capabilities Transforming Data Analytics
  1. Object Recognition:

    • AI classifies images based on visual cues (e.g., in photos and videos).

  2. Natural Language Processing (NLP):

    • AI analyzes and interprets language, adapting to various accents and contexts.

  3. Human-AI Interaction:

    • Used in customer service through chatbots and emotion detection in calls.

  4. Machine Learning:

    • Algorithms learn from data rather than explicit instructions, improving over time as they process vast datasets.

Types of Machine Learning
  1. Supervised Learning

    • Data is tagged for classification into categories.

    • Classification types:

      • Binary (e.g., spam vs. legitimate emails)

      • Multi-class (e.g., customer categories)

      • Multi-label (e.g., books with multiple genres)

    • Steps involved in the supervised learning process:

    1. Labeling of data.

    2. Training the machine with subsets of data to identify features.

    3. Learning with feedback for accuracy.

    4. Application of rules to classify new data.

    • Metrics: precision and recall are used to assess how effective the classification is.

  2. Regression Analysis:

    • Used for predictive analysis considering numerous variables.

    • Business example:

      • Predicting the impact of advertising strategies from variables such as demographic traffic and weather patterns.

  3. Challenges: Overfitting and Underfitting:

    • Overfitting: rules fit the training dataset's specifics so closely that they fail to generalize to new data.

    • Underfitting: inadequate training leaves the algorithm unable to discern relationships in the data.

  • Input from business professionals can help resolve these issues.
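The precision and recall metrics mentioned under supervised learning can be sketched for a binary spam classifier. The labels and predictions are invented for illustration (1 = spam, 0 = legitimate).

```python
# Precision: of the emails flagged as spam, how many really were spam?
# Recall: of the actual spam, how much did the classifier catch?

actual    = [1, 1, 1, 0, 0, 0, 1, 0]   # true labels
predicted = [1, 0, 1, 0, 1, 0, 1, 0]   # classifier output

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(precision, recall)   # → 0.75 0.75
```

The two metrics pull in different directions: a filter that flags everything has perfect recall but poor precision, which is why both are reported.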
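The regression idea above can be sketched as a simple least-squares line relating advertising spend to sales. Real business models would include many more variables; the figures here are illustrative.

```python
# Fit y = slope * x + intercept by ordinary least squares (one variable).

spend = [1.0, 2.0, 3.0, 4.0]      # e.g. ad spend, $000s (illustrative)
sales = [3.0, 5.0, 7.0, 9.0]      # observed sales, $000s (illustrative)

n = len(spend)
mean_x = sum(spend) / n
mean_y = sum(sales) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(spend, sales))
         / sum((x - mean_x) ** 2 for x in spend))
intercept = mean_y - slope * mean_x

predicted = slope * 5.0 + intercept   # forecast sales at $5k spend
print(slope, intercept, predicted)    # → 2.0 1.0 11.0
```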

Unsupervised Learning

Definition
  • Machine analyzes unlabelled datasets to discover patterns without specific prompts.

Techniques
  1. Cluster Analysis:

    • Groups data based on shared characteristics (e.g., customer queries).

  2. Anomaly Detection:

    • Identifies outliers which may indicate fraud or other concerns.

  3. Association Rules Mining:

    • Discovers relationships using logical inference for predictive insights (e.g., market basket analysis, health symptom identification).
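Association rules mining can be sketched as counting how often items appear together across market baskets, as in the example above. The baskets and the confidence measure shown are illustrative.

```python
from itertools import combinations
from collections import Counter

# Each basket is the set of items one customer bought (illustrative data).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

# Count how often each pair of items is bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

item_counts = Counter(item for basket in baskets for item in basket)

# Confidence of the rule "bread → butter": of the baskets containing
# bread, what fraction also contain butter?
confidence = pair_counts[("bread", "butter")] / item_counts["bread"]
print(round(confidence, 2))   # → 0.67
```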

Reinforcement Learning

  • Mimics behavioral training through positive reinforcement for achieving set objectives.

  • Algorithms adjust based on performance outcomes, improving through repeated trials.
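The reward loop described above can be sketched as an epsilon-greedy agent choosing between two actions. The setup is an illustrative assumption: action "B" always earns a reward and "A" never does, so repeated trials should shift the agent toward "B".

```python
import random

random.seed(0)                  # fixed seed so the run is repeatable
value = {"A": 0.0, "B": 0.0}    # agent's estimated value of each action
counts = {"A": 0, "B": 0}

for step in range(200):
    if random.random() < 0.1:                 # explore: try a random action
        action = random.choice(["A", "B"])
    else:                                     # exploit: pick the best so far
        action = max(value, key=value.get)
    reward = 1.0 if action == "B" else 0.0    # positive reinforcement
    counts[action] += 1
    # update the running-average estimate for the chosen action
    value[action] += (reward - value[action]) / counts[action]

print(max(value, key=value.get))   # the agent settles on the rewarding action
```

The occasional random exploration is what lets the agent discover "B" at all; pure exploitation would lock in the first action it tried.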

Setting Parameters

  • Importance of defining operational bounds for AI systems to prevent unintended consequences.

  • Businesses need to anticipate potential paths AI might take to ensure outcomes align with ethical and operational standards.

Key Roles in Data Analytics

  1. Data Scientists:

    • Specialize in data mining, modeling processes, and developing predictive algorithms.

  2. Data Analysts:

    • Focus on answering business questions through detailed data evaluation and interpretations.

  3. Business Analysts:

    • Bridge between business needs and technical data solutions to ensure project alignment.

  4. Finance Professionals:

    • Integral in ensuring data accuracy, ethical considerations, and effective communication of findings.

Conclusion

  • This topic covered data collection, preparation, analytical methodology, and AI's role in data-driven decision-making.

  • Upcoming sections discuss data's influence on business decision-making.