fbla of doom and despair

Probability and Statistics Foundations

  • Calculate mean, median, mode, range of a dataset

  • Discuss variance measures (standard deviation, variance, covariance)

  • Characteristics and importance of Gaussian distribution

  • Calculate expected value of a random variable

  • Differentiate between continuous and discrete variables

Data Analysis and Statistics for AI

  • Select appropriate visual medium for datasets

  • Describe diagrams (boxplots, histograms, scatterplots)

  • Techniques for multivariate data (dependence methods, regression)

  • Importance of data cleaning

  • Factors affecting data quality (duplicates, low-quality sources)

  • Application of data science algorithms (linear regression, decision trees)

Tools for Data and AI

  • Write SQL queries

  • Common packages/libraries (Pandas, NumPy, PyTorch)

  • Use Python for data cleaning and wrangling

  • Use R for data science

  • Characteristics of relational databases

AI Basics

  • Nature of generative AI

  • Capabilities and limitations of generative AI

  • Uses of generative AI (healthcare, research, digital art)

  • AI subfields (computer vision, NLP, robotics)

  • Define large language models (LLMs)

  • Capabilities of large language models

Machine Learning

  • Nature of machine learning

  • Use of training, test, validation datasets

  • Behavior of machine learning algorithms (neural networks, decision trees)

  • Characterize unsupervised, supervised, reinforcement learning

  • Select appropriate algorithm for reasoning problems

  • Explain deep learning concept

Perception, Representation, and Reasoning

  • Use of predicate logic in AI models

  • Examples of predicate logic

  • Differences between logic-based and probability-based reasoning

  • Describe Bayesian networks (nodes, edges, Directed Acyclic Graphs)

  • Knowledge representation and reasoning in AI

Privacy and Ethics

  • Dilemmas from AI systems (self-driving vehicles, generative AI)

  • AI bias (algorithmic bias)

  • Security and privacy risks of LLMs

  • Credibility concerns of LLMs (hallucinations, misinformation)

Data Literacy and Foundations

  • Nature of data science

  • Differences between structured and unstructured data

  • Identify numeric and categorical data

  • Convert between data representations (binary, hexadecimal, decimal)

  • Types of data from various sources

  • Importance of data wrangling and transformation

  • Stages of the data science process

1. Probability and Statistics Foundations
  • Measures of Central Tendency:

    • Mean: $\mu = \frac{\sum x_i}{n}$

    • Median: Middle value of a sorted dataset.

    • Mode: Most frequent value.

    • Range: $\text{Max} - \text{Min}$

  • Variance Measures:

    • Variance ($\sigma^2$): Average squared deviation from the mean: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$

    • Standard Deviation ($\sigma$): $\sqrt{\sigma^2}$

    • Covariance: Measures how two variables change together.

  • Gaussian (Normal) Distribution: Bell-shaped, symmetric; defined by mean ($\mu$) and standard deviation ($\sigma$). About 68% of data falls within $1\sigma$ of the mean and about 95% within $2\sigma$.

  • Expected Value ($E[X]$): For discrete variables, $E[X] = \sum x_i P(x_i)$.

  • Variables: Discrete (countable, e.g., number of students) vs. Continuous (measurable, e.g., height).
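
The measures in this section can be checked on a small dataset. The sketch below is only an illustration: the numbers are made up, the population (divide by n) forms of variance and covariance are assumed to match the variance formula in these notes, and `population_covariance` is a hypothetical helper, not a library function.

```python
from fractions import Fraction
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample values

mean = statistics.mean(data)          # sum of values / count
median = statistics.median(data)      # middle of the sorted values
mode = statistics.mode(data)          # most frequent value
value_range = max(data) - min(data)   # Max - Min

# Population variance and standard deviation (divide by n).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)


def population_covariance(xs, ys):
    """Average product of paired deviations from each mean (hypothetical helper)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


# Positive covariance: ys rises whenever xs rises.
cov = population_covariance([1, 2, 3, 4], [2, 4, 6, 8])

# Expected value of a fair six-sided die: sum of x * P(x).
die = {face: Fraction(1, 6) for face in range(1, 7)}
expected = sum(face * p for face, p in die.items())

print(mean, median, mode, value_range, variance, std_dev, cov, expected)
```

Using `Fraction` for the die probabilities keeps the expected value exact (7/2) instead of relying on floating-point rounding.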

2. Data Analysis and Statistics for AI
  • Visual Media:

    • Boxplots: Show distribution through quartiles and detect outliers.

    • Histograms: Show frequency distribution of a single variable.

    • Scatterplots: Visualize relationships/correlations between two variables.

  • Multivariate Techniques: Dependence methods model how an outcome variable depends on one or more predictors; regression (predicting continuous outcomes) is a common example.

  • Data Quality: Cleaning involves removing duplicates and handling low-quality sources to prevent "garbage in, garbage out."
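
As a rough illustration of cleaning followed by regression, the sketch below drops an exact duplicate record and then fits a straight line by ordinary least squares. The records are invented for the example and `fit_line` is a hypothetical helper; real projects would more likely use a library such as Pandas or scikit-learn.

```python
# Hypothetical (x, y) records; the pair (3, 7) appears twice.
records = [(1, 3), (2, 5), (3, 7), (3, 7), (4, 9)]

# Data cleaning: drop exact duplicates while keeping the original order.
clean = list(dict.fromkeys(records))


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(clean)
prediction = slope * 5 + intercept  # predict y for an unseen x = 5

print(clean, slope, intercept, prediction)
```

Leaving the duplicate in would give the point (3, 7) double weight in the fit, which is one small way that "garbage in, garbage out" shows up in practice.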

3. Tools for Data and AI
  • SQL: Language for querying and managing relational databases.

  • Python Ecosystem:

    • Pandas: Data manipulation and analysis.

    • NumPy: Numerical computing and arrays.

    • PyTorch: Deep learning and neural networks.

  • R: Primarily used for statistical computing and graphics.

  • Relational Databases: Organised into tables with predefined schemas.
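
A minimal sketch of a relational table with a predefined schema and two SQL queries, using Python's built-in `sqlite3` module with an in-memory database. The `students` table, its columns, and the rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Predefined schema: every row must fit these typed columns.
conn.execute(
    "CREATE TABLE students ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, score INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO students (name, score) VALUES (?, ?)",
    [("Ada", 92), ("Ben", 78), ("Cy", 85)],
)

# Filter and sort rows with WHERE and ORDER BY.
rows = conn.execute(
    "SELECT name, score FROM students WHERE score >= 80 ORDER BY score DESC"
).fetchall()

# Aggregate across all rows with AVG.
average = conn.execute("SELECT AVG(score) FROM students").fetchone()[0]
conn.close()

print(rows, average)
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid SQL injection when inserting data.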

4. AI Basics and Machine Learning
  • Generative AI: Creates new content such as text and digital art, with uses in areas like healthcare and research. Large Language Models (LLMs) are one prominent type of generative model.

  • Machine Learning Types:

    • Supervised: Training on labeled data (mapping inputs to known outputs).

    • Unsupervised: Finding hidden patterns in unlabeled data (e.g., clustering).

    • Reinforcement Learning: Learning through rewards and penalties.

  • Deep Learning: A subset of ML based on multi-layered neural networks.

  • Datasets: Training (learning), Validation (tuning), Test (final evaluation).
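
One way to picture the three datasets is a shuffled split. The sketch below assumes a 70/15/15 split and a fixed random seed; both are illustrative choices, not requirements, and `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(items, train_pct=70, val_pct=15, seed=0):
    """Shuffle a copy of items and cut it into train, validation, and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]                 # used to learn parameters
    val = shuffled[n_train:n_train + n_val]    # used to tune settings
    test = shuffled[n_train + n_val:]          # held back for final evaluation
    return train, val, test


examples = list(range(20))  # stand-in for 20 labeled examples
train, val, test = split_dataset(examples)

print(len(train), len(val), len(test))
```

Every example lands in exactly one part, so the test set stays unseen until the final evaluation.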

5. Representation and Reasoning
  • Predicate Logic: Uses variables and quantifiers to express facts ($P(x)$).

  • Bayesian Networks: Probabilistic graphical models representing variables and their conditional dependencies via Directed Acyclic Graphs (DAGs).

  • Reasoning: Logic-based (deterministic rules) vs. Probability-based (dealing with uncertainty).
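
The contrast between the two reasoning styles can be sketched in a few lines. The first half evaluates quantified statements over a small finite domain; the second half is a two-node Bayesian network (Rain -> WetGrass). All names, scores, and probabilities here are invented for illustration.

```python
# Logic-based: a predicate and quantifiers over a small finite domain.
students = {"ana": 91, "raj": 84, "li": 77}  # hypothetical scores


def passed(name):
    """Predicate Passed(x): true when x scored at least 70."""
    return students[name] >= 70


everyone_passed = all(passed(x) for x in students)          # for all x: Passed(x)
someone_above_90 = any(students[x] > 90 for x in students)  # exists x: score(x) > 90

# Probability-based: Bayesian network with one edge, Rain -> WetGrass.
p_rain = 0.2            # P(Rain)
p_wet_given_rain = 0.9  # P(WetGrass | Rain)
p_wet_given_dry = 0.1   # P(WetGrass | no Rain)

# Marginalize over Rain to get P(WetGrass).
p_wet = p_wet_given_rain * p_rain + p_wet_given_dry * (1 - p_rain)

# Bayes' rule: update the belief in Rain after observing wet grass.
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet

print(everyone_passed, someone_above_90, round(p_wet, 2), round(p_rain_given_wet, 3))
```

The logical statements come out strictly true or false, while the network returns a degree of belief that shifts as evidence arrives.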

6. Ethics and Data Literacy
  • Key Risks: Algorithmic bias, privacy leakage in LLMs, and hallucinations (outputting factually incorrect information).

  • Data Types: Structured (tabular) vs. Unstructured (text, audio, video). Categorical (labels) vs. Numeric (counts/measurements).

  • Data Conversion: Binary (Base 2), Hexadecimal (Base 16), Decimal (Base 10).

  • Processes: Data wrangling (transforming raw data) is a critical stage in the data science process.
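
Conversions between bases can be practiced with Python's built-ins. The sketch below also includes `to_base`, a hypothetical helper that shows the repeated-division method by hand; the value 202 is an arbitrary example.

```python
def to_base(n, base):
    """Convert a non-negative integer to a string in base 2 to 16 by repeated division."""
    digits = "0123456789ABCDEF"
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)  # remainder is the next digit, right to left
        out.append(digits[remainder])
    return "".join(reversed(out))


number = 202  # decimal (base 10)

binary = format(number, "b")  # built-in conversion to base 2
hexa = format(number, "X")    # built-in conversion to base 16, uppercase

# int(text, base) converts back to decimal.
from_binary = int("11001010", 2)
from_hex = int("CA", 16)

print(binary, hexa, from_binary, from_hex, to_base(number, 2), to_base(number, 16))
```

Each hexadecimal digit stands for exactly four binary digits, so `CA` corresponds to `1100` followed by `1010`.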