FBLA of Doom and Despair
Probability and Statistics Foundations
Calculate mean, median, mode, range of a dataset
Discuss variance measures (standard deviation, variance, covariance)
Characteristics and importance of Gaussian distribution
Calculate expected value of a random variable
Differentiate between continuous and discrete variables
Data Analysis and Statistics for AI
Select appropriate visual medium for datasets
Describe diagrams (boxplots, histograms, scatterplots)
Techniques for multivariate data (dependence methods, regression)
Importance of data cleaning
Factors affecting data quality (duplicates, low-quality sources)
Application of data science algorithms (linear regression, decision trees)
Tools for Data and AI
Write SQL queries
Common packages/libraries (Pandas, NumPy, PyTorch)
Use Python for data cleaning and wrangling
Use R for data science
Characteristics of relational databases
AI Basics
Nature of generative AI
Capabilities and limitations of generative AI
Uses of generative AI (healthcare, research, digital art)
AI subfields (computer vision, NLP, robotics)
Define large language models (LLMs)
Capabilities of large language models
Machine Learning
Nature of machine learning
Use of training, test, validation datasets
Behavior of machine learning algorithms (neural networks, decision trees)
Characterize unsupervised, supervised, reinforcement learning
Select appropriate algorithm for reasoning problems
Explain deep learning concept
Perception, Representation, and Reasoning
Use of predicate logic in AI models
Examples of predicate logic
Differences between logic-based and probability-based reasoning
Describe Bayesian networks (nodes, edges, Directed Acyclic Graphs)
Knowledge representation and reasoning in AI
Privacy and Ethics
Dilemmas from AI systems (self-driving vehicles, generative AI)
AI bias (algorithmic bias)
Security and privacy risks of LLMs
Credibility concerns of LLMs (hallucinations, misinformation)
Data Literacy and Foundations
Nature of data science
Differences between structured and unstructured data
Identify numeric and categorical data
Convert between data representations (binary, hexadecimal, decimal)
Types of data from various sources
Importance of data wrangling and transformation
Stages of the data science process
1. Probability and Statistics Foundations
Measures of Central Tendency:
Mean: Sum of all values divided by the number of values, $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Median: Middle value of a sorted dataset.
Mode: Most frequent value.
Range: Difference between the largest and smallest values, $\max(x) - \min(x)$.
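A quick way to check these four measures is Python's standard `statistics` module. The sketch below uses a made-up list of scores (the data is hypothetical, chosen only so the answers are easy to verify by hand).

```python
from statistics import mean, median, multimode

# Hypothetical dataset for illustration only.
scores = [70, 85, 85, 90, 100]

mean_value = mean(scores)                 # sum of values / number of values
median_value = median(scores)             # middle value of the sorted data
mode_values = multimode(scores)           # list of the most frequent value(s)
range_value = max(scores) - min(scores)   # largest value minus smallest value

print(mean_value, median_value, mode_values, range_value)
```

`multimode` returns a list because a dataset can have more than one mode.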
Variance Measures:
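Variance ($\sigma^2$): Average squared deviation from the mean: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$.
Standard Deviation ($\sigma$): Square root of the variance, $\sigma = \sqrt{\sigma^2}$.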
Covariance: Measures how two variables change together.
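The three spread measures can be computed by hand in a few lines. This is a minimal sketch that assumes the population versions (dividing by n, matching "average squared deviation"); sample versions divide by n - 1 instead. The `hours` and `scores` lists are hypothetical.

```python
from math import sqrt

def variance(xs):
    """Population variance: average squared deviation from the mean."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    """Population standard deviation: square root of the variance."""
    return sqrt(variance(xs))

def covariance(xs, ys):
    """Population covariance: average product of paired deviations."""
    mu_x = sum(xs) / len(xs)
    mu_y = sum(ys) / len(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical paired data: hours studied and quiz score.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

print(variance(hours))            # 2.0
print(std_dev(hours))             # square root of 2.0
print(covariance(hours, scores))  # 4.0 (positive: the two rise together)
```

A positive covariance means the variables tend to increase together, a negative one means one tends to fall as the other rises.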
Gaussian (Normal) Distribution: Bell-shaped, symmetric; defined by mean ($\mu$) and standard deviation ($\sigma$). About 68% of data falls within $\mu \pm \sigma$, about 95% within $\mu \pm 2\sigma$.
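The share of a normal distribution lying within k standard deviations of the mean can be checked numerically. The sketch below relies on the standard identity P(|X - mean| < k * sd) = erf(k / sqrt(2)); the helper name `prob_within` is made up for this example.

```python
from math import erf, sqrt

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

Running it gives roughly 0.68, 0.95, and 0.997 for one, two, and three standard deviations.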
Expected Value ($E[X]$): For discrete variables, $E[X] = \sum_i x_i \, P(X = x_i)$.
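For a discrete variable, the expected value is each outcome multiplied by its probability, summed over all outcomes. A minimal sketch for a fair six-sided die (the `expected_value` helper is illustrative, not a library function):

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of value * probability over a dict mapping value -> probability."""
    return sum(value * prob for value, prob in distribution.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

ev = expected_value(die)
print(ev, float(ev))  # 7/2 3.5
```

Note that 3.5 is not a face of the die: the expected value is a long-run average, not necessarily a possible outcome.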
Variables: Discrete (countable, e.g., number of students) vs. Continuous (measurable, e.g., height).
2. Data Analysis and Statistics for AI
Visual Media:
Boxplots: Show distribution through quartiles and detect outliers.
Histograms: Show frequency distribution of a single variable.
Scatterplots: Visualize relationships/correlations between two variables.
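The numbers a boxplot draws can be computed without any plotting library. This sketch assumes the common 1.5 x IQR rule for flagging outliers and uses the "inclusive" quartile method from Python's `statistics` module; other tools may compute quartiles slightly differently. The data is hypothetical.

```python
from statistics import quantiles

# Hypothetical data with one suspiciously large value.
data = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

# Quartiles split the sorted data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the height of the box

# Points beyond 1.5 * IQR from the box are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("median:", q2)
print("box:", q1, "to", q3)
print("outliers:", outliers)
```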
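Multivariate Techniques: Dependence methods model how an outcome variable depends on one or more predictor variables; regression, which predicts a continuous outcome, is the most common example.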
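Simple linear regression with one predictor can be written out directly from the least-squares formulas: slope = (sum of paired deviation products) / (sum of squared x deviations), and the line passes through the point of means. The sketch below is a hand-rolled illustration on hypothetical data, not a replacement for a statistics library.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predict the outcome for a new x

print(slope, intercept, prediction)  # 2.0 1.0 13.0
```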
Data Quality: Cleaning involves removing duplicates and handling low-quality sources to prevent "garbage in, garbage out."
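A small cleaning pass might look like the following. This is a sketch in plain Python under stated assumptions: rows with any missing value are dropped (one policy among several; filling in values is another), text is trimmed and title-cased, and exact duplicates are removed. The records and the `clean` helper are hypothetical.

```python
# Hypothetical raw records with common quality problems.
raw_rows = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Grace", "age": None},  # missing value
    {"id": 1, "name": "Ada", "age": 36},      # exact duplicate
    {"id": 3, "name": " alan ", "age": 41},   # untidy text
]

def clean(rows):
    """Drop incomplete rows, tidy text fields, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # skip rows with missing data
        tidy = {
            key: value.strip().title() if isinstance(value, str) else value
            for key, value in row.items()
        }
        fingerprint = tuple(sorted(tidy.items()))  # hashable form of the row
        if fingerprint in seen:
            continue  # skip duplicates
        seen.add(fingerprint)
        cleaned.append(tidy)
    return cleaned

cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

In pandas, the corresponding steps are usually done with `DataFrame.dropna()` and `DataFrame.drop_duplicates()`.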
3. Tools for Data and AI
SQL: Language for querying and managing relational databases.
Python Ecosystem:
Pandas: Data manipulation and analysis.
NumPy: Numerical computing and arrays.
PyTorch: Deep learning and neural networks.
R: Primarily used for statistical computing and graphics.
Relational Databases: Organised into tables with predefined schemas.
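To practice SQL without installing a database server, Python's built-in `sqlite3` module can run queries against an in-memory database. The sketch below defines a table with a fixed schema, inserts rows, and runs a grouped query. The `students` table and its columns are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database that lives in RAM

# Predefined schema: every row must fit these typed columns.
conn.execute(
    "CREATE TABLE students ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " grade INTEGER NOT NULL)"
)

# Parameterized inserts (the ? placeholders) avoid SQL injection.
conn.executemany(
    "INSERT INTO students (name, grade) VALUES (?, ?)",
    [("Ada", 11), ("Grace", 12), ("Alan", 12)],
)

# Count students per grade.
rows = conn.execute(
    "SELECT grade, COUNT(*) AS n FROM students GROUP BY grade ORDER BY grade"
).fetchall()
conn.close()

print(rows)  # [(11, 1), (12, 2)]
```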
4. AI Basics and Machine Learning
Generative AI: Creates new content (text, images, audio, code) using models trained on large datasets; Large Language Models (LLMs) are the main example for text.
Machine Learning Types:
Supervised: Training on labeled data (mapping inputs to known outputs).
Unsupervised: Finding hidden patterns in unlabeled data (e.g., clustering).
Reinforcement Learning: Learning through rewards and penalties.
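As a concrete instance of supervised learning, here is a one-nearest-neighbor classifier: it memorizes labeled examples and predicts the label of the closest one. This is a toy sketch with a single feature and hypothetical data, chosen because it fits in a few lines; it is not meant as a recommended model.

```python
def nearest_neighbor_predict(labeled_examples, query):
    """Return the label of the training example whose feature is closest to query."""
    closest = min(labeled_examples, key=lambda pair: abs(pair[0] - query))
    return closest[1]

# Labeled data: (height in meters, label). Known outputs make this supervised.
training_data = [(1.0, "short"), (1.2, "short"), (1.8, "tall"), (2.0, "tall")]

print(nearest_neighbor_predict(training_data, 1.1))  # short
print(nearest_neighbor_predict(training_data, 1.9))  # tall
```

An unsupervised method would receive only the heights, without labels, and would have to discover the two groups on its own.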
Deep Learning: A subset of ML based on multi-layered neural networks.
Datasets: Training (learning), Validation (tuning), Test (final evaluation).
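Splitting a dataset into the three parts is usually done after shuffling, so that each part is a fair sample. A minimal sketch, assuming a 70/15/15 split (the fractions, the seed, and the `split_dataset` helper are illustrative choices, not a standard):

```python
import random

def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle items and split them into training, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # everything left over
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

The test set should be touched only once, at the end; using it during tuning leaks information and inflates the reported performance.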
5. Representation and Reasoning
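Predicate Logic: Uses variables, predicates, and quantifiers to express facts (e.g., $\forall x\,(\mathrm{Human}(x) \rightarrow \mathrm{Mortal}(x))$, "every human is mortal").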
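Over a finite domain, quantified statements can be checked mechanically: "for all" maps to Python's `all`, "there exists" maps to `any`, and an implication p -> q is equivalent to (not p) or q. The sketch below tests the illustrative statement "for all x, Human(x) -> Mortal(x)" on a small made-up domain.

```python
# Hypothetical facts about a tiny domain of individuals.
humans = {"socrates", "plato"}
mortals = {"socrates", "plato", "fido"}
domain = humans | mortals

def is_human(x):
    return x in humans

def is_mortal(x):
    return x in mortals

# Universal quantifier: every x satisfies Human(x) -> Mortal(x).
all_humans_mortal = all((not is_human(x)) or is_mortal(x) for x in domain)

# Existential quantifier: some x is mortal but not human.
some_nonhuman_mortal = any(is_mortal(x) and not is_human(x) for x in domain)

print(all_humans_mortal, some_nonhuman_mortal)  # True True
```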
Bayesian Networks: Probabilistic graphical models representing variables and their conditional dependencies via Directed Acyclic Graphs (DAGs).
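The smallest useful Bayesian network has two nodes and one edge, for example Disease -> Test. Each node stores a probability table conditioned on its parents, and inference reverses the edge with Bayes' rule. The probabilities below are invented for illustration.

```python
from fractions import Fraction

# Node 1 (no parents): prior probability of disease.
p_disease = Fraction(1, 100)

# Node 2 (parent: Disease): probability of a positive test given disease status.
p_pos_given_disease = Fraction(9, 10)
p_pos_given_healthy = Fraction(1, 20)

# Total probability of a positive test, summing over the parent's states.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' rule: probability of disease after observing a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(p_disease_given_pos, float(p_disease_given_pos))  # 2/13, about 0.154
```

Even with a fairly accurate test, the posterior stays low because the disease is rare; this is the kind of uncertainty that probability-based reasoning handles and fixed logical rules do not.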
Reasoning: Logic-based (deterministic rules) vs. Probability-based (dealing with uncertainty).
6. Ethics and Data Literacy
Key Risks: Algorithmic bias, privacy leakage in LLMs, and hallucinations (outputting factually incorrect information).
Data Types: Structured (tabular) vs. Unstructured (text, audio, video). Categorical (labels) vs. Numeric (counts/measurements).
Data Conversion: Binary (Base 2), Hexadecimal (Base 16), Decimal (Base 10).
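Converting a decimal number to another base works by repeated division: divide by the base, record the remainder, and read the remainders in reverse. The sketch below implements that by hand (`to_base` is a made-up helper) and checks it against Python's built-in conversions.

```python
def to_base(n, base):
    """Convert a non-negative integer to a string in the given base (2 to 16)."""
    digits = "0123456789abcdef"
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(digits[remainder])
    return "".join(reversed(out))  # remainders come out least significant first

number = 202
binary = to_base(number, 2)    # '11001010'
hexadecimal = to_base(number, 16)  # 'ca'

# Built-ins for the reverse direction: int(text, base).
print(binary, hexadecimal, int(binary, 2), int(hexadecimal, 16))
```

Each hexadecimal digit corresponds to exactly four binary digits: `c` is `1100` and `a` is `1010`.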
Processes: Data wrangling (transforming raw data) is a critical stage in the data science process.