Standard Deviation / What do larger values mean?
A measure of data spread; large values = high variability.
Variance of 0
All values are identical.
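A quick check of the two cards above, as a minimal sketch using Python's standard `statistics` module. The lists `tight` and `spread` are made-up illustration data, not from the deck.

```python
import statistics

# Hypothetical data: one list with identical values, one with spread-out values.
tight = [10, 10, 10, 10]
spread = [2, 6, 10, 14, 18]

# Population variance is 0 when every value is identical.
tight_var = statistics.pvariance(tight)

# A larger standard deviation means values sit farther from the mean.
tight_sd = statistics.pstdev(tight)
spread_sd = statistics.pstdev(spread)

print(tight_var, tight_sd, spread_sd)
```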
Gaussian Distribution
Symmetric bell-shaped curve (the normal distribution), defined by its mean and standard deviation.
% within 1 SD in normal distribution
About 68%.
Continuous Variable
Can take any value within a range, e.g., height, temperature.
Discrete Variable
Takes separate, countable values, e.g., number of students.
Expected Value of Die Roll
3.5 (fair six-sided die).
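Where the 3.5 comes from, sketched under the assumption of a fair six-sided die: each face has probability 1/6, and the expected value is the probability-weighted sum of the faces.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so each face has probability 1/6.
faces = range(1, 7)
expected = sum(Fraction(face, 6) for face in faces)

print(float(expected))  # 3.5
```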
Importance of Gaussian Distribution
Used in modeling errors and natural variation.
68-95-99.7 Rule
About 68%, 95%, and 99.7% of normally distributed data fall within 1, 2, and 3 SDs of the mean.
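The three percentages can be recovered from the normal distribution itself. This sketch uses the standard identity that the fraction within k standard deviations equals erf(k / sqrt(2)); the helper name `fraction_within` is made up for illustration.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

# Percentages for 1, 2, and 3 standard deviations, rounded to one decimal.
within = {k: round(100 * fraction_within(k), 1) for k in (1, 2, 3)}

print(within)  # {1: 68.3, 2: 95.4, 3: 99.7}
```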
Outliers Affect Mean
Pull the mean toward the extreme value, upward or downward; the median is far less affected.
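A small illustration of the effect, using invented numbers: one extreme value moves the mean a long way while the median barely changes.

```python
import statistics

# Hypothetical values; 300 is the outlier added to the second list.
data = [30, 32, 35, 31, 33]
with_outlier = data + [300]

mean_before = statistics.mean(data)             # 32.2
mean_after = statistics.mean(with_outlier)      # pulled far upward
median_before = statistics.median(data)         # 32
median_after = statistics.median(with_outlier)  # 32.5

print(mean_before, mean_after, median_before, median_after)
```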
Parameter vs Statistic
Parameter = describes a whole population; Statistic = describes a sample drawn from it.
Positive Covariance
Variables tend to move in the same direction: when one is above its mean, the other usually is too.
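A minimal sketch of sample covariance computed by hand. The function name `sample_covariance` and the `hours` and `scores` data are hypothetical, chosen so that both variables rise together.

```python
def sample_covariance(xs, ys):
    """Average product of deviations from the mean, with n - 1 in the denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: study hours and exam scores that rise together.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

cov = sample_covariance(hours, scores)
print(cov)  # 14.0, positive because the variables increase together
```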
Best Visualization for Distribution
Histogram or boxplot.
Best Chart for Categories
Bar chart.
Boxplot vs Histogram
Boxplot = summary stats; histogram = frequency distribution.
Multivariate Data Example
Data with multiple features, e.g., height and weight.
Multiple Linear Regression Use
Predict a continuous outcome from two or more predictor variables.
Why Clean Data
Removes errors, improves accuracy.
Duplicate Record Issue
Distorts analysis results.
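One way duplicates distort results, sketched with made-up order records: a repeated row inflates a total until it is removed.

```python
# Hypothetical (order_id, amount) records; order A100 was entered twice.
orders = [("A100", 250), ("A101", 120), ("A100", 250)]

raw_total = sum(amount for _, amount in orders)

# dict.fromkeys keeps the first occurrence of each record, in order.
unique_orders = list(dict.fromkeys(orders))
clean_total = sum(amount for _, amount in unique_orders)

print(raw_total, clean_total)  # 620 370
```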
Low-Quality Data Issue
Produces unreliable AI results.
Imputation Meaning
Filling in missing data.
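Mean imputation is one common strategy (others include median or model-based filling). A minimal sketch, assuming missing values are marked with `None`; the `ages` list is invented.

```python
# Hypothetical column with missing entries marked as None.
ages = [25, None, 31, 40, None]

# Compute the mean of the observed values only.
observed = [a for a in ages if a is not None]
fill_value = sum(observed) / len(observed)

# Replace each missing entry with that mean.
imputed = [fill_value if a is None else a for a in ages]

print(imputed)  # [25, 32.0, 31, 40, 32.0]
```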
Classification Algorithms
Decision trees, logistic regression, SVM.
Decision Trees Use
Classify or predict by following a sequence of feature-based splits from root to leaf.
Feature Selection Role
Removes irrelevant or redundant features to improve performance.
Correlation vs Causation
Correlation = variables move together; causation = one directly produces a change in the other. Correlation alone does not prove causation.
Data Normalization Importance
Puts features on a comparable scale so that no feature dominates a model because of its units.
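Min-max scaling is one common normalization method (standardizing to z-scores is another). A minimal sketch; the function name `min_max_scale` and the `incomes` values are illustrative assumptions.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature measured in large units.
incomes = [20000, 50000, 80000]
scaled = min_max_scale(incomes)

print(scaled)  # [0.0, 0.5, 1.0]
```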
Real-World Data Science Use
Predict sales, detect fraud, diagnose diseases.
Purpose of SQL
Query and manage databases.
SQL Query for All Employees
SELECT * FROM employees;
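The query above can be tried without a database server using Python's built-in `sqlite3` module. This is a sketch: the `employees` columns and the two rows are invented for illustration.

```python
import sqlite3

# In-memory database; table layout and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)

# SELECT * returns every column of every row in the table.
rows = conn.execute("SELECT * FROM employees;").fetchall()
conn.close()

print(rows)
```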
Library for Data Manipulation
Pandas.
NumPy Used For
Numerical computation.
Pandas Used For
DataFrames, data manipulation.
PyTorch Used For
Building and training deep learning models.
Data Wrangling
Transforming and cleaning raw data.
R vs Python
R = stats focus; Python = general-purpose.
Relational Database
Organizes data in linked tables.
Primary key
Unique identifier per record.
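A sketch of how a database enforces the uniqueness of a primary key, using Python's built-in `sqlite3`. The `students` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students VALUES (1, 'Ada')")

# A second row with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO students VALUES (1, 'Grace')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
conn.close()

print(duplicate_rejected, row_count)  # True 1
```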
Generative AI
AI that creates new data (text, images, etc.).
Generative AI examples
ChatGPT, DALL·E.
Limitation of generative AI
Can generate incorrect or biased content.
Ethical concern
Deepfakes, misinformation.
Computer vision
AI for image/video understanding.
NLP used for
Language processing tasks.
LLM
Large Language Model.
Hallucination in LLMs
AI generating false or made-up info.
Bias in LLMs
Reflects biased training data.
Symbolic vs neural AI
Symbolic = rules, Neural = learned patterns.
Machine learning
AI that learns from data to make predictions.
Training vs testing vs validation
Training = fit the model; Validation = tune hyperparameters and compare models; Testing = final evaluation on unseen data.
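A minimal sketch of a three-way split. The 70/15/15 proportions and the function name `split_dataset` are assumptions for illustration; other ratios are common too.

```python
import random

def split_dataset(rows, train_pct=70, validation_pct=15, seed=0):
    """Shuffle a copy of rows, then cut it into train, validation, and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * validation_pct // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, validation, test

# Hypothetical dataset of 100 record IDs.
train_set, validation_set, test_set = split_dataset(range(100))

print(len(train_set), len(validation_set), len(test_set))  # 70 15 15
```

Shuffling before cutting keeps any ordering in the original data from leaking into one of the splits.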
Supervised learning feature
Labeled data.
Unsupervised learning example
K-means clustering.
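A bare-bones sketch of k-means on one-dimensional data, showing the two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The function name `kmeans_1d`, the points, and the starting centroids are all illustrative assumptions.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations on 1-D points."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])

print(centroids)  # [1.5, 9.5]
```

No labels are supplied anywhere, which is what makes this an unsupervised method.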
Reinforcement learning
Learning by trial and error, guided by rewards and penalties.
Real-world reinforcement example
Game-playing AI, robotics.
Neural network
Layers of interconnected nodes, loosely inspired by the brain.
Deep learning
Neural networks with many layers.
Overfitting
Model memorizes training data and generalizes poorly to new data; reduced with regularization, more data, or early stopping.
Learning rate
Controls the step size of each parameter update during training.
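A sketch of the learning rate's effect, using plain gradient descent on the toy function f(x) = (x - 3)^2, whose minimum is at x = 3. The function and the three rates are chosen only for illustration.

```python
def gradient_descent(learning_rate, steps=25, start=0.0):
    """Minimize f(x) = (x - 3)**2 by repeatedly stepping against the gradient."""
    x = start
    for _ in range(steps):
        gradient = 2 * (x - 3)         # derivative of (x - 3)**2
        x -= learning_rate * gradient  # step size is set by the learning rate
    return x

too_small = gradient_descent(0.01)  # still far from 3 after 25 steps
reasonable = gradient_descent(0.1)  # close to 3
too_large = gradient_descent(1.1)   # overshoots more each step and diverges

print(too_small, reasonable, too_large)
```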
Speech recognition
Technology that converts spoken language into text.
Knowledge representation
A method for encoding information for AI reasoning.
Reasoning in AI
Using logic or inference to draw conclusions from data.
Example of reasoning in AI
Diagnosing diseases based on symptoms.
Data privacy
Protecting personal data from unauthorized access.
AI transparency importance
Ensures trust and accountability in AI systems.
Algorithmic bias
Systematic unfairness due to biased data or design.
AI ethics
Guidelines for responsible development and use of AI.
Example of AI ethical concern
Facial recognition used without consent.
Data literacy
Ability to read, work with, analyze, and argue with data.
Raw data
Unprocessed information collected from sources.
Dataset
A structured collection of related data.
Data visualization importance
Helps interpret patterns and insights easily.
Metadata
Data describing other data, such as format or source.