Mean
The average of a dataset (sum divided by count).
Median
The middle value in an ordered dataset (the average of the two middle values when the count is even).
Mode
The most frequent value in a dataset.
Range
The difference between the highest and lowest values.
Variance
The average squared deviation of values from the mean; a measure of spread.
Standard deviation
The square root of the variance; the typical distance of values from the mean.
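The summary statistics on the cards above can be checked on a small sample. This is a minimal sketch using Python's standard `statistics` module; the numbers in `data` are made up for illustration, and the population (not sample) formulas are assumed for variance and standard deviation.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample values

mean = statistics.mean(data)            # sum divided by count
median = statistics.median(data)        # middle value of the sorted data
mode = statistics.mode(data)            # most frequent value
value_range = max(data) - min(data)     # highest minus lowest
variance = statistics.pvariance(data)   # average squared deviation from the mean
std_dev = statistics.pstdev(data)       # square root of the variance

print(mean, median, mode, value_range, variance, std_dev)
```

Swapping in `statistics.variance` and `statistics.stdev` would give the sample versions, which divide by count minus one.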
Covariance
Measures how two variables change together.
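A small worked example may help here. The sketch below computes the sample covariance by hand; the function name `covariance` and the `hours` and `scores` values are illustrative assumptions, not data from the deck.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum the products of paired deviations, then divide by n - 1.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]          # made-up study hours
scores = [52, 55, 61, 64, 68]    # made-up test scores

cov = covariance(hours, scores)
print(cov)
```

A positive result means the two variables tend to rise together; a negative result means one tends to fall as the other rises.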
Gaussian distribution
A symmetric, bell-shaped (normal) distribution in which mean = median = mode.
Expected value
The sum, over all outcomes, of each outcome multiplied by its probability.
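As a quick illustration, the expected value of one roll of a fair six-sided die can be computed directly from the definition. The die is an assumed example; `Fraction` is used so the arithmetic stays exact.

```python
from fractions import Fraction

# Each face of a fair die has probability 1/6 (assumed example).
outcomes = {face: Fraction(1, 6) for face in range(1, 7)}

# Sum of each outcome times its probability.
expected = sum(value * prob for value, prob in outcomes.items())

print(expected)  # 7/2
```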
Continuous variable
A variable that can take any real value in a range.
Discrete variable
A variable that takes only separate, countable values, such as whole-number counts.
Histogram
A chart that shows the distribution of data using bins.
Scatterplot
A graph showing the relationship between two numeric variables.
Boxplot
A chart showing the median, quartiles, and outliers of a dataset.
Appropriate visual medium
Choosing the best chart to represent data clearly.
Multivariate data
Data with multiple variables measured for each observation (often defined as more than two).
Multiple linear regression
Predicting a numeric value from two or more predictor variables using a linear model.
Logistic regression
A model that predicts the probability of a category, most often a binary yes/no outcome.
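The core of logistic regression is the sigmoid function, which turns a linear score into a probability between 0 and 1. The sketch below assumes a single input and hand-picked `weight` and `bias` values; a real model would learn these from training data.

```python
import math

def predict_probability(x, weight, bias):
    """Probability of the 'yes' class for one numeric input."""
    score = weight * x + bias
    return 1 / (1 + math.exp(-score))  # sigmoid squashes the score into (0, 1)

def predict_label(x, weight, bias, threshold=0.5):
    """True ('yes') when the probability reaches the threshold."""
    return predict_probability(x, weight, bias) >= threshold

# Hypothetical parameters chosen only for illustration.
print(predict_probability(3, weight=1.0, bias=-2.0))
print(predict_label(1, weight=1.0, bias=-2.0))
```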
Data cleaning
Fixing issues like duplicates, missing values, or formatting errors.
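Here is one way the three fixes named on this card might look in plain Python. The rows, field names, and the `clean` helper are all made up for illustration; in practice a library such as Pandas would usually handle this.

```python
raw_rows = [
    {"name": " Ana ", "age": "34"},
    {"name": "Ben", "age": ""},       # missing value
    {"name": " Ana ", "age": "34"},   # duplicate of the first row
]

def clean(rows):
    """Trim whitespace, mark missing ages as None, and drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip()                              # formatting fix
        age = int(row["age"]) if row["age"].strip() else None   # missing value
        key = (name, age)
        if key in seen:                                         # duplicate
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned

cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```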
Data quality issues
Problems like incomplete, duplicate, inconsistent, or inaccurate data.
k-means
An unsupervised algorithm that groups data into k clusters by assigning each point to its nearest cluster center.
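To make the idea concrete, here is a minimal one-dimensional sketch of the usual assign-then-update loop. The function name `k_means_1d`, the points, and the starting centroids are assumptions for illustration; real implementations work in many dimensions and choose starting centroids more carefully.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster numbers around the given starting centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1, 2, 3, 10, 11, 12]   # made-up data with two obvious groups
centroids, clusters = k_means_1d(points, centroids=[1, 12])
print(centroids, clusters)
```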
Decision tree
A model that predicts using rule-based splits.
Linear regression
A model that fits a line to predict a numeric value.
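Fitting a line can be sketched with the ordinary least-squares formulas for one input variable. The helper name `fit_line` and the sample points are illustrative assumptions; the points were chosen to lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of paired deviations / sum of squared x deviations.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = slope * 5 + intercept   # predicted y for a new x of 5
print(slope, intercept, prediction)
```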
SQL
A language used to query and manage databases.
SELECT statement
SQL command that retrieves data, naming the columns to return from a table.
WHERE clause
SQL filter that picks rows matching conditions.
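SELECT and WHERE can be tried without installing a database server by using Python's built-in `sqlite3` module. The `students` table and its rows below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO students (name, score) VALUES (?, ?)",
    [("Ana", 91), ("Ben", 78), ("Cy", 85)],
)

# SELECT chooses the columns; WHERE keeps only the rows matching the condition.
rows = conn.execute(
    "SELECT name, score FROM students WHERE score >= ? ORDER BY score DESC",
    (85,),
).fetchall()
conn.close()

print(rows)  # [('Ana', 91), ('Cy', 85)]
```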
Pandas
Python library for data cleaning and manipulation.
NumPy
Python library for math and arrays.
PyTorch
A deep learning framework.
Python for data wrangling
Using Python to clean and prep datasets.
R programming language
A language used heavily in statistics.
Relational database
A database storing data in tables with relationships.
Primary key
A unique identifier for each row in a table.
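The uniqueness rule can be seen in action with Python's built-in `sqlite3` module: a second row with the same key is rejected. The `courses` table and its values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (course_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO courses VALUES (1, 'Statistics')")

try:
    # Reusing course_id 1 violates the primary key constraint.
    conn.execute("INSERT INTO courses VALUES (1, 'Databases')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM courses").fetchone()[0]
conn.close()

print(duplicate_rejected, row_count)  # True 1
```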
Generative AI
AI that can create new content like text, images, or audio.
Capabilities of generative AI
Summaries, translations, coding, image generation.
Limitations of generative AI
Bias, hallucinations, inaccuracies.
Uses of generative AI
Healthcare, research, digital art, productivity.
Computer vision
AI that understands images and video.
Natural language processing (NLP)
AI that processes human text and speech.
Human-computer interaction
The study of how people interact with computers, including AI systems.
Robotics
The design and control of machines that sense and act in the physical world, often using AI.
Large language model (LLM)
A neural network language model trained on massive text datasets to predict and generate text.
LLM capabilities
Answering questions, reasoning, summarizing, coding.
Machine learning
AI that learns patterns from data.
Training dataset
Data the model learns from.
Validation dataset
Data held out from training, used to tune hyperparameters and compare models.
Test dataset
Held-out data used only for the final evaluation of model performance.
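The three datasets on the cards above are usually carved out of one shuffled dataset. The sketch below assumes a 70/15/15 split and a hypothetical helper named `split_dataset`; the right proportions depend on the project.

```python
import random

def split_dataset(rows, train_pct=70, val_pct=15, seed=0):
    """Shuffle rows, then cut them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains is the test set
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Keeping the three lists disjoint is the point: the model never sees validation or test rows while it is learning.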
Supervised learning
Learning from labeled data.
Unsupervised learning
Learning from unlabeled data.
Reinforcement learning
Learning through rewards and punishments.
Neural network
A model made of layers of connected nodes (neurons) that learns complex patterns.
Decision tree algorithm
A supervised model that splits data based on rules.
Deep learning
Neural networks with many layers that learn features.
Predicate logic
A symbolic way to represent facts and relationships.
Example of predicate logic
Human(Sam), Loves(Sam, Pizza).
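One simple way to store facts like these in code is as tuples of a predicate name followed by its arguments. This is only an illustrative encoding; the `facts` set and the `holds` helper are assumptions, not a standard library.

```python
# Each fact is (predicate, argument, ...), mirroring Human(Sam) and Loves(Sam, Pizza).
facts = {
    ("Human", "Sam"),
    ("Loves", "Sam", "Pizza"),
}

def holds(predicate, *args):
    """True if the exact fact is in the knowledge base."""
    return (predicate, *args) in facts

print(holds("Human", "Sam"))            # True
print(holds("Loves", "Pizza", "Sam"))   # False: argument order matters
```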
Logic-based reasoning
Uses strict rules and true/false logic.
Probability-based reasoning
Uses uncertainty and likelihoods.
Bayesian network
A directed acyclic graph whose nodes are random variables and whose edges encode conditional dependencies (often read as cause-effect).
Node (Bayesian network)
A variable in the network.
Edge (Bayesian network)
A directional connection showing influence.
Directed acyclic graph (DAG)
A graph whose edges have a direction and which contains no cycles.
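Whether a directed graph is acyclic can be checked with the standard-library `graphlib` module (Python 3.9 or later). The helper name `is_dag` and the example graphs are assumptions for illustration.

```python
from graphlib import CycleError, TopologicalSorter

def is_dag(edges):
    """edges maps each node to the nodes it connects to; True if no cycle exists."""
    sorter = TopologicalSorter(edges)
    try:
        sorter.prepare()   # raises CycleError when the graph contains a cycle
    except CycleError:
        return False
    return True

# A small Bayesian-network-style graph: two causes pointing at one effect.
print(is_dag({"Rain": ["WetGrass"], "Sprinkler": ["WetGrass"]}))  # True
print(is_dag({"A": ["B"], "B": ["A"]}))                           # False
```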
Knowledge representation
How an AI system encodes information so that it can reason with it.
Reasoning in AI
Drawing conclusions using logic or probability.
AI dilemmas
Ethical issues like self-driving decisions or surveillance.
Algorithmic bias
Systematic unfairness in a system's outputs, often inherited from biased training data or design choices.
How AI inherits bias
Through biased data, labeling, or representation.
Security risks of LLMs
Models may leak or store sensitive info.
Privacy risks of LLMs
Sensitive data may be exposed or memorized.
Hallucinations (AI)
When AI generates false information.
Misinformation from AI
Inaccurate or misleading outputs.
Surveillance concerns
AI used to track or monitor people.
Bias in generative models
Unfair or skewed outputs caused by biased data.
Ethical AI
AI systems that are fair, safe, transparent, and responsible.
Data science
Using data to answer questions or make predictions.
Structured data
Data stored in clean tables with rows and columns.
Unstructured data
Data such as text, images, or audio that has no predefined, organized format.
Numeric data
Quantitative values like height or cost.
Categorical data
Labels that place items into groups, like color or brand.
Binary
Base-2 number system.
Hexadecimal
Base-16 number system.
Decimal
Base-10 number system.
Binary to decimal
Multiply each bit by the power of 2 for its position, then add the results.
Decimal to binary
Break the number into a sum of powers of 2, or repeatedly divide by 2 and read the remainders in reverse.
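Both conversions can be written out step by step. The function names below are illustrative; Python's built-in `int(text, 2)` and `format(number, "b")` do the same job and are used here only as a cross-check.

```python
def binary_to_decimal(bits):
    """Add up bit * 2**position, counting positions from the right."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

def decimal_to_binary(number):
    """Repeatedly divide by 2 and read the remainders in reverse."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        digits.append(str(number % 2))
        number //= 2
    return "".join(reversed(digits))

print(binary_to_decimal("1011"))   # 8 + 0 + 2 + 1 = 11
print(decimal_to_binary(11))       # 1011
print(format(255, "x"))            # ff (the same value in hexadecimal)
```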
Data sources
Sensors, logs, surveys, websites.
Data wrangling
Cleaning and preparing raw data.
Data transformation
Changing data format or structure.
Data science process: ask
Define the question.
Data science process: collect
Gather the needed data.
Data science process: clean
Fix errors and prep the data.
Data science process: analyze
Explore and understand data.
Data science process: model
Build models that produce predictions or insights.
Data science process: interpret
Explain what results mean.
Data science process: communicate
Share findings clearly.