Data Science
Using data, statistics, and computing to gain insights and make decisions.
Data
Raw facts and numbers.
Information
Processed data with meaning.
Algorithm
Step-by-step procedure to solve a problem.
Model
A mathematical representation of a real-world process.
Define Problem
First step in data science: decide what question you're answering.
Collect Data
Gather data from sources.
Clean Data
Fix missing values, errors, and duplicates.
Explore Data
Use charts and stats to look for patterns.
Build Model
Fit algorithms to data.
Evaluate Model
Measure accuracy, precision, recall, etc.
Communicate Results
Share insights clearly.
Qualitative Data
Descriptive categories (colors, names).
Quantitative Data
Numerical values.
Discrete Data
Countable values, usually whole numbers (number of students).
Continuous Data
Measured values that can take any value in a range (weight, time).
Nominal Data
Categories with no order (eye color).
Ordinal Data
Ordered categories (rankings).
Interval Data
Numeric with equal intervals but no true zero (°C).
Ratio Data
Numeric, true zero (height, money).
Mean
Arithmetic average: sum of the values divided by their count.
Median
Middle value of the sorted data.
Mode
Most frequent value.
Range
Max minus min.
Variance
Average squared deviation from the mean; a measure of spread.
Standard Deviation
Spread of data; square root of variance.
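The summary statistics above (mean, median, mode, range, variance, standard deviation) can be computed with Python's standard `statistics` module. This is a minimal sketch: the `scores` list is made-up sample data, and the population versions of variance and standard deviation are an assumption, since the cards do not say whether population or sample formulas are meant.

```python
import statistics

# Hypothetical sample data, chosen so the results come out to round numbers.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)            # sum / count = 5
median = statistics.median(scores)        # middle of the sorted values = 4.5
mode = statistics.mode(scores)            # most frequent value = 4
value_range = max(scores) - min(scores)   # max minus min = 7
variance = statistics.pvariance(scores)   # population variance = 4
std_dev = statistics.pstdev(scores)       # square root of the variance = 2.0

print(mean, median, mode, value_range, variance, std_dev)
```

For a sample rather than a whole population, `statistics.variance` and `statistics.stdev` divide by n − 1 instead of n.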
Correlation
Strength/direction of relationship (−1 to +1).
Correlation does NOT imply causation
A relationship between two variables does not mean one causes the other.
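To make the −1 to +1 scale concrete, here is a small sketch of the Pearson correlation coefficient, one common way to measure correlation (the cards do not name a specific coefficient, so that choice is an assumption). The function name `pearson_r` and the sample lists are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both variables' spread."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: one series rises with x, the other falls.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

print(pearson_r(hours, rising))    # close to +1: perfect positive relationship
print(pearson_r(hours, falling))   # close to -1: perfect negative relationship
```

A value near 0 would mean little or no linear relationship. None of these values say anything about which variable, if either, causes the other.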
Bar Chart
Compares categories visually.
Histogram
Shows distribution of numerical data.
Scatter Plot
Shows relationship between two numeric variables.
Line Chart
Shows trends over time.
Box Plot
Shows quartiles and outliers.
Database
Organized collection of data.
Table
Rows and columns storing data.
Primary Key
Unique identifier for each row.
Foreign Key
Field linking one table to another.
SELECT
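SQL keyword that chooses which columns to return.
FROM
SQL clause that names the table to read from.
WHERE
SQL clause that filters rows by a condition.
ORDER BY
SQL clause that sorts the result rows.
JOIN
SQL clause that combines rows from two tables on a matching column.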
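The database terms above fit together in a single query. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `students` and `grades` tables, their columns, and the rows are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (
    id   INTEGER PRIMARY KEY,                     -- primary key: unique per row
    name TEXT
);
CREATE TABLE grades (
    id         INTEGER PRIMARY KEY,
    student_id INTEGER REFERENCES students(id),   -- foreign key to students
    score      INTEGER
);
INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan');
INSERT INTO grades VALUES (1, 1, 95), (2, 2, 72), (3, 3, 88);
""")

rows = conn.execute("""
SELECT students.name, grades.score                 -- choose columns
FROM students                                      -- choose table
JOIN grades ON grades.student_id = students.id     -- combine tables
WHERE grades.score >= 80                           -- filter rows
ORDER BY grades.score DESC                         -- sort rows
""").fetchall()

print(rows)  # [('Ada', 95), ('Alan', 88)]
conn.close()
```

The `JOIN` matches each grade to its student through the foreign key, so the query can return a name from one table next to a score from the other.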
Artificial Intelligence (AI)
Machines performing tasks requiring human intelligence.
Machine Learning (ML)
Algorithms that learn patterns from data.
Deep Learning
ML using neural networks with many layers.
Supervised Learning
Learns from labeled data; used for classification and regression.
Unsupervised Learning
Learns from unlabeled data; used for clustering or dimensionality reduction.
Reinforcement Learning
Learning through rewards and punishments.
Linear Regression
Predicts a numeric value by fitting a straight-line (linear) relationship to the features.
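As an illustration of linear regression, the sketch below fits a line to one feature using the ordinary least-squares formulas. The helper name `fit_line` and the data are hypothetical. Real projects would normally use a library and handle more than one feature.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that follows y = 2x + 1 exactly.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept   # predict y for an unseen x = 5

print(slope, intercept, prediction)  # 2.0 1.0 11.0
```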
Logistic Regression
Classification algorithm that outputs the probability of a class (yes/no).
Decision Tree
Predicts by following a tree of if/else splits on feature values.
Random Forest
Many decision trees combined by voting or averaging their predictions.
SVM (Support Vector Machine)
Finds the boundary that separates classes with the widest margin.
k-NN (k Nearest Neighbors)
Predicts from the k closest training examples.
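A rough sketch of k-NN classification follows: measure the distance from a new point to every training example, keep the k nearest, and take a majority vote. The function name `knn_predict`, the Euclidean distance choice, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D training data with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(points, labels, (2, 2)))   # small
print(knn_predict(points, labels, (7, 7)))   # large
```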
k-Means Clustering
Unsupervised algorithm that groups data into k clusters around their centers.
PCA (Principal Component Analysis)
Reduces the number of features while keeping as much variance as possible.
Neural Network
Model with layers of connected neurons.
Input Layer
Takes in features.
Hidden Layer
Layer between input and output that learns intermediate patterns.
Output Layer
Produces final prediction.
Weights
Connection strengths the model learns during training.
Backpropagation
Method neural nets use to adjust weights: the output error is passed backward to find each weight's share of it.
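The neural-network terms above can be seen in a deliberately tiny network: one input, one hidden neuron, one output. The sketch below is only an assumption-laden toy (sigmoid hidden activation, linear output, squared-error loss, a hand-picked learning rate, and invented starting weights), not how a real framework is organized.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    hidden = sigmoid(w_hidden * x)   # hidden layer: weighted input through an activation
    output = w_out * hidden          # output layer: final prediction
    return hidden, output

def train_step(x, target, w_hidden, w_out, lr=0.1):
    """One backpropagation step for the loss 0.5 * (output - target) ** 2."""
    hidden, output = forward(x, w_hidden, w_out)
    error = output - target                      # gradient of the loss w.r.t. the output
    grad_w_out = error * hidden                  # chain rule through the output layer
    grad_hidden = error * w_out                  # error passed back to the hidden neuron
    grad_w_hidden = grad_hidden * hidden * (1 - hidden) * x   # through the sigmoid
    return w_hidden - lr * grad_w_hidden, w_out - lr * grad_w_out

def loss(x, target, w_hidden, w_out):
    return 0.5 * (forward(x, w_hidden, w_out)[1] - target) ** 2

# Hypothetical single training example and starting weights.
x, target = 1.0, 0.5
w_hidden, w_out = 0.8, -0.4

loss_before = loss(x, target, w_hidden, w_out)
for _ in range(100):
    w_hidden, w_out = train_step(x, target, w_hidden, w_out)
loss_after = loss(x, target, w_hidden, w_out)

print(loss_before, loss_after)   # the loss shrinks as the weights are adjusted
```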
Accuracy
Correct predictions / total predictions.
Precision
TP / (TP + FP).
Recall
TP / (TP + FN).
Confusion Matrix
Table of true/false positives/negatives.
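The classification metrics above all come from the four confusion-matrix counts. In this sketch the `y_true` and `y_pred` lists are made-up labels, with 1 as the positive class.

```python
# Hypothetical true labels and model predictions (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)   # true positives  = 3
tn = sum(t == 0 and p == 0 for t, p in pairs)   # true negatives  = 2
fp = sum(t == 0 and p == 1 for t, p in pairs)   # false positives = 2
fn = sum(t == 1 and p == 0 for t, p in pairs)   # false negatives = 1

accuracy = (tp + tn) / len(pairs)   # correct / total = 0.625
precision = tp / (tp + fp)          # 3 / 5 = 0.6
recall = tp / (tp + fn)             # 3 / 4 = 0.75

print(tp, tn, fp, fn, accuracy, precision, recall)
```

Precision asks how many predicted positives were right; recall asks how many actual positives were found. The two can differ noticeably even when accuracy looks reasonable.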
MAE (Mean Absolute Error)
Average absolute difference from true values.
MSE (Mean Squared Error)
Average squared difference from true values.
R² Score
Proportion of the target's variance explained by a regression; 1 is a perfect fit.
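The three regression metrics above can be worked through by hand on a few numbers. The values in `y_true` and `y_pred` are invented for the example.

```python
# Hypothetical true values and regression predictions.
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 11]

n = len(y_true)
errors = [p - t for t, p in zip(y_true, y_pred)]   # [-1, 0, 1, 2]

mae = sum(abs(e) for e in errors) / n              # (1 + 0 + 1 + 2) / 4 = 1.0
mse = sum(e ** 2 for e in errors) / n              # (1 + 0 + 1 + 4) / 4 = 1.5

mean_true = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)               # unexplained variation = 6
ss_tot = sum((t - mean_true) ** 2 for t in y_true) # total variation = 20
r2 = 1 - ss_res / ss_tot                           # 1 - 6/20 = 0.7

print(mae, mse, r2)
```

Because MSE squares each error, the single miss of 2 counts four times as much as a miss of 1, which is why MSE is more sensitive to large errors than MAE.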
Remove Duplicates
Delete repeated rows.
Handle Missing Values
Fill with mean/median or drop rows.
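Both cleaning steps above can be sketched in plain Python. The `rows` records and their fields are hypothetical, and filling with the median is just one of the options the card lists.

```python
import statistics

# Hypothetical records: one exact duplicate and one missing age.
rows = [
    {"name": "Ada", "age": 36},
    {"name": "Grace", "age": None},
    {"name": "Ada", "age": 36},
    {"name": "Alan", "age": 41},
]

# Remove duplicates, keeping the first occurrence of each row.
seen = set()
unique_rows = []
for row in rows:
    key = (row["name"], row["age"])
    if key not in seen:
        seen.add(key)
        unique_rows.append(row)

# Option 1: fill missing ages with the median of the known ages.
known_ages = [r["age"] for r in unique_rows if r["age"] is not None]
median_age = statistics.median(known_ages)   # median of 36 and 41 = 38.5
filled = [
    {**r, "age": median_age if r["age"] is None else r["age"]}
    for r in unique_rows
]

# Option 2: drop rows that have a missing age.
dropped = [r for r in unique_rows if r["age"] is not None]

print(len(unique_rows), filled[1]["age"], len(dropped))   # 3 38.5 2
```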
Normalize
Scale values to a common range, typically 0 to 1.
Standardize
Convert data to mean 0, SD 1.
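The difference between normalizing and standardizing is easiest to see side by side. This sketch assumes min-max scaling to the range 0 to 1 for normalization and the population standard deviation for standardization. The `values` list is sample data.

```python
import statistics

values = [10, 20, 30, 40, 50]   # hypothetical feature column

# Normalize (min-max): squeeze values into the range 0 to 1.
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]
# [0.0, 0.25, 0.5, 0.75, 1.0]

# Standardize (z-score): shift to mean 0 and scale to standard deviation 1.
mean = statistics.mean(values)
sd = statistics.pstdev(values)
standardized = [(v - mean) / sd for v in values]

print(normalized)
print(standardized)
```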
One-Hot Encoding
Turn categories into 0/1 columns.
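One-hot encoding can be sketched in a few lines. The `colors` list is invented, and sorting the category names alphabetically is just one convenient way to fix the column order.

```python
colors = ["red", "green", "blue", "green"]   # hypothetical categorical column

# One output column per distinct category, in alphabetical order.
categories = sorted(set(colors))             # ['blue', 'green', 'red']

# Each row gets a 1 in its own category's column and 0 everywhere else.
encoded = [[1 if color == cat else 0 for cat in categories] for color in colors]

print(categories)
print(encoded)   # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```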
Outlier Detection
Identify extreme values.
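One common rule of thumb for finding extreme values is the 1.5 × IQR rule, the same rule many box plots use to mark outliers. The card does not name a method, so this choice is an assumption, and `data` is made-up.

```python
import statistics

data = [10, 12, 12, 13, 14, 15, 16, 95]   # hypothetical values; 95 looks suspicious

# Quartiles split the sorted data into four equal parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                              # interquartile range

# Flag anything more than 1.5 * IQR outside the middle half of the data.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_bound or v > upper_bound]

print(outliers)   # [95]
```

Flagging a value does not mean it should be deleted. It may be a data-entry error or a real but unusual observation.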
AI Bias
Unfair outcomes for certain groups.
AI Transparency
Ability to explain model decisions.
Data Privacy
Protecting user data.
AI Accountability
Responsibility for system actions.
Fairness in AI
Equal treatment regardless of group.
Overfitting
Model memorizes training data, performs poorly on new data.
Underfitting
Model too simple; misses patterns.
Training Set
Data the model learns from.
Test Set
Held-out data used to evaluate performance on unseen examples.
Feature
Input variable.
Label
Target variable you're predicting.
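The last four cards (training set, test set, feature, label) come together when data is split before modeling. The sketch below shuffles with a fixed seed and holds out 20% for testing. The 80/20 ratio, the seed, and the hours-studied data are all assumptions for illustration.

```python
import random

# Hypothetical dataset: feature = hours studied, label = passed or not.
features = [[hours] for hours in range(1, 11)]
labels = [hours >= 6 for hours in range(1, 11)]

# Shuffle row indices with a fixed seed so the split is repeatable.
indices = list(range(len(features)))
random.Random(0).shuffle(indices)

split = int(0.8 * len(indices))            # 80% train, 20% test
train_idx, test_idx = indices[:split], indices[split:]

X_train = [features[i] for i in train_idx]
y_train = [labels[i] for i in train_idx]
X_test = [features[i] for i in test_idx]
y_test = [labels[i] for i in test_idx]

print(len(X_train), len(X_test))   # 8 2
```

Keeping the test rows out of training is what lets the test score reveal overfitting: a model that only memorized the training set will score well there and poorly on the held-out rows.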