Mean
Sum of all values divided by the number of values; the measure of center most affected by outliers
Median
Middle value of a dataset
Mode
Most frequently occurring value
Range
Difference between maximum and minimum values
Standard deviation
Measure of how far values typically spread from the mean; the square root of the variance
Variance
Average of the squared deviations from the mean; the square of the standard deviation
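The summary statistics on the cards above can be checked with a short sketch using Python's standard `statistics` module. The sample values are hypothetical, and the population forms (`pvariance`, `pstdev`) are an assumption; the sample forms divide by n - 1 instead.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data

mean = statistics.mean(values)           # sum of values / count
median = statistics.median(values)       # middle value (average of the two middle values here)
mode = statistics.mode(values)           # most frequently occurring value
value_range = max(values) - min(values)  # maximum minus minimum
variance = statistics.pvariance(values)  # population variance
std_dev = statistics.pstdev(values)      # population standard deviation

# Adding one extreme value moves the mean far more than the median.
with_outlier = values + [100]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print(mean, median, mode, value_range, variance, std_dev)
print(mean_with_outlier, median_with_outlier)
```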
Covariance
Measure of how two variables change together
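As a rough illustration of covariance, the sketch below averages the products of paired deviations from each mean. The data and the helper name `population_covariance` are made up for this example, and the population form (dividing by n) is an assumption.

```python
def population_covariance(xs, ys):
    """Average product of paired deviations from the two means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

hours_studied = [1, 2, 3, 4, 5]   # hypothetical data
quiz_score = [2, 4, 6, 8, 10]     # rises with hours_studied
tv_hours = [10, 8, 6, 4, 2]       # falls as hours_studied rises

cov_positive = population_covariance(hours_studied, quiz_score)
cov_negative = population_covariance(hours_studied, tv_hours)
print(cov_positive, cov_negative)
```

A positive result means the two variables tend to move in the same direction; a negative result means they tend to move in opposite directions.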
Normal distribution
Bell-shaped symmetric distribution defined by mean and standard deviation
Expected value
Probability-weighted average of a random variable's possible outcomes; the long-run average over many repeated trials
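One way to see expected value is to weight each outcome by its probability and add the results. The fair six-sided die below is an assumed example, not part of the original card.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, each face with probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
probability = Fraction(1, 6)

# Expected value = sum of (outcome * probability of that outcome).
expected_value = sum(outcome * probability for outcome in outcomes)
print(expected_value, float(expected_value))
```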
Continuous variable
Variable that can take any value within a range, such as height
Discrete variable
Variable that takes distinct, countable values, such as number of items
Complement of an event
All outcomes in which the event does not occur; its probability is 1 minus the probability of the event
Histogram
Chart showing frequency distribution of data
Boxplot
Graph displaying the median, quartiles, and outliers
Scatterplot
Graph showing relationship between two variables
Overfitting
Model learns noise instead of patterns and performs poorly on new data
Underfitting
Model is too simple and doesn’t learn important patterns
Confusion matrix
Table comparing predicted and actual classes (true and false positives and negatives), used to evaluate classification models
Precision
Fraction of predicted positives that were actually positive: TP / (TP + FP)
Recall
Fraction of actual positives that were correctly identified: TP / (TP + FN)
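The confusion matrix, precision, and recall cards can be tied together with a small sketch. The labels below are hypothetical (1 = positive, 0 = negative), and the counts are tallied by hand rather than with any particular library.

```python
# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 1)
fp = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 1)
fn = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 0)
tn = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 0)

# Confusion matrix laid out as rows = actual class, columns = predicted class.
confusion_matrix = [[tn, fp],
                    [fn, tp]]

precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of everything actually positive, how much was found

print(confusion_matrix, precision, recall)
```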
Coefficient of variation
Standard deviation divided by the mean; a unitless spread measure useful for comparing datasets with different units
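A minimal sketch of the coefficient of variation, assuming the common definition of standard deviation divided by the mean. The two datasets and the helper name `coefficient_of_variation` are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unitless)."""
    return statistics.pstdev(values) / statistics.mean(values)

heights_cm = [150, 160, 170, 180, 190]  # hypothetical heights
weights_kg = [50, 60, 70, 80, 90]       # hypothetical weights

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

# Both datasets have the same standard deviation, but the weights vary
# more relative to their mean, so their coefficient of variation is larger.
print(round(cv_heights, 3), round(cv_weights, 3))
```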
Data cleaning
Correcting or removing inaccurate, inconsistent, duplicated, or missing data
Data normalization
Scaling features into a similar range
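One common way to scale a feature is min-max scaling, which maps values into the range 0 to 1. This is only one of several normalization methods, and the helper name `min_max_scale` and the sample values are assumptions for illustration.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot scale a feature whose values are all equal")
    return [(v - lowest) / (highest - lowest) for v in values]

ages = [10, 20, 30, 50]  # hypothetical feature values
scaled_ages = min_max_scale(ages)
print(scaled_ages)
```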
Regression
Predicts continuous numerical outcomes
Classification
Predicts categories or labels
SQL
Language used to manage and query relational databases
Primary key
Unique identifier for a database record
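The SQL and primary key cards can be tried out with Python's built-in `sqlite3` module and an in-memory database. The `students` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database that lives only in memory

# The id column is the primary key, so every row must have a unique id.
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO students (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO students (id, name) VALUES (2, 'Grace')")

# Query the table with SQL.
rows = conn.execute("SELECT id, name FROM students ORDER BY id").fetchall()

# Reusing an existing primary key value is rejected by the database.
try:
    conn.execute("INSERT INTO students (id, name) VALUES (1, 'Alan')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.close()
print(rows, duplicate_rejected)
```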
Pandas
Python library for data tables and data analysis
NumPy
Python library for numerical computing and arrays
Matplotlib
Python library for data visualization
TensorFlow
Framework for building and training deep learning models
Jupyter Notebook
Interactive tool for writing and running data science code in the browser
Artificial intelligence
Computers performing tasks that normally require human intelligence
Generative AI
AI that creates new content such as text, images, or audio
AI subfields
Categories including computer vision, NLP, robotics, and human-AI interaction
Large language model
AI system trained to predict and generate text
Supervised learning
Learning based on labeled input and output data
Unsupervised learning
Learning patterns or groups from unlabeled data
Reinforcement learning
Learning through rewards and penalties while interacting with an environment
Training data
Data used to teach a model patterns
Validation data
Data used to tune model hyperparameters
Test data
Data used to evaluate final model performance
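The three data cards above describe separate, non-overlapping portions of one dataset. A rough sketch of such a split follows; the 70/15/15 proportions, the seed, and the placeholder records are assumptions, not fixed rules.

```python
import random

records = list(range(100))  # placeholder for 100 hypothetical data records

rng = random.Random(0)      # fixed seed so the split is repeatable
shuffled = records[:]
rng.shuffle(shuffled)       # shuffle before splitting to avoid ordering effects

train = shuffled[:70]         # used to teach the model patterns
validation = shuffled[70:85]  # used to tune hyperparameters
test = shuffled[85:]          # held back for the final evaluation

print(len(train), len(validation), len(test))
```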
Neural network
Model made of interconnected nodes that learn complex patterns
K-means clustering
Unsupervised algorithm that groups data points into k clusters, assigning each point to its nearest cluster center
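A simplified sketch of the k-means idea on one-dimensional data: assign each point to its nearest center, then move each center to the mean of its points, and repeat. The function name `kmeans_1d`, the data, and the starting centers are all hypothetical; practical implementations handle multiple dimensions and choose starting centers more carefully.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Repeatedly assign points to the nearest centroid, then recompute centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]  # hypothetical data with two obvious groups
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```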
Deep learning
Machine learning using large multi-layer neural networks
Predicate logic
Logical statements computers can interpret and reason with
Bayesian network
Probabilistic model using nodes for variables and directed edges for the conditional dependencies between them
Knowledge representation
Storing information in ways computers can understand and reason about
Expert system
AI that uses rules and reasoning to solve problems
Algorithmic bias
Unfair patterns in AI output, often caused by biased training data
GDPR
European Union law protecting data privacy and data usage transparency
Transparency
Clearly informing users how data is collected, processed, and used
Data literacy
Ability to read, work with, analyze, and communicate data
Structured data
Data organized in tables and fixed formats
Unstructured data
Data without a fixed format, such as images, text, or emails
Qualitative data
Descriptive, non-numeric information
Quantitative data
Numerical, measurable information
Streaming data
Data that arrives continuously and is processed in real time
Batch data
Data processed in groups at scheduled intervals
Metadata
Data that describes other data
Data wrangling
Transforming and preparing raw data for analysis
Data science process
Steps including collecting, cleaning, analyzing, modeling, and interpreting data