Data Science & Machine Learning: Key Concepts and Tools

Logistic Regression

Predicts binary outcomes like yes/no or 0/1. (Like deciding if an email is spam.)
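
A minimal sketch of the idea, assuming two made-up email features and hand-picked (not trained) weights: a weighted score is squashed into a probability by the sigmoid function, then thresholded at 0.5 to give the yes/no answer.

```python
import math


def sigmoid(z):
    """Map any real number to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_spam(num_links, has_offer_word):
    """Return (probability, label) for a hypothetical spam classifier."""
    # Hypothetical weights chosen by hand for illustration only.
    bias, w_links, w_offer = -2.0, 0.5, 1.5
    score = bias + w_links * num_links + w_offer * int(has_offer_word)
    prob = sigmoid(score)
    return prob, int(prob >= 0.5)  # label 1 = spam, 0 = not spam
```

In practice the weights would be learned from labeled emails rather than set by hand.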

Decision Tree

Splits data into branches based on rules. (Like asking "Is it raining?" then deciding to bring an umbrella.)

Random Forest

Collection of decision trees voting together. (Like asking multiple friends for advice.)

Overfitting

Model memorizes data instead of learning patterns. (Like memorizing practice questions, not concepts.)

Cross-Validation

Tests a model with different train/test splits. (Like practicing a speech with multiple audiences.)
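
One common variant is k-fold cross-validation. The sketch below only produces the index splits (no model is trained); the function name `k_fold_splits` is an illustrative choice, not a library API.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

Each sample lands in the test set exactly once, so every data point is used for both training and evaluation across the folds.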

Statistics

Study of collecting, analyzing, and interpreting data. (Like summarizing a classroom's test scores.)

Probability

A number between 0 and 1 that measures how likely an event is to occur. (Like flipping a coin and predicting heads.)

Hypothesis Testing

Assesses whether an observed result could plausibly be due to chance alone or reflects a real effect. (Like testing if a new study method works.)

Correlation

Measures how strongly two variables move together. (Like linking hours of sleep to energy levels.)
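
A sketch of the most common measure, the Pearson correlation coefficient, written out by hand. It ranges from -1 (move in opposite directions) to +1 (move together); the function name is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

For example, `pearson_r(hours_of_sleep, energy_levels)` close to 1 would suggest the two rise together, though correlation alone does not show that one causes the other.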

Linear Regression

Predicts a numeric value using a straight-line relationship. (Like estimating house price by size.)
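
A minimal sketch of fitting the straight line with ordinary least squares for a single input variable. The function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With house sizes as `xs` and prices as `ys`, a new price estimate would be `slope * size + intercept`.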

Feature Engineering

Creating new variables to improve models. (Like combining height and weight into BMI.)
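
The BMI example can be sketched as follows, assuming each record is a dict with hypothetical keys `weight_kg` and `height_m`.

```python
def add_bmi(records):
    """Return copies of the records with a derived 'bmi' feature added."""
    enriched = []
    for record in records:
        new_record = dict(record)  # copy so the input is left unchanged
        # BMI = weight in kilograms divided by height in metres squared.
        new_record["bmi"] = record["weight_kg"] / record["height_m"] ** 2
        enriched.append(new_record)
    return enriched
```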

Regularization

Penalizes large weights to prevent overfitting. (Like discouraging overcomplicated answers.)
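
As one concrete instance, here is a sketch of L2 (ridge) regularization for a one-weight model y = w*x with no intercept. Minimizing the sum of squared errors plus `lam * w**2` has the closed form used below; `lam` is the penalty strength and is a free parameter chosen for illustration.

```python
def ridge_weight(xs, ys, lam):
    """Weight w minimizing sum((y - w*x)**2) + lam * w**2."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # lam = 0 gives plain least squares; larger lam shrinks w toward 0.
    return sum_xy / (sum_xx + lam)
```

Increasing `lam` pulls the weight toward zero, trading a slightly worse fit on the training data for a simpler model.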

PCA

Reduces data dimensions while keeping patterns. (Like turning a long movie into a short trailer.)

Python

Versatile language for data science and ML. (Like a Swiss Army knife for coding.)

R

Language for statistics and visualization. (Like Excel but programmable.)

SQL

Queries and manages structured databases. (Like asking a librarian for specific books.)
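
A small sketch of the librarian analogy using Python's built-in `sqlite3` module. The `books` table and its rows are made up for illustration.

```python
import sqlite3


def find_titles_by_author(author):
    """Query a throwaway in-memory database for one author's titles."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
    # Placeholder rows, not real catalog data.
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?)",
        [("Book A", "Author X", 2001),
         ("Book B", "Author Y", 1995),
         ("Book C", "Author X", 1990)],
    )
    rows = conn.execute(
        "SELECT title FROM books WHERE author = ? ORDER BY year",
        (author,),
    ).fetchall()
    conn.close()
    return [title for (title,) in rows]
```

The `?` placeholders pass values separately from the SQL text, which avoids SQL injection when the author name comes from user input.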

Bash/Shell

Automates commands and tasks. (Like giving your computer a to-do list.)

Java

Common backend and big-data language. (Like a sturdy foundation for big apps.)

NumPy

Fast numerical computation in Python. (Like a high-speed calculator.)

pandas

Data organization and analysis in Python. (Like coding with spreadsheets.)

matplotlib

Makes static charts. (Like drawing graphs with code.)

seaborn

Stylish, statistical visualizations. (Like decorating charts.)

scikit-learn

Library for ML algorithms. (Like a toolbox for prediction.)

TensorFlow

Deep learning framework by Google. (Like building neural nets with LEGO.)

PyTorch

Flexible deep learning library by Meta. (Like an experimental lab for AI.)

Keras

Simplified deep learning interface. (Like a friendly front-end for TensorFlow.)

Statsmodels

Statistical modeling in Python. (Like R's regression functions.)

SciPy

Scientific computing and optimization. (Like a scientific calculator.)

XGBoost

Gradient boosting library for fast ML. (Like turbocharging predictions.)

LightGBM

Lightweight, fast boosting model. (Like XGBoost's speedy cousin.)

CatBoost

Boosting library that handles categorical data. (Like XGBoost that understands text labels.)

Data Cleaning

Fixing or removing incorrect or missing data. (Like tidying a messy spreadsheet.)

Missing Value Imputation

Filling blanks with averages or predictions. (Like guessing skipped test answers.)
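
The "averages" option can be sketched as mean imputation, assuming missing entries are represented as `None`.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]
```

Mean imputation is simple but shrinks the variance of the column; median or model-based imputation may suit skewed data better.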

Outlier Detection

Finding extreme or unusual values. (Like spotting a runner at 100 mph.)
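
One widely used rule of thumb (an assumption here, since the card does not name a method) flags values more than 1.5 times the interquartile range beyond the quartiles. A sketch with the standard library:

```python
import statistics


def find_outliers_iqr(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

Calling it on running speeds such as `[6, 7, 7, 8, 8, 9, 100]` singles out the 100 mph entry.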

One-Hot Encoding

Turning categories into 0/1 columns. (Like checkboxes for ice cream flavors.)
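
A minimal sketch in plain Python: each distinct category becomes a column, and each row gets a 1 in the column for its category and 0 elsewhere. Columns are sorted alphabetically here as an arbitrary but reproducible choice.

```python
def one_hot(values):
    """Return (categories, rows) where each row is a list of 0/1 flags."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows
```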

Label Encoding

Assigning numbers to categories. (Like small=1, medium=2, large=3.)
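
A sketch of the size example, assuming the caller supplies the category order explicitly so the numbers reflect a meaningful ranking.

```python
def label_encode(values, order):
    """Map each category to its 1-based position in `order`."""
    mapping = {category: index for index, category in enumerate(order, start=1)}
    return [mapping[value] for value in values]
```

Because the codes imply an order (1 < 2 < 3), this encoding fits ordinal categories like sizes better than unordered ones like colors.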

Normalization

Rescaling values to a fixed range, usually 0 to 1. (Like matching different music volumes.)
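
A sketch of min-max scaling, the usual way to map values onto the 0-1 range. Returning all zeros for a constant column is a choice made here to avoid dividing by zero.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - low) / (high - low) for v in values]
```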

Standardization

Centering data to mean 0, std 1. (Like grading on a curve.)
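
A sketch of z-score standardization using the population standard deviation. Returning zeros for a constant column is a choice made here to avoid dividing by zero.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / std for v in values]
```

After this transform the values have mean 0 and standard deviation 1, which puts features measured in different units on a comparable scale.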

ETL

Extract, Transform, Load process. (Like shopping, washing, and storing groceries.)

API Data Retrieval

Pulling data via web APIs. (Like ordering takeout and getting a meal back.)

Web Scraping

Collecting data from websites. (Like copying all prices from a store automatically.)

Data Integration

Combining sources into one view. (Like merging grades and attendance into one record.)

Data Transformation

Changing data format or shape. (Like rearranging ingredients before cooking.)

SQL Databases

Store structured tables of data. (Like labeled filing cabinets.)

NoSQL Databases

Handle flexible, semi-structured or unstructured data without a fixed table schema. (Like a box of sticky notes with loose rules.)

MySQL/PostgreSQL

Popular SQL database systems. (Like reliable file cabinets.)

MongoDB

NoSQL document database. (Like digital folders with flexible fields.)

Hadoop

Distributed framework for storing and processing large datasets across many machines. (Like splitting a huge book among friends.)

Spark

Engine for fast distributed processing. (Like Hadoop's faster cousin.)

Databricks

Collaborative platform around Spark. (Like Google Docs for big-data code.)

Airflow

Automates and schedules data workflows. (Like a calendar that runs scripts for you.)

Kafka

Real-time data streaming system. (Like a conveyor belt for continuous updates.)

Snowflake

Cloud data warehouse. (Like an online filing cabinet with unlimited space.)

BigQuery

Google's serverless warehouse. (Like querying billions of rows instantly.)

AWS S3

Amazon's cloud storage. (Like an infinite online drive.)

ETL Pipelines

Automated data movement processes. (Like an assembly line for cleaning and storing data.)

Tableau

Interactive dashboards for storytelling. (Like turning spreadsheets into visual stories.)

Power BI

Microsoft tool for business reports. (Like Tableau inside the Office suite.)

matplotlib

Static Python charting library. (Like drawing with precise tools.)

seaborn

Statistical visualization library. (Like adding style to graphs.)

Plotly

Interactive plotting library. (Like hoverable, zoomable charts.)

ggplot2

R library for layered plots. (Like building art with data.)

Looker

Cloud BI and data exploration tool. (Like real-time dashboards for teams.)

D3.js

JS library for web visuals. (Like coding interactive art in a browser.)

Excel/Sheets Charts

Basic built-in graphing. (Like sketching drafts before full dashboards.)

AWS

Cloud platform for computing, ML, and storage. (Like renting virtual computers.)

AWS S3

Cloud storage for any data type. (Like an online hard drive.)

AWS EC2

Virtual servers for running apps. (Like borrowing a supercomputer.)

AWS Lambda

Serverless code execution. (Like lights turning on automatically.)

AWS SageMaker

ML development platform. (Like a full lab for model training.)

GCP

Google's cloud service. (Like running apps on Google's servers.)

BigQuery

Serverless SQL data warehouse. (Like analyzing data at lightning speed.)

Vertex AI

Managed ML platform by Google. (Like an AI workshop that sets itself up.)

Azure

Microsoft's cloud environment. (Like Windows in the cloud.)

Azure ML Studio

Drag-and-drop ML builder. (Like Lego blocks for models.)

Azure Synapse

Data integration and analytics. (Like joining multiple lakes into one.)

Docker

Packages apps and dependencies. (Like a lunchbox that works anywhere.)

Kubernetes

Orchestrates many containers (such as Docker containers) across machines. (Like an orchestra conductor.)

Flask

Lightweight Python web framework. (Like a simple café website in code.)

FastAPI

High-speed API framework. (Like Flask but faster.)

Streamlit

Turns Python scripts into web apps. (Like instant dashboards.)

MLflow

Tracks ML experiments and models. (Like a lab notebook for ML runs.)

MLOps

Managing ML model lifecycle. (Like DevOps for data models.)

CI/CD

Automates testing and deployment. (Like a conveyor belt for code.)

MLflow

Tracks experiments and versions. (Like logging every experiment.)

Kubeflow

Runs ML pipelines on Kubernetes. (Like an automated assembly line.)

DVC

Version control for data/models. (Like saving checkpoints in a game.)

Git/GitHub

Version control and collaboration. (Like Google Docs for code.)

Jenkins

Automates build/test pipelines. (Like a robot that checks every change.)

Airflow

Schedules and monitors data tasks. (Like autopilot for workflows.)

Docker

Packages ML environments. (Like shipping your model in a safe box.)

Kubernetes

Scales and manages containers. (Like traffic control for servers.)

Excel

Spreadsheet for data analysis. (Like a digital notebook of formulas.)

Google Sheets

Collaborative spreadsheets. (Like shared Excel in the cloud.)

Tableau

Visual dashboards for insights. (Like colorful stories from data.)

Power BI

Microsoft BI dashboards. (Like Tableau with Excel integration.)

Google Data Studio

Free Google dashboard builder, since renamed Looker Studio. (Like linking Analytics and Sheets visually.)

SAS

Enterprise analytics platform. (Like R + Python for corporations.)

SPSS

Statistical tool for research. (Like point-and-click regression.)

Alteryx

No-code data prep and blending tool. (Like puzzle pieces that snap together.)