Accuracy
The closeness of a measured or predicted value to the true value. High accuracy means minimal systematic error.;
Accuracy Score
An evaluation metric showing the ratio of correct predictions to total predictions in classification tasks.;
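A minimal sketch of the ratio described above, in plain Python. The function name `accuracy_score` and the sample labels are illustrative assumptions, not part of the original card.

```python
def accuracy_score(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one prediction")
    # True counts as 1 when summed, so this counts the matching pairs.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 4 of the 5 predictions are correct.
print(accuracy_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))  # 0.8
```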
Activation Function
A nonlinear function in neural networks that transforms a neuron's input into its output and determines whether it ‘fires’.;
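Two common activation functions, sketched in plain Python for illustration. ReLU and sigmoid are chosen here as examples; the card itself does not name a specific function.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic function: squashes any real input into the range 0 to 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(sigmoid(0.0))           # 0.5
```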
Algorithm
A sequence of repeatable, often mathematical steps that transforms inputs to outputs; in ML, takes data + hyperparameters and yields predictions.;
Analysis of Variance (ANOVA)
A statistical method to test whether three or more groups have the same mean, revealing whether observed differences are significant.;
Apache Spark
An open-source parallel processing framework for big data; distributes data and computation across cluster nodes for speed and scale.;
API (Application Programming Interface)
A software bridge allowing programs/services to communicate (e.g., using a web API to fetch data or serve models).;
Artificial Intelligence (AI)
A broad field aiming to create systems that perform tasks normally requiring human intelligence, such as language understanding or image recognition.;
Artificial Neural Networks (ANN)
Neural-network ML models with input, hidden, and output layers; the foundation of deep learning.;
Backpropagation
The process by which neural networks adjust internal weights, moving backwards through the network to reduce prediction errors.;
Bayes' Theorem
A formula for conditional probability: P(A|B) ∝ P(B|A)·P(A), used widely in inference and classifiers.;
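A worked sketch of the formula with made-up numbers. Dividing P(B|A)·P(A) by the total probability P(B) turns the proportionality into an equality. The prior, the likelihoods, and the function name `posterior` are hypothetical values chosen for illustration.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from the prior P(A) and the two likelihoods."""
    # Total probability of B, summed over A and not-A.
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical screening test: 1% prevalence, 95% sensitivity,
# 5% false-positive rate.
p = posterior(prior_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print(round(p, 3))  # 0.161
```

Even with a fairly accurate test, the low prior keeps the posterior modest in this example.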
Bayesian Network
A probabilistic graphical model whose nodes are variables and edges encode conditional dependencies, supporting inference under uncertainty.;
Bias (Model Bias)
Systematic error from overly simple assumptions leading to underfitting; also refers to societal/algorithmic unfairness in AI systems.;
Bias-Variance Tradeoff
Balancing underfitting (high bias) and overfitting (high variance) to minimize total prediction error.;
Big Data
Datasets so large or complex that traditional data-processing methods are inadequate; often characterized by volume, velocity, and variety (and sometimes veracity and value).;
Binary Classification
A supervised ML task that predicts one of two possible outcomes (e.g., spam vs. not spam).;
Binomial Distribution
Discrete distribution for a fixed number of independent trials with constant success probability and two outcomes.;
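A short sketch of the probability mass function, P(X = k) = C(n, k)·p^k·(1 − p)^(n − k), using only the standard library. The function name `binomial_pmf` and the coin-flip numbers are illustrative assumptions.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


# Hypothetical example: exactly 3 heads in 10 fair coin flips.
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
```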
Boxplot
A chart showing data distribution via quartiles, highlighting median, variability, and potential outliers.;
Business Analyst
Professional who connects data insights to actions for profitability/efficiency; often uses SQL and BI tools.;
Business Analytics (BA)
Using data (descriptive and predictive) to find insights and support business decisions.;
Business Intelligence (BI)
Descriptive analytics and reporting for monitoring and understanding business performance, often via dashboards.;
Categorical Variable
Variable that takes one of a limited set of categories, either unordered (nominal; e.g., color, marital status) or ordered (ordinal; e.g., education level).;
Central Tendency
Measures (mean, median, mode) describing the typical value of a dataset.;
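The three measures can be computed with Python's built-in `statistics` module. The sample data below are made up for illustration, and the variable names are assumptions.

```python
import statistics

# Hypothetical sample with one large value pulling the mean upward.
data = [2, 3, 3, 5, 12]

mean_value = statistics.mean(data)      # arithmetic average: 25 / 5
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequent value

print(mean_value, median_value, mode_value)  # 5 3 3
```

The mean is higher than the median here because the single large value (12) skews it.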
Classification
Supervised learning task predicting categorical labels from features.;
Clustering
An unsupervised ML method that groups similar data points without pre-labeled categories.;
Computer Science
Study of computation, algorithms, data structures, software/hardware systems, and applications.;
Computer Vision
Field enabling computers to interpret images/video (e.g., recognition, detection).;
Concept Drift
A change over time in the relationship between inputs and the target. Like data drift, it degrades model performance and requires monitoring/retuning.;
Confidence Interval (CI)
A range of plausible values for a population parameter derived from sample data.;
Confusion Matrix
2×2 (or multi-class) table summarizing classification predictions vs. actuals (TP, FP, TN, FN).;
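A minimal sketch of how the four cells of a binary confusion matrix can be counted. The function name `confusion_counts`, the `positive` parameter, and the sample labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary task."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct if the actual label is positive too.
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the actual label was positive.
            counts["FN" if actual == positive else "TN"] += 1
    return counts


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))
# {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
```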
Continuous Variable
Variable taking any value in a range (e.g., height, weight).;
Correlation
Strength/direction of linear relationship between variables; measured by coefficients like Pearson’s r.;
Cost Function
Objective to minimize during training (e.g., MSE, cross-entropy), measuring prediction error.;
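As one concrete instance, mean squared error (MSE) averages the squared differences between predictions and targets. The function name `mean_squared_error` and the numbers are illustrative assumptions.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values: errors of 1, 0, and -2 give squares 1, 0, and 4.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(round(mse, 4))  # 1.6667
```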
Covariance
Measure of how two variables vary together; used in correlation calculations.;
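A sketch showing how sample covariance feeds into Pearson's r: r is the covariance divided by the product of the two standard deviations. The function names and the data are illustrative assumptions.

```python
import math


def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with 2+ values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    products = ((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sum(products) / (len(x) - 1)


def pearson_r(x, y):
    """Covariance rescaled by both standard deviations, in the range -1 to 1."""
    std_x = math.sqrt(sample_covariance(x, x))  # cov(x, x) is the variance
    std_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (std_x * std_y)


# Hypothetical data in which y is exactly 2 * x, so r should be 1.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(round(sample_covariance(x, y), 4))  # 3.3333
print(round(pearson_r(x, y), 4))          # 1.0
```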
Cross-Validation
Resampling strategy (e.g., k-fold) to estimate model performance on unseen data.;
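A minimal sketch of how k-fold splitting can partition sample indices so that each fold serves once as the held-out test set. The function name `k_fold_indices` is an assumption, and the sketch does not shuffle; real workflows usually shuffle or stratify first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(5, 2):
    print(train, test)
# [3, 4] [0, 1, 2]
# [0, 1, 2] [3, 4]
```

A model would be fitted on each `train` split and scored on the matching `test` split, and the scores averaged.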
CSV (Comma-Separated Values)
A simple text file format where each line is a record and fields are separated by commas.;
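A short sketch of reading CSV text with Python's built-in `csv` module. The sample content and the helper name `parse_csv` are illustrative assumptions.

```python
import csv
import io


def parse_csv(text):
    """Parse CSV text whose first line is a header into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical file content: one header line, then one record per line.
raw = "name,age\nAda,36\nGrace,45\n"
rows = parse_csv(raw)
print(rows[0]["name"], rows[0]["age"])  # Ada 36
```

Every field comes back as a string, so numeric columns need explicit conversion (for example `int(row["age"])`).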
Dashboard
Interactive visual interface showing KPIs and metrics for monitoring/decision-making.;
Data Analysis (DA)
Cleaning, transforming, exploring, and visualizing data to extract insights.;
Data Analyst
Professional who analyzes data and reports insights using coding and BI tools.;
Data Anonymization
Techniques that irreversibly remove or generalize identifiers so individuals cannot be re-identified (e.g., k-anonymity, differential privacy). Stronger than pseudonymization.;
Data Augmentation (ML)
Creating additional training examples by transforming existing ones (e.g., rotating/cropping images, noise injection, text paraphrase) to boost model robustness without new data collection.;
Data Catalog
A searchable inventory of datasets with rich metadata (owners, lineage, sensitivity, freshness, quality scores) that helps people discover, trust, and reuse data.;
Data Cleaning
Correcting or removing inaccurate records, handling missing values, and formatting data for analysis.;
Data Creation
The act of generating new data—by sensing, logging, surveying, simulating, or deriving via computation (e.g., feature engineering).;
Data Consumer
Stakeholder who uses data insights to make decisions; collaborates with data teams.;
Data Decay
The gradual deterioration of data quality over time as facts change, formats drift, links rot, and records go stale.;
Data Dictionary
A definitive reference for each field/column (name, meaning, datatype, units, allowed values, calculation rules).;
Data Drift
Input feature distributions change over time.;
Data-Driven
An operating posture in which decisions are justified by measured evidence rather than intuition alone.;
Data Engineer
Builds/maintains data infrastructure and pipelines delivering clean, usable data.;
Data Engineering (DE)
Acquiring, organizing, and scaling data access via pipelines, storage, and transformation.;
Data Enrichment
Enhancing existing data with additional attributes/context for more value.;
Data Envelope
A container pattern that wraps a payload with headers/metadata for routing, integrity, and security.;
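One way the pattern might look in code: a payload wrapped with routing headers and a checksum for integrity. Every field name here (`message_id`, `destination`, `created_at`, `sha256`) and both function names are hypothetical choices for illustration, not a standard.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def wrap(payload, destination):
    """Wrap a JSON-serializable payload with routing and integrity headers."""
    body = json.dumps(payload, sort_keys=True)
    return {
        "headers": {
            "message_id": str(uuid.uuid4()),
            "destination": destination,
            "created_at": datetime.now(timezone.utc).isoformat(),
            "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
        },
        "payload": body,
    }


def verify(envelope):
    """Check that the payload still matches the checksum in the headers."""
    digest = hashlib.sha256(envelope["payload"].encode("utf-8")).hexdigest()
    return digest == envelope["headers"]["sha256"]


envelope = wrap({"order_id": 42, "total": 19.99}, destination="billing")
print(verify(envelope))  # True
```

A plain hash only detects accidental corruption. Protection against deliberate tampering would need a keyed signature, which this sketch leaves out.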
Data Ethics
Principles guiding responsible data use, including privacy, fairness, transparency, and accountability.;
Data Fabrication
The research-ethics term for making up data or results.;
Data Forecasting
Using time-series/statistical or ML models to project future values from historical data.;
Data Governance
Policies/roles/processes ensuring data quality, availability, integrity, and security across an organization.;
Data Hierarchy (DIKW)
The classic ladder: Data → Information → Knowledge → Wisdom.;
Data Hoarding
Accumulating data without purpose, documentation, or quality controls.;
Data Hygiene
The routine practices that keep data trustworthy and analysis-ready.;
Data Imbalance
A skewed target distribution where one class dominates (e.g., 99% non-fraud).;
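A sketch of measuring the imbalance and deriving inverse-frequency class weights, one common heuristic for compensating during training. The function name `class_weights` and the label values are illustrative assumptions.

```python
from collections import Counter


def class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical target column: 99 legitimate transactions, 1 fraud.
labels = ["ok"] * 99 + ["fraud"]
print(Counter(labels)["fraud"] / len(labels))  # 0.01
print(class_weights(labels)["fraud"])          # 50.0
```

The rare class gets a much larger weight, so errors on it count for more in a weighted loss.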
Data Journalism
Using quantitative analysis to inform storytelling in journalism.;
Data Lake
Central storage of raw, unstructured/structured data from many sources for future use.;
Data Leakage
When information from outside the proper training window (e.g., the target itself or future data) leaks into features, inflating evaluation results.;
Data Lineage
End-to-end record of where data came from and how it changed.;
Data Literacy
Ability to read, analyze, communicate, and reason with data.;
Data Lookup
Retrieving values by key from a reference source (e.g., join to a code table, Excel VLOOKUP).;
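A minimal sketch of a key-based lookup against a code table, with a fallback for unmatched keys. The table contents, field names, and the `lookup` helper are hypothetical.

```python
# Hypothetical reference (code) table keyed by country code.
COUNTRY_NAMES = {"US": "United States", "DE": "Germany", "JP": "Japan"}


def lookup(code, table, default="UNKNOWN"):
    """Return the value stored under code, or a default if the key is absent."""
    return table.get(code, default)


orders = [{"id": 1, "country": "DE"}, {"id": 2, "country": "XX"}]
# Attach the looked-up name to each record, similar to a left join.
enriched = [
    {**order, "country_name": lookup(order["country"], COUNTRY_NAMES)}
    for order in orders
]
print([row["country_name"] for row in enriched])  # ['Germany', 'UNKNOWN']
```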
Data Mart
A subject-specific slice of a data warehouse.;
Data Mining
Discovering patterns/relationships in large datasets.;
Data Modeling
Representing data structures/relationships or building predictive models.;
Data Observability
Continuous monitoring of data systems.;
Data Pipeline
Automated flow of data through extract/transform/load steps.;
Data Provenance
Documented origin and acquisition context.;
Data Pseudonymization
Replacing direct identifiers with tokens while keeping a re-link key separate.;
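A sketch of the idea: each direct identifier is swapped for a random token, and the token-to-identifier table (the re-link key) is returned separately so it can be stored apart from the data. The function name `pseudonymize` and the sample records are illustrative assumptions.

```python
import secrets


def pseudonymize(records, id_field):
    """Replace id_field with random tokens; return (new_records, relink_key)."""
    token_for = {}   # original identifier -> token
    relink_key = {}  # token -> original identifier; store this separately
    pseudonymized = []
    for record in records:
        original = record[id_field]
        if original not in token_for:
            token = secrets.token_hex(8)
            token_for[original] = token
            relink_key[token] = original
        # Copy the record so the caller's input is left untouched.
        pseudonymized.append({**record, id_field: token_for[original]})
    return pseudonymized, relink_key


records = [
    {"email": "a@example.com", "score": 1},
    {"email": "b@example.com", "score": 2},
    {"email": "a@example.com", "score": 3},
]
safe_records, key = pseudonymize(records, "email")
print(safe_records[0]["email"] == safe_records[2]["email"])  # True
```

The same person keeps the same token across records, so analysis still works. Anyone holding the key can reverse the mapping, which is why pseudonymized data are not anonymous.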
Data Quality
Fitness of data for intended use.;
Data Representation
How values are encoded and structured.;
Data Residency
The geographic/legal location where data are stored.;
Data Retention
Policies defining how long data are kept, why, and how they are securely deleted.;
Data Rights
Legal and normative rights individuals have over personal data.;
Data Science (DS)
Interdisciplinary field applying scientific and computational methods to collect, manage, and analyze data and to communicate the resulting insights.;
Data Science Cycle
The iterative process of problem definition, data collection, preparation, analysis, and reporting.;
Data Scientist
Professional who builds models, generates insights, and communicates results.;
Data Set
A delimited collection of related observations.;
Data Silo
A dataset (or system) isolated within a team or tool, inaccessible or hard to integrate.;
Data Smells
Context-independent indicators of potential data-quality problems.;
Data Stewardship
Accountable, day-to-day care of data assets.;
Data Strategy
A coherent plan that connects business goals to data acquisition, governance, and quality.;
Data Structure
Organizational format for data enabling specific operations.;
Data Transfer
The controlled movement of digital information between platforms.;
Data Trolling
The deliberate injection or manipulation of records to provoke or deceive.;
Data Visualization
Graphical representation of information to communicate patterns.;
Data Warehouse
Central repository of cleaned, structured data optimized for analysis and BI.;
Data Wrangling
Cleaning, restructuring, merging, and transforming data for analysis.;
Data Wrapping
Enhancing a product or service with analytics-driven data experiences.;
Database
Structured data storage organized for efficient retrieval.;
Database Management System (DBMS)
Software for creating, querying, and managing databases.;
Dataframe
Tabular data structure with labeled rows/columns.;
Dataset
A collection of related observations stored in a structured format.;