data 115 (a-z)


209 Terms

1
New cards

Accuracy

The closeness of a measured or predicted value to the true value. High accuracy means minimal systematic error.;

2
New cards

Accuracy Score

An evaluation metric showing the ratio of correct predictions to total predictions in classification tasks.;
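A minimal sketch of the ratio described on this card. The function name `accuracy_score` and the sample labels are illustrative assumptions, not part of the original set.

```python
def accuracy_score(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty set of predictions")
    # Count positions where the prediction equals the true label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: 3 of the 5 predictions are correct.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy_score(y_true, y_pred))  # 0.6
```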

3
New cards

Activation Function

A nonlinear function in neural networks that transforms a neuron's input into its output and determines whether it ‘fires’.;
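As a rough illustration, two common activation functions (sigmoid and ReLU) applied to one neuron's weighted input. The helper `neuron_output` and the numbers are assumptions made for this sketch, and it is not numerically hardened for very large inputs.

```python
import math


def sigmoid(x):
    """Squash any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Pass positive inputs through unchanged; output 0 otherwise."""
    return max(0.0, x)


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical neuron with two inputs: z = 0.5*1.0 + (-0.5)*2.0 + 0.5 = 0.0
print(neuron_output([1.0, 2.0], [0.5, -0.5], 0.5, relu))
print(neuron_output([1.0, 2.0], [0.5, -0.5], 0.5, sigmoid))
```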

4
New cards

Algorithm

A sequence of repeatable, often mathematical steps that transforms inputs to outputs; in ML, a learning algorithm takes data and hyperparameters and yields a model that makes predictions.;

5
New cards

Analysis of Variance (ANOVA)

A statistical method to test whether three or more groups have the same mean, revealing whether observed differences are significant.;

6
New cards

Apache Spark

An open-source parallel processing framework for big data; distributes data and computation across cluster nodes for speed and scale.;

7
New cards

API (Application Programming Interface)

A software bridge allowing programs/services to communicate (e.g., using a web API to fetch data or serve models).;

8
New cards

Artificial Intelligence (AI)

A broad field aiming to create systems that perform tasks normally requiring human intelligence, such as language understanding or image recognition.;

9
New cards

Artificial Neural Networks (ANN)

ML models built from layers of interconnected nodes (input, hidden, and output layers); the foundation of deep learning.;

10
New cards

Backpropagation

The process by which neural networks adjust internal weights: the prediction error is propagated backwards through the network, layer by layer, and each weight is updated to reduce that error.;

11
New cards

Bayes' Theorem

A formula for conditional probability: P(A|B) ∝ P(B|A)·P(A), used widely in inference and classifiers.;
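A small worked example of Bayes' theorem in its normalized form, P(A|B) = P(B|A)·P(A) / P(B). The spam-filter probabilities below are invented purely for illustration.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from the prior P(A) and the two likelihoods of B."""
    # P(B) by the law of total probability.
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / evidence


# Assumed numbers: 20% of mail is spam; a given word appears in 50% of spam
# and in 5% of non-spam. How likely is a message containing it to be spam?
p_spam_given_word = posterior(prior_a=0.2, p_b_given_a=0.5, p_b_given_not_a=0.05)
print(round(p_spam_given_word, 3))  # 0.714
```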

14
New cards

Bayesian Network

A probabilistic graphical model whose nodes are variables and edges encode conditional dependencies, supporting inference under uncertainty.;

15
New cards

Bias (Model Bias)

Systematic error from overly simple assumptions leading to underfitting; also refers to societal/algorithmic unfairness in AI systems.;

16
New cards

Bias-Variance Tradeoff

Balancing underfitting (high bias) and overfitting (high variance) to minimize total prediction error.;

17
New cards

Big Data

Datasets so large or complex that traditional data-processing methods are inadequate; commonly characterized by volume, velocity, and variety (and sometimes veracity and value).;

18
New cards

Binary Classification

A supervised ML task that predicts one of two possible outcomes (e.g., spam vs. not spam).;

19
New cards

Binomial Distribution

Discrete distribution for a fixed number of independent trials with constant success probability and two outcomes.;
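A sketch of the binomial probability mass function, P(X = k) = C(n, k)·p^k·(1−p)^(n−k). The function name and the coin-flip example are assumptions for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 3 heads in 10 fair coin flips.
print(binomial_pmf(3, 10, 0.5))  # 0.1171875

# The probabilities over all possible k sum to 1 (up to rounding).
print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))
```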

20
New cards

Boxplot

A chart showing data distribution via quartiles, highlighting median, variability, and potential outliers.;
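The numbers behind a boxplot can be sketched without drawing anything: the three quartiles plus the common 1.5 × IQR rule for flagging outliers. The helper name and data are illustrative assumptions; exact quartile values depend on the interpolation method (here, the default of `statistics.quantiles`).

```python
import statistics


def boxplot_summary(values):
    """Return quartiles, IQR fences, and the points outside the fences."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": q2, "q3": q3, "outliers": outliers}


# Made-up data with one value far from the rest.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(boxplot_summary(data))
```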

21
New cards

Business Analyst

Professional who connects data insights to actions for profitability/efficiency; often uses SQL and BI tools.;

22
New cards

Business Analytics (BA)

Using data (descriptive and predictive) to find insights and support business decisions.;

23
New cards

Business Intelligence (BI)

Descriptive analytics and reporting for monitoring and understanding business performance, often via dashboards.;

24
New cards

Categorical Variable

Variable that takes one of a limited set of categories, often with no inherent order (nominal); e.g., color, marital status.;

25
New cards

Central Tendency

Measures (mean, median, mode) describing the typical value of a dataset.;
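A quick sketch of the three measures using Python's standard `statistics` module. The sample values are made up; note how the single large value pulls the mean above the median.

```python
import statistics

values = [2, 3, 3, 5, 12]

mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value once sorted
mode = statistics.mode(values)      # most frequent value

print(mean, median, mode)
```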

26
New cards

Classification

Supervised learning task predicting categorical labels from features.;

27
New cards

Clustering

An unsupervised ML method that groups similar data points without pre-labeled categories.;

28
New cards

Computer Science

Study of computation, algorithms, data structures, software/hardware systems, and applications.;

29
New cards

Computer Vision

Field enabling computers to interpret images/video (e.g., recognition, detection).;

30
New cards

Concept Drift

The relationship between inputs and target changes over time. Like data drift, it degrades model performance and requires monitoring and retraining.;

31
New cards

Confidence Interval (CI)

A range of plausible values for a population parameter derived from sample data.;

32
New cards

Confusion Matrix

2×2 (or multi-class) table summarizing classification predictions vs. actuals (TP, FP, TN, FN).;
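A minimal sketch of how the four cells of a binary confusion matrix are counted. The function name, the `positive` parameter, and the labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary classification task."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: correct if the truth is also positive.
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Predicted negative: correct if the truth is also negative.
            counts["TN" if truth != positive else "FN"] += 1
    return counts


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))
```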

33
New cards

Continuous Variable

Variable taking any value in a range (e.g., height, weight).;

34
New cards

Correlation

Strength/direction of linear relationship between variables; measured by coefficients like Pearson’s r.;

35
New cards

Cost Function

Objective to minimize during training (e.g., MSE, cross-entropy), measuring prediction error.;
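The two cost functions named on this card can be sketched as follows. The function names and the clipping constant `eps` are assumptions of this example, not a reference implementation.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of binary labels given probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        # Clip probabilities away from 0 and 1 so log() stays finite.
        p = min(max(p, eps), 1 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


print(mean_squared_error([1, 2, 3], [1, 2, 5]))
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```

Lower values mean predictions closer to the targets; training searches for parameters that minimize the chosen cost.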

36
New cards

Covariance

Measure of how two variables vary together; used in correlation calculations.;
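To show how covariance feeds into correlation, here is a sketch that computes the sample covariance and then divides by the two standard deviations to obtain Pearson's r. Function names and data are illustrative assumptions.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    """Covariance rescaled by both standard deviations, giving a value in [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]  # exactly 2 * xs, so the linear relationship is perfect
print(covariance(xs, ys))
print(pearson_r(xs, ys))
```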

37
New cards

Cross-Validation

Resampling strategy (e.g., k-fold) to estimate model performance on unseen data.;
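A rough sketch of how k-fold splitting partitions sample indices so that each sample is used for testing exactly once. The generator name is an assumption, and this version does not shuffle, which real workflows usually do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(6, 3):
    print("train:", train, "test:", test)
```

A model would be fit on each `train` split and scored on the matching `test` split, and the k scores averaged.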

38
New cards

CSV (Comma-Separated Values)

A simple text file format where each line is a record and fields are separated by commas.;
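A short sketch of reading CSV text with Python's standard `csv` module, treating the first line as the header. The function name and sample data are made up for illustration; note that every field arrives as a string.

```python
import csv
import io


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


raw = "name,age\nAda,36\nGrace,45\n"
rows = parse_csv(raw)
print(rows)
```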

39
New cards

Dashboard

Interactive visual interface showing KPIs and metrics for monitoring/decision-making.;

40
New cards

Data Analysis (DA)

Cleaning, transforming, exploring, and visualizing data to extract insights.;

41
New cards

Data Analyst

Professional who analyzes data and reports insights using coding and BI tools.;

42
New cards

Data Anonymization

Techniques that irreversibly remove or generalize identifiers so individuals cannot be re-identified (e.g., k-anonymity, differential privacy). Stronger than pseudonymization.;

43
New cards

Data Augmentation (ML)

Creating additional training examples by transforming existing ones (e.g., rotating/cropping images, noise injection, text paraphrase) to boost model robustness without new data collection.;

44
New cards

Data Catalog

A searchable inventory of datasets with rich metadata (owners, lineage, sensitivity, freshness, quality scores) that helps people discover, trust, and reuse data.;

45
New cards

Data Cleaning

Correcting or removing inaccurate records, handling missing values, and formatting data for analysis.;

46
New cards

Data Creation

The act of generating new data—by sensing, logging, surveying, simulating, or deriving via computation (e.g., feature engineering).;

47
New cards

Data Consumer

Stakeholder who uses data insights to make decisions; collaborates with data teams.;

48
New cards

Data Decay

The gradual deterioration of data quality over time as facts change, formats drift, links rot, and records go stale.;

49
New cards

Data Dictionary

A definitive reference for each field/column (name, meaning, datatype, units, allowed values, calculation rules).;

50
New cards

Data Drift

Input feature distributions change over time.;

51
New cards

Data-Driven

An operating posture in which decisions are justified by measured evidence rather than intuition alone.;

52
New cards

Data Engineer

Builds/maintains data infrastructure and pipelines delivering clean, usable data.;

53
New cards

Data Engineering (DE)

Acquiring, organizing, and scaling data access via pipelines, storage, and transformation.;

54
New cards

Data Enrichment

Enhancing existing data with additional attributes/context for more value.;

55
New cards

Data Envelope

A container pattern that wraps a payload with headers/metadata for routing, integrity, and security.;

56
New cards

Data Ethics

Principles guiding responsible data use, including privacy, fairness, transparency, and accountability.;

57
New cards

Data Fabrication

The research-ethics term for making up data or results.;

58
New cards

Data Forecasting

Using time-series/statistical or ML models to project future values from historical data.;

59
New cards

Data Governance

Policies/roles/processes ensuring data quality, availability, integrity, and security across an organization.;

60
New cards

Data Hierarchy (DIKW)

The classic ladder: Data → Information → Knowledge → Wisdom.;

61
New cards

Data Hoarding

Accumulating data without purpose, documentation, or quality controls.;

62
New cards

Data Hygiene

The routine practices that keep data trustworthy and analysis-ready.;

63
New cards

Data Imbalance

A skewed target distribution where one class dominates (e.g., 99% non-fraud).;

64
New cards

Data Journalism

Using quantitative analysis to inform storytelling in journalism.;

65
New cards

Data Lake

Central storage of raw, unstructured/structured data from many sources for future use.;

66
New cards

Data Leakage

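When information that would not be available at prediction time (e.g., the target itself or data from outside the proper training window) leaks into features, inflating measured performance.;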

67
New cards

Data Lineage

End-to-end record of where data came from and how it changed.;

68
New cards

Data Literacy

Ability to read, analyze, communicate, and reason with data.;

69
New cards

Data Lookup

Retrieving values by key from a reference source (e.g., join to a code table, Excel VLOOKUP).;
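A lookup against a code table can be sketched with a plain dictionary, similar in spirit to a VLOOKUP or a left join. The table contents, field names, and the `"Unknown"` default are assumptions for this example.

```python
# Hypothetical reference table mapping country codes to names.
country_names = {"US": "United States", "DE": "Germany"}


def add_country_name(rows, lookup, default="Unknown"):
    """Return copies of the rows with a looked-up country_name added."""
    return [
        {**row, "country_name": lookup.get(row["country"], default)}
        for row in rows
    ]


orders = [{"id": 1, "country": "US"}, {"id": 2, "country": "FR"}]
print(add_country_name(orders, country_names))
```

Keys missing from the reference table fall back to the default rather than raising an error, which mirrors a left join with a fill value.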

70
New cards

Data Mart

A subject-specific slice of a data warehouse.;

71
New cards

Data Mining

Discovering patterns/relationships in large datasets.;

72
New cards

Data Modeling

Representing data structures/relationships or building predictive models.;

73
New cards

Data Observability

Continuous monitoring of the health of data systems (e.g., freshness, volume, schema, and quality) so that problems are detected early.;

74
New cards

Data Pipeline

Automated flow of data through extract/transform/load steps.;

75
New cards

Data Provenance

Documented origin and acquisition context.;

76
New cards

Data Pseudonymization

Replacing direct identifiers with tokens while keeping a re-link key separate.;
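One way to sketch this: swap each identifier for a random token and return the token-to-identifier mapping separately, so it can be stored apart from the data. The function and field names are hypothetical, and real deployments need key management and access controls beyond this example.

```python
import secrets


def pseudonymize(records, id_field):
    """Replace id_field values with random tokens; return data and re-link key."""
    token_for = {}   # identifier -> token, so repeats get the same token
    relink_key = {}  # token -> identifier, to be stored separately
    pseudonymized = []
    for record in records:
        identifier = record[id_field]
        if identifier not in token_for:
            token = secrets.token_hex(8)
            token_for[identifier] = token
            relink_key[token] = identifier
        pseudonymized.append({**record, id_field: token_for[identifier]})
    return pseudonymized, relink_key


records = [
    {"email": "ada@example.com", "score": 91},
    {"email": "grace@example.com", "score": 88},
    {"email": "ada@example.com", "score": 75},
]
data, key = pseudonymize(records, "email")
print(data)
```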

77
New cards

Data Quality

Fitness of data for intended use.;

78
New cards

Data Representation

How values are encoded and structured.;

79
New cards

Data Residency

The geographic/legal location where data are stored.;

80
New cards

Data Retention

Policies defining how long data are kept, why, and how they are securely deleted.;

81
New cards

Data Rights

Legal and normative rights individuals have over personal data.;

82
New cards

Data Science (DS)

Interdisciplinary field applying scientific methods to collect, manage, and analyze data and to communicate the insights drawn from it.;

83
New cards

Data Science Cycle

The iterative process of problem definition, data collection, preparation, analysis, and reporting.;

84
New cards

Data Scientist

Professional who builds models, generates insights, and communicates results.;

85
New cards

Data Set

A delimited collection of related observations.;

86
New cards

Data Silo

A dataset (or system) isolated within a team or tool, inaccessible or hard to integrate.;

87
New cards

Data Smells

Context-independent indicators of potential data-quality problems.;

88
New cards

Data Stewardship

Accountable, day-to-day care of data assets.;

89
New cards

Data Strategy

A coherent plan that connects business goals to data acquisition, governance, and quality.;

90
New cards

Data Structure

Organizational format for data enabling specific operations.;

91
New cards

Data Transfer

The controlled movement of digital information between platforms.;

92
New cards

Data Trolling

The deliberate injection or manipulation of records to provoke or deceive.;

93
New cards

Data Visualization

Graphical representation of information to communicate patterns.;

94
New cards

Data Warehouse

Central repository of cleaned, structured data optimized for analysis and BI.;

95
New cards

Data Wrangling

Cleaning, restructuring, merging, and transforming data for analysis.;

96
New cards

Data Wrapping

Enhancing a product or service with analytics-driven data experiences.;

97
New cards

Database

Structured data storage organized for efficient retrieval.;

98
New cards

Database Management System (DBMS)

Software for creating, querying, and managing databases.;

99
New cards

Dataframe

Tabular data structure with labeled rows/columns.;

100
New cards

Dataset

A collection of related observations stored in a structured format.;