Vocabulary flashcards covering key terms from the lecture notes on problem formulation and data concepts.
Problem statement
A clear, concise, and measurable description of the problem to be solved that guides analysis and aligns the team.
Context
Background information that outlines the situation surrounding the problem.
Specificity
The degree to which the problem is stated precisely, without vagueness.
Objectives (Measurable)
Stated goals with specific metrics or outcomes to indicate success.
Data
Raw facts, figures, and other content as collected; data become information once they are processed.
Information
Data that have been organized, analyzed, and processed to be meaningful.
Knowledge
Information plus context, experience, and intuition used to derive insights.
Value
The benefit gained when knowledge is translated into actionable decisions.
Population
The entire set of objects (people, places, things, etc.) of interest to study.
Sample
A subset of the population used for analysis when collecting data on the whole population is impractical.
Parameter
A summary measure that characterizes the population (usually unknown).
Statistic
A summary measure calculated from a sample.
Statistical Inference
Drawing conclusions about a population from sample data, assuming unbiased sampling.
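
The population, sample, parameter, and statistic cards above can be tied together in a short sketch. The exam scores below are made-up illustrative data, and the names `population`, `sample`, and `estimation_error` are assumptions chosen for this example only.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up exam scores (illustrative data only).
rng = random.Random(42)
population = [rng.gauss(70, 10) for _ in range(1000)]

# Parameter: a summary measure of the whole population (usually unknown in practice).
population_mean = statistics.mean(population)

# Sample: a random subset of the population, drawn without replacement.
sample = rng.sample(population, 50)

# Statistic: the same summary measure, computed from the sample only.
sample_mean = statistics.mean(sample)

# Inference: the statistic is used as an estimate of the parameter.
estimation_error = sample_mean - population_mean
print(f"parameter={population_mean:.2f} statistic={sample_mean:.2f} "
      f"error={estimation_error:+.2f}")
```

In practice only `sample_mean` would be observable; the population mean is computed here solely to show how close the estimate lands.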
Big Data
Data that are too large or complex for traditional processing methods.
Volume
The sheer amount of data in one or more datasets.
Velocity
The speed at which data are generated and must be processed.
Variety
The different forms and sources of data (structured, unstructured, etc.).
Veracity
The credibility and quality of data.
Structured Data
Highly organized data in rows and columns, easy to analyze.
Unstructured Data
Data not organized in a fixed schema; harder to analyze directly.
Tidy data
A standardized format where each variable is a column, each observation a row, and each cell contains only one value.
Observational Unit
The individual subject (row) in a dataset.
Variable
An attribute or feature describing the observational unit; stored in columns.
Observation
The individual data values in each cell collected for each observational unit.
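
As a rough illustration of the tidy format described above, the sketch below reshapes a "wide" table into one where each variable is a column and each row holds a single store-year. The store names and sales figures are hypothetical.

```python
# Hypothetical "wide" table: one row per store, one column per year.
wide = [
    {"store": "A", "2022": 120, "2023": 135},
    {"store": "B", "2022": 90, "2023": 110},
]

# Tidy form: each variable (store, year, sales) is a column,
# and each row describes exactly one store-year combination.
tidy = [
    {"store": row["store"], "year": int(year), "sales": value}
    for row in wide
    for year, value in row.items()
    if year != "store"
]

for record in tidy:
    print(record)
```

The wide layout hides the variable "year" inside the column names; the tidy layout makes it an explicit column.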
Qualitative data (categorical)
Data that describe categories or qualities; labels or codes without numeric meaning.
Nominal
Categorical data with no natural order.
Ordinal
Categorical data with a natural order.
Identifier (special case nominal)
A nominal variable used only to identify units, with no analytical meaning.
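
One way to see the difference between nominal and ordinal data is sorting. In this sketch the satisfaction levels and responses are hypothetical; the point is that an ordinal variable needs its natural order supplied explicitly, because alphabetical order treats the labels as nominal.

```python
# Hypothetical ordinal variable: satisfaction ratings with a natural order.
levels = ["low", "medium", "high"]
rank = {level: position for position, level in enumerate(levels)}

responses = ["high", "low", "medium", "low"]

# Alphabetical sorting ignores the natural order of the categories.
alphabetical = sorted(responses)

# Sorting by rank respects the ordinal scale.
by_rank = sorted(responses, key=lambda level: rank[level])

print(alphabetical)
print(by_rank)
```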
Numerical (Quantitative) data
Measurements or counts expressed as numbers.
Discrete
Numerical data that take countable, separate values, typically counts; values in between are not meaningful.
Continuous
Numerical data that can take any value within a range; typically measured rather than counted.
Ratio scale
A numerical scale with a true zero and meaningful ratios (e.g., profits).
Interval scale
A numerical scale with meaningful differences but no true zero (e.g., temperature in Celsius).
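
A small worked example may help with the ratio versus interval distinction. It uses the standard Celsius-to-Kelvin conversion; the two temperatures are arbitrary illustrative values. On the Celsius (interval) scale the difference is meaningful but the ratio is not, which becomes visible when the same temperatures are expressed in Kelvin, a scale with a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin (Kelvin has a true zero)."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0  # arbitrary illustrative temperatures

# Differences survive the change of scale: both are 10 degrees apart.
diff_celsius = warm_c - cool_c
diff_kelvin = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios do not: "twice as hot" in Celsius is an artifact of the arbitrary zero.
ratio_celsius = warm_c / cool_c
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_celsius:.2f} C vs {diff_kelvin:.2f} K")
print(f"ratio: {ratio_celsius:.3f} (Celsius) vs {ratio_kelvin:.3f} (Kelvin)")
```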
Cross-Sectional Data
Observations of many subjects at the same point in time across multiple variables.
Time Series Data
Observations of a single subject across multiple time points.
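
The two data layouts above can be sketched as two different slices of the same records. The companies, years, and revenue figures are hypothetical.

```python
# Hypothetical records: revenue for two companies over two years.
records = [
    {"company": "A", "year": 2022, "revenue": 10.0},
    {"company": "A", "year": 2023, "revenue": 12.0},
    {"company": "B", "year": 2022, "revenue": 7.5},
    {"company": "B", "year": 2023, "revenue": 8.0},
]

# Cross-sectional slice: many subjects at one point in time.
cross_section = [r for r in records if r["year"] == 2023]

# Time series slice: one subject across time, in chronological order.
time_series = sorted(
    (r for r in records if r["company"] == "A"),
    key=lambda r: r["year"],
)

print([r["company"] for r in cross_section])
print([r["year"] for r in time_series])
```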
Data Dictionary
Metadata table describing variables: names, descriptions, types, units, and scales.
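
A data dictionary can be represented as a simple list of entries, one per variable. The variables, descriptions, and the `describe` helper below are hypothetical and only illustrate the fields named on the card (name, description, type, unit, scale).

```python
# Hypothetical data dictionary for a small student dataset.
data_dictionary = [
    {"name": "student_id", "description": "Unique student identifier",
     "type": "categorical", "unit": None, "scale": "nominal"},
    {"name": "class_year", "description": "Year of study (first-year to senior)",
     "type": "categorical", "unit": None, "scale": "ordinal"},
    {"name": "study_hours", "description": "Weekly hours spent studying",
     "type": "numerical", "unit": "hours per week", "scale": "ratio"},
]

def describe(variable_name):
    """Return the dictionary entry for a variable, or None if it is undocumented."""
    for entry in data_dictionary:
        if entry["name"] == variable_name:
            return entry
    return None

print(describe("study_hours")["scale"])
```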
Target/Dependent/Response Variable
The outcome to be explained or predicted (e.g., GPA).
Independent/Explanatory/Predictor Variables
Variables used to explain or predict the target.
Data Cleaning
Processes to prepare raw data for analysis by fixing errors and handling issues.
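
A minimal sketch of what data cleaning can involve, assuming three common issues: stray whitespace, missing or non-numeric values, and duplicate rows. The raw rows and the helper names `parse_age` and `clean` are hypothetical; real cleaning steps depend on the dataset.

```python
# Hypothetical raw rows as they might arrive from a form or spreadsheet export.
raw = [
    {"id": "1", "age": " 34 "},   # stray whitespace
    {"id": "2", "age": ""},       # missing value
    {"id": "2", "age": ""},       # duplicate row
    {"id": "3", "age": "n/a"},    # non-numeric placeholder
]

def parse_age(text):
    """Return the age as an int, or None when the value is missing or invalid."""
    text = text.strip()
    return int(text) if text.isdigit() else None

def clean(rows):
    """Drop repeated ids (keeping the first) and convert fields to proper types."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        record_id = row["id"].strip()
        if record_id in seen_ids:
            continue  # skip duplicates
        seen_ids.add(record_id)
        cleaned.append({"id": int(record_id), "age": parse_age(row["age"])})
    return cleaned

cleaned = clean(raw)
print(cleaned)
```

Recording missing values as `None` rather than guessing a number keeps the cleaning step from silently inventing data.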
Data-Information-Knowledge-Value chain
A progression: data → information → knowledge → value, leading to actionable decisions.