Data Science: Problem Formulation and Data Concepts (Vocabulary)

Vocabulary flashcards covering key terms from the lecture notes on problem formulation and data concepts.

40 Terms

1

Problem statement

A clear, concise, and measurable description of the problem to be solved that guides analysis and aligns the team.

2

Context

Background information that outlines the situation surrounding the problem.

3

Specificity

The clarity and lack of vagueness in stating the problem.

4

Objectives (Measurable)

Stated goals with specific metrics or outcomes to indicate success.

5

Data

Facts, figures, and content collected; raw data become information after processing.

6

Information

Data that have been organized, analyzed, and processed to be meaningful.

7

Knowledge

Information plus context, experience, and intuition used to derive insights.

8

Value

The benefit gained when knowledge is translated into actionable decisions.

9

Population

The entire set of objects (people, places, things, etc.) of interest to study.

10

Sample

A subset of the population used for analysis when collecting data on the whole population is impractical.

11

Parameter

A summary measure that characterizes the population (usually unknown).

12

Statistic

A summary measure calculated from a sample.
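To make the parameter/statistic distinction concrete, here is a minimal sketch that simulates a hypothetical population of 10,000 synthetic exam scores (invented for illustration only), treats the population mean as the parameter, and computes the same summary from a random sample of 100 units as the statistic.

```python
import random
import statistics

random.seed(42)  # fixed seed so the example is reproducible

# Hypothetical population: 10,000 synthetic scores (in practice usually unobservable)
population = [random.gauss(70, 10) for _ in range(10_000)]

# Parameter: a summary measure of the whole population (usually unknown)
population_mean = statistics.mean(population)

# Sample: a random subset of 100 units, drawn without replacement
sample = random.sample(population, k=100)

# Statistic: the same summary measure, computed from the sample only
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two values are typically close but not identical; the gap between them is sampling error.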

13

Statistical Inference

Drawing conclusions about a population from sample data; the conclusions are only valid if the sample was drawn without bias.
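One common form of statistical inference is an interval estimate for a population mean. The sketch below assumes a hypothetical random (unbiased) sample and uses the normal-approximation critical value 1.96 for a roughly 95% interval; the lecture notes do not prescribe a particular method, so treat this as one illustrative choice.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical sample of 100 observations from a population we cannot fully observe
sample = [random.gauss(50, 8) for _ in range(100)]

n = len(sample)
mean = statistics.mean(sample)     # point estimate of the population mean
sd = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
standard_error = sd / math.sqrt(n)

# Approximate 95% interval using the normal critical value 1.96
# (a t critical value would give a slightly wider interval for small n)
lower = mean - 1.96 * standard_error
upper = mean + 1.96 * standard_error
print(f"estimate {mean:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

If the sample were biased (for example, only volunteers), the interval could be narrow and still miss the true population mean.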

14

Big Data

Data that are too large or complex for traditional processing methods.

15

Volume

The enormous amount of data contained in one or more datasets.

16

Velocity

The speed at which data are generated and must be processed.

17

Variety

The different forms and sources of data (structured, unstructured, etc.).

18

Veracity

The credibility and quality of data.

19

Structured Data

Highly organized data arranged in rows and columns under a fixed schema, which makes it easy to analyze.

20

Unstructured Data

Data not organized in a fixed schema (e.g., free text, images, audio), which makes it harder to analyze directly.
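Unstructured data often has to be converted into structured rows and columns before analysis. This is a minimal sketch, assuming hypothetical free-text notes and a regular-expression pattern that happens to fit them; real text rarely follows one pattern this neatly.

```python
import re

# Hypothetical unstructured input: free-text order notes
notes = [
    "Order 1042 from Alice, total $35.50",
    "Bob placed order 1043; total was $12.00",
    "No order info here",
]

# Assumed patterns for the two fields we want to extract
order_pattern = re.compile(r"order (\d+)", re.IGNORECASE)
total_pattern = re.compile(r"\$(\d+\.\d{2})")

rows = []  # structured result: one dict (row) per note, fixed keys (columns)
for note in notes:
    order = order_pattern.search(note)
    total = total_pattern.search(note)
    if order and total:  # skip notes where a field could not be found
        rows.append({"order_id": order.group(1), "total": float(total.group(1))})

print(rows)
```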

21

Tidy data

A standardized format where each variable is a column, each observation a row, and each cell contains only one value.
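A frequent tidy-data fix is reshaping a "wide" table, where the values of one variable (here, year) are spread across column headers, into a "long" table with one observation per row. The sketch below uses plain Python and a hypothetical `cases` variable; in pandas the same reshaping is typically done with `DataFrame.melt`.

```python
# Hypothetical wide table: one row per country, one column per year.
# "year" is hidden in the column headers instead of being its own variable.
wide = [
    {"country": "A", "2022": 100, "2023": 110},
    {"country": "B", "2022": 200, "2023": 190},
]

# Tidy version: each variable (country, year, cases) is a column,
# each observation (one country in one year) is a row, one value per cell.
tidy = [
    {"country": row["country"], "year": int(year), "cases": value}
    for row in wide
    for year, value in row.items()
    if year != "country"
]

for record in tidy:
    print(record)
```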

22

Observational Unit

The individual subject (row) in a dataset.

23

Variable

An attribute or feature describing the observational unit; stored in columns.

24

Observation

The individual data values (one per cell) collected for each observational unit.

25

Qualitative data (categorical)

Data that describe categories or qualities; labels or codes without numeric meaning.

26

Nominal

Categorical data with no natural order.

27

Ordinal

Categorical data with a natural order.
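The nominal/ordinal difference shows up in which operations make sense. In this sketch (hypothetical blood-type and satisfaction data), nominal values support only equality and counting, while ordinal values, encoded here with Python's `IntEnum`, also support sorting and comparison.

```python
from enum import IntEnum

# Nominal: no natural order; equality checks and counts are meaningful
blood_types = ["A", "B", "AB", "O", "A"]
print(blood_types.count("A"))  # 2

# Ordinal: a natural order, encoded as integers that preserve that order
class Satisfaction(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

responses = [Satisfaction.HIGH, Satisfaction.LOW, Satisfaction.MEDIUM]
print(sorted(responses))                     # ordering is meaningful
print(Satisfaction.HIGH > Satisfaction.LOW)  # True

# Caution: the integer codes preserve order only; the "distance" from LOW to MEDIUM
# is not guaranteed to equal the distance from MEDIUM to HIGH.
```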

28

Identifier (special case nominal)

A nominal variable used only to identify units, with no analytical meaning.

29

Numerical (Quantitative) data

Measurements or counts expressed as numbers.

30

Discrete

Numerical data that take separate, countable values, typically counts (e.g., number of children); values in between are not meaningful.

31

Continuous

Numerical data that can take any value within a range; typically measurements (e.g., height, time).

32

Ratio scale

A numerical scale with a true zero and meaningful ratios (e.g., profits).

33

Interval scale

A numerical scale with meaningful differences but no true zero (e.g., temperature in Celsius).
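The interval/ratio distinction can be checked numerically. This sketch compares Celsius (interval scale, arbitrary zero) with Kelvin (ratio scale, true zero at absolute zero); the two temperatures are hypothetical.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on an interval scale ...
print(warm_c - cool_c)  # 10.0

# ... but the Celsius ratio 20/10 = 2 does NOT mean "twice as hot",
# because 0 °C is an arbitrary zero, not the absence of temperature.
print(warm_c / cool_c)  # 2.0

# On the Kelvin scale (true zero), the ratio is physically meaningful.
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)
print(round(ratio_k, 3))  # 1.035
```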

34

Cross-Sectional Data

Observations of many subjects on multiple variables, all recorded at the same point in time.

35

Time Series Data

Observations of a single subject across multiple time points.
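The two layouts differ in which dimension varies. This sketch uses hypothetical company revenue figures: the cross-sectional table varies across subjects at one time point, while the time series follows one subject across years.

```python
# Hypothetical cross-sectional data: many subjects (companies), one point in time
cross_sectional = [
    {"company": "X", "year": 2023, "revenue": 5.0, "employees": 40},
    {"company": "Y", "year": 2023, "revenue": 8.2, "employees": 65},
    {"company": "Z", "year": 2023, "revenue": 3.1, "employees": 22},
]

# Hypothetical time series: one subject (company X), many points in time
time_series = [
    {"company": "X", "year": 2021, "revenue": 4.1},
    {"company": "X", "year": 2022, "revenue": 4.6},
    {"company": "X", "year": 2023, "revenue": 5.0},
]

# Structural checks that distinguish the two layouts
assert len({r["year"] for r in cross_sectional}) == 1  # single time point
assert len({r["company"] for r in time_series}) == 1   # single subject

# Time series are usually kept in time order before analysis
years = [r["year"] for r in time_series]
print(years == sorted(years))  # True
```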

36

Data Dictionary

Metadata table describing variables: names, descriptions, types, units, and scales.
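A data dictionary can be kept as structured metadata alongside the data. This is a minimal sketch using a hypothetical `VariableSpec` dataclass and invented student variables; the field names simply mirror the items listed in the definition.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VariableSpec:
    """One data-dictionary entry: metadata describing a single variable (column)."""
    name: str
    description: str
    data_type: str               # e.g. "identifier", "nominal", "ordinal", "discrete", "continuous"
    unit: Optional[str] = None   # None when the variable has no unit
    scale: Optional[str] = None  # "ratio" or "interval" for numerical variables, else None

data_dictionary = [
    VariableSpec("student_id", "Unique student identifier", "identifier"),
    VariableSpec("major", "Declared major", "nominal"),
    VariableSpec("class_year", "Freshman < Sophomore < Junior < Senior", "ordinal"),
    VariableSpec("credits_completed", "Credits earned to date", "discrete", "credits", "ratio"),
    VariableSpec("study_hours", "Average weekly study time", "continuous", "hours/week", "ratio"),
]

# Print the dictionary as a simple table; "-" marks fields that do not apply
for spec in data_dictionary:
    print(f"{spec.name:<18} {spec.data_type:<11} {spec.unit or '-':<11} {spec.scale or '-'}")
```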

37

Target/Dependent/Response Variable

The outcome to be explained or predicted (e.g., GPA).

38

Independent/Explanatory/Predictor Variables

Variables used to explain or predict the target.
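Before modeling, the target is usually separated from the predictors. This sketch uses hypothetical student records with GPA as the target (matching the example in the target-variable card); the identifier column is excluded because it carries no analytical meaning.

```python
# Hypothetical student records; "gpa" is the target, the rest are candidate predictors
records = [
    {"student_id": "S01", "study_hours": 10.0, "attendance": 0.90, "gpa": 3.4},
    {"student_id": "S02", "study_hours": 4.0, "attendance": 0.70, "gpa": 2.6},
    {"student_id": "S03", "study_hours": 7.5, "attendance": 0.85, "gpa": 3.1},
]

target = "gpa"
identifier = "student_id"  # identifies units only; not used as a predictor
predictors = [k for k in records[0] if k not in (target, identifier)]

X = [[r[p] for p in predictors] for r in records]  # predictor values, one row per unit
y = [r[target] for r in records]                   # target values

print(predictors)  # ['study_hours', 'attendance']
```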

39

Data Cleaning

The process of preparing raw data for analysis by fixing errors and handling issues such as missing values, duplicates, and inconsistent formatting.
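A few typical cleaning steps are sketched below on hypothetical raw records: trimming whitespace, normalizing capitalization, converting text to numbers, mapping assorted missing-value markers to `None`, and dropping duplicate units. The set of missing-value markers and the assumption that `id` is unique are illustrative choices, not rules from the notes.

```python
from typing import Optional

# Hypothetical raw records with common problems
raw = [
    {"id": "1", "city": " Boston ", "age": "34"},
    {"id": "2", "city": "boston", "age": ""},
    {"id": "3", "city": "Chicago", "age": "n/a"},
    {"id": "1", "city": " Boston ", "age": "34"},  # duplicate of the first row
]

MISSING_MARKERS = {"", "n/a", "na", "null"}  # assumed missing-value markers

def parse_age(text: str) -> Optional[int]:
    """Convert an age stored as text to int, mapping missing markers to None."""
    value = text.strip().lower()
    return None if value in MISSING_MARKERS else int(value)

def clean_record(rec: dict) -> dict:
    return {
        "id": int(rec["id"]),
        "city": rec["city"].strip().title(),  # trim whitespace, normalize capitalization
        "age": parse_age(rec["age"]),
    }

cleaned = []
seen_ids = set()
for rec in raw:
    row = clean_record(rec)
    if row["id"] in seen_ids:  # assumes "id" uniquely identifies a unit
        continue
    seen_ids.add(row["id"])
    cleaned.append(row)

print(cleaned)
```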

40

Data-Information-Knowledge-Value chain

A progression: data → information → knowledge → value, leading to actionable decisions.