Comprehensive Data Analysis and Regression Model Concepts

1

Regression

Explains the relationship between dependent and independent variables

2

Coefficients

Show the direction and magnitude of relationships

3

Model significance

p-value of the F statistic (Significance F) below 0.05 means the model as a whole is significant

4

Independent variable significance

p-value of the variable's t statistic below 0.05

5

R-squared

Proportion of the variation in the dependent variable explained by the model

6

Adjusted R-squared

Adjusts R-squared for sample size and number of independent variables, penalizing variables that do not improve the fit
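
The regression cards above (coefficients, R-squared, adjusted R-squared, F statistic) can be tied together in a minimal sketch, assuming a single independent variable and made-up `advertising`/`sales` data. It fits ordinary least squares by hand; multiple regression needs matrix algebra and is not shown. Turning the F statistic into the Significance F p-value needs the F distribution (for example `scipy.stats.f.sf(f, k, n - k - 1)`), which this stdlib-only sketch leaves out.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x with one independent variable."""
    n = len(x)
    k = 1  # number of independent variables
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx              # slope: sign gives direction, size gives magnitude
    b0 = mean_y - b1 * mean_x   # intercept
    predicted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)                  # total variation
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra variables
    f_stat = ((sst - sse) / k) / (sse / (n - k - 1))
    return {"b0": b0, "b1": b1, "r2": r2, "adj_r2": adj_r2, "f": f_stat}

# Hypothetical data for illustration only
advertising = [1, 2, 3, 4, 5, 6]
sales = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
result = fit_simple_regression(advertising, sales)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```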

7

Optimization

Process to maximize or minimize an objective function

8

Decision variables

Unknown values the model determines

9

Objective function

Equation to minimize or maximize

10

Constraints

Limits such as demand, labor, or materials

11

Solver output

Shows optimal values and binding constraints
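
The optimization cards (decision variables, objective function, constraints, solver output) map onto a small product-mix problem. This is a sketch under stated assumptions: the products, profits, and limits are hypothetical, and it uses exhaustive search over whole-number plans instead of the simplex method a spreadsheet solver would typically use. It reports the optimal values and which constraints are binding (used up exactly).

```python
# Hypothetical product-mix data: profit per unit of each decision variable
PROFIT = {"chairs": 30, "tables": 50}
# Constraints as name: (usage per chair, usage per table, limit)
CONSTRAINTS = {
    "labor": (2, 4, 100),
    "materials": (3, 2, 90),
    "table demand": (0, 1, 20),
}

def feasible(chairs, tables):
    return all(a * chairs + b * tables <= limit for a, b, limit in CONSTRAINTS.values())

def solve(max_units=100):
    """Try every whole-number plan and keep the one maximizing the objective."""
    best = None
    for chairs in range(max_units + 1):
        for tables in range(max_units + 1):
            if feasible(chairs, tables):
                profit = PROFIT["chairs"] * chairs + PROFIT["tables"] * tables
                if best is None or profit > best[2]:
                    best = (chairs, tables, profit)
    return best

best = solve()
chairs, tables, profit = best
# A constraint is binding when the optimal plan uses exactly its limit
binding = [name for name, (a, b, limit) in CONSTRAINTS.items()
           if a * chairs + b * tables == limit]
print(f"chairs={chairs}, tables={tables}, profit={profit}, binding={binding}")
```

Exhaustive search is only practical for tiny problems like this one; it was chosen here because it needs no external solver and makes the objective and constraint checks explicit.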

12

Data profiling

Investigates data quality and structure

13

Correctness

Right values are assigned

14

Validity

Acceptable values are entered

15

Consistency

Same characteristic represented the same way

16

Completeness

No missing instances or values
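
Several of the data-quality dimensions above can be profiled with simple rule checks. A minimal sketch, assuming hypothetical records with `id`, `state`, and `quantity` fields and assuming two-letter uppercase codes are the agreed state format. Correctness is not checked: a valid-looking but wrong value can only be caught by comparing against source documents.

```python
records = [
    {"id": 1, "state": "TX", "quantity": 5},
    {"id": 2, "state": "Texas", "quantity": 3},   # same state written differently
    {"id": 3, "state": "CA", "quantity": "five"},  # text in a number field
    {"id": 4, "state": None, "quantity": 2},       # missing value
]

def profile(rows):
    """Return (id, dimension) pairs for rows that fail a rule-based check."""
    issues = []
    for row in rows:
        # Completeness: no missing values
        if any(value is None for value in row.values()):
            issues.append((row["id"], "completeness"))
        # Validity: quantity must be a non-negative integer
        quantity = row["quantity"]
        if not (isinstance(quantity, int) and quantity >= 0):
            issues.append((row["id"], "validity"))
        # Consistency: states represented the same way (assumed two-letter codes)
        state = row["state"]
        if state is not None and not (len(state) == 2 and state.isupper()):
            issues.append((row["id"], "consistency"))
    return issues

issues = profile(records)
print(issues)
```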

17

Data matching

Reconciles related records across tables

18

Data matching issues

Nicknames, typos, reversed names

19

ETL tools

Provide advanced support for matching
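
The matching issues listed above (nicknames, typos, reversed names) can be illustrated with a small fuzzy-matching sketch. Assumptions: the `NICKNAMES` table and the 0.85 similarity threshold are hypothetical choices, and similarity comes from the standard library's `difflib.SequenceMatcher`; ETL tools use their own, more advanced matching methods.

```python
from difflib import SequenceMatcher

# Hypothetical nickname lookup; a real one would be much larger
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}

def normalize(name):
    """Lowercase, fix 'Last, First' order, and expand known nicknames."""
    name = name.strip().lower()
    if "," in name:  # reversed name such as "Smith, John"
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return " ".join(NICKNAMES.get(part, part) for part in name.split())

def is_match(a, b, threshold=0.85):
    """Treat two names as the same entity if their normalized forms are similar enough."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(is_match("Bob Jones", "Robert Jones"))  # nickname
print(is_match("Smith, John", "John Smith"))  # reversed name
print(is_match("Jon Smith", "John Smith"))    # typo
```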

20

Imputation

Replacing missing values with estimates

21

When imputation is applied

During ETL cleaning step

22

Purpose of imputation

Ensures data set remains usable
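
Imputation can be sketched in a few lines. This assumes missing values are stored as `None` and uses simple mean or median imputation from the standard library; these are only two of many possible estimation methods, and the quantities are made up.

```python
from statistics import mean, median

def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

quantities = [4, None, 6, 8, None]  # hypothetical column with missing values
print(impute(quantities))
print(impute([1, None, 2, 10], strategy="median"))
```

The median is often preferred when the column contains outliers, because a single extreme value pulls the mean much further than the median.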

23

Fact tables

Store quantitative transaction data

24

Dimension tables

Provide descriptive context

25

Fact data

Can be aggregated and measured

26

Dimension data

Who, what, when, where details

27

Star schema

Simple structure with fact and dimension tables

28

Snowflake schema

Dimensions normalized into multiple related tables

29

Star advantage

Easier and faster analysis

30

Snowflake advantage

Reduces redundancy
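
The fact/dimension split and the star versus snowflake trade-off can be sketched with plain dictionaries. Everything here is hypothetical: the tables, keys, and amounts are made up, and in practice these would be database tables queried with SQL joins. The snowflake version needs one extra lookup (join) to reach the category name, but stores each category name only once.

```python
from collections import defaultdict

# Fact table: foreign keys plus measurable, aggregatable quantities
fact_sales = [
    {"product_id": 1, "qty": 2, "amount": 400.0},
    {"product_id": 2, "qty": 4, "amount": 200.0},
    {"product_id": 3, "qty": 1, "amount": 35.0},
    {"product_id": 1, "qty": 1, "amount": 200.0},
]

# Star schema: one denormalized product dimension (category repeated per product)
dim_product_star = {
    1: {"name": "Desk", "category": "Furniture"},
    2: {"name": "Chair", "category": "Furniture"},
    3: {"name": "Lamp", "category": "Lighting"},
}

# Snowflake schema: category moved into its own normalized dimension table
dim_category = {10: "Furniture", 20: "Lighting"}
dim_product_snow = {
    1: {"name": "Desk", "category_id": 10},
    2: {"name": "Chair", "category_id": 10},
    3: {"name": "Lamp", "category_id": 20},
}

def sales_by_category_star():
    totals = defaultdict(float)
    for row in fact_sales:
        totals[dim_product_star[row["product_id"]]["category"]] += row["amount"]
    return dict(totals)

def sales_by_category_snowflake():
    totals = defaultdict(float)
    for row in fact_sales:
        category_id = dim_product_snow[row["product_id"]]["category_id"]  # first join
        totals[dim_category[category_id]] += row["amount"]                # second join
    return dict(totals)

print(sales_by_category_star())
print(sales_by_category_snowflake())
```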

31

Outliers

Unusual data points compared to others
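
One common way to flag outliers is Tukey's rule: values more than 1.5 times the interquartile range (IQR) beyond the first or third quartile. This sketch assumes that rule and made-up data; other rules (such as z-scores) exist, and quartile conventions differ slightly between tools.

```python
from statistics import quantiles

def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(data, n=4)  # stdlib default 'exclusive' quartile method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

order_sizes = [10, 12, 11, 13, 12, 14, 11, 95]  # hypothetical data
print(iqr_outliers(order_sizes))
```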

32

Dirty data

Missing, invalid, duplicate, inconsistent entries

33

Dirty data controls

Visualization, tests, and comparisons to source documents

34

Invalid data

Unacceptable entry (e.g., text in number field)

35

Incorrect data

Wrong but valid entry (e.g., wrong PO number)

36

Categorical data

Non-numeric labels (e.g., gender)

37

Ordinal data

Ranked values (e.g., satisfaction levels)

38

Interval data

Equal spacing, no true zero (e.g., temperature)

39

Ratio data

Equal spacing, true zero (e.g., revenue)

40

Relationships

Links between tables using keys

41

Primary key

Uniquely identifies rows in a table

42

Foreign key

References primary key in another table

43

Cardinality

Defines one-to-one, one-to-many, or many-to-many links
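
The key and cardinality cards can be checked on hypothetical `customers` and `orders` tables. A minimal sketch: it tests whether a column can serve as a primary key, finds foreign-key values with no matching primary key (orphans), and makes a rough cardinality check from the child table only. A many-to-many relationship normally needs a separate bridge table, which is not shown.

```python
from collections import Counter

customers = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Ben"}]
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 1},
    {"order_id": 102, "customer_id": 2},
    {"order_id": 103, "customer_id": 3},  # no customer 3: broken reference
]

def is_valid_primary_key(rows, key):
    """A primary key must be present and unique in every row."""
    values = [row[key] for row in rows]
    return None not in values and len(values) == len(set(values))

def orphan_rows(child, parent, foreign_key, primary_key):
    """Child rows whose foreign key does not reference any parent primary key."""
    parent_keys = {row[primary_key] for row in parent}
    return [row for row in child if row[foreign_key] not in parent_keys]

def child_side_cardinality(child, foreign_key):
    """Rough check: does any parent key appear more than once in the child table?"""
    counts = Counter(row[foreign_key] for row in child)
    return "one-to-many" if max(counts.values()) > 1 else "one-to-one"

print(is_valid_primary_key(customers, "customer_id"))
print(orphan_rows(orders, customers, "customer_id", "customer_id"))
print(child_side_cardinality(orders, "customer_id"))
```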

44

Data issues

Missing instances, missing values, duplicates, outliers

45

Invalid issue

Wrong format of data

46

Inconsistent issue

Different formats for same value

47

Central location

Mean, median, mode

48

Dispersion

Variance and standard deviation

49

Symmetry

Mean, median, and mode are equal in a symmetric, unimodal distribution

50

Skewness

Asymmetry of data

51

Kurtosis

Peakedness or flatness of a distribution, largely driven by how heavy its tails are
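
The descriptive-statistics cards (central location, dispersion, symmetry, skewness, kurtosis) can be computed together. This sketch uses the standard `statistics` module for location and dispersion and plain moment formulas for shape; spreadsheet functions such as Excel's SKEW and KURT apply small-sample adjustments, so their values will differ somewhat. The data sets are made up.

```python
from statistics import mean, median, mode, variance, stdev

def shape_stats(data):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5           # 0 for symmetric data, > 0 for a long right tail
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis

def describe(data):
    skewness, excess_kurtosis = shape_stats(data)
    return {
        "mean": mean(data), "median": median(data), "mode": mode(data),  # central location
        "variance": variance(data), "stdev": stdev(data),                # dispersion (sample)
        "skewness": skewness, "excess_kurtosis": excess_kurtosis,        # shape
    }

symmetric = [1, 2, 3, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 2, 3, 10]
print(describe(symmetric))
print(describe(right_skewed))
```

In the right-skewed sample the single large value pulls the mean above the median, which is the usual sign of positive skew.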

52

Inner join

Only matching rows from both tables

53

Left join

All rows from left table with matches from right

54

Right join

All rows from right table with matches from left

55

Full join

All rows from both tables, with nulls where there is no match
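
The four join types can be sketched on lists of row dictionaries, with `None` standing in for null. The `join` helper and the sample tables are hypothetical; in pandas, `DataFrame.merge` with `how="inner"`, `"left"`, `"right"`, or `"outer"` performs the same joins.

```python
def join(left, right, key, how="inner"):
    """Join two lists of row dicts on `key`; how is 'inner', 'left', 'right', or 'full'."""
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)
    result = []
    for lrow in left:
        matches = right_index.get(lrow[key], [])
        for rrow in matches:  # matching rows appear in every join type
            result.append({**lrow, **rrow})
        if not matches and how in ("left", "full"):
            # keep the unmatched left row; right-only columns become None (null)
            result.append({**dict.fromkeys(right_cols), **lrow})
    if how in ("right", "full"):
        left_keys = {row[key] for row in left}
        for rrow in right:
            if rrow[key] not in left_keys:
                # keep the unmatched right row; left-only columns become None (null)
                result.append({**dict.fromkeys(left_cols), **rrow})
    return result

customers = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Ben"}]
orders = [{"customer_id": 1, "total": 50}, {"customer_id": 3, "total": 20}]
for how in ("inner", "left", "right", "full"):
    print(how, join(customers, orders, "customer_id", how))
```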

56

Measured raw data

Directly observed values (e.g., price)

57

Non-measured raw data

Descriptive categories (e.g., product name)

58

Calculated data

Derived values (e.g., sales = qty × price)

59

Planning stage

Identify motivation, objectives, strategy

60

Analyze stage

Prepare, model, and explore data

61

Report stage

Interpret results and communicate findings

62

External motivation

From stakeholders or regulators

63

Internal motivation

To improve service or efficiency

64

Other motivations

Opportunities, problems, process improvement

65

Descriptive analysis

Describes what happened

66

Diagnostic analysis

Explains why it happened

67

Predictive analysis

Forecasts what will happen

68

Prescriptive analysis

Recommends what to do

69

Hypothesis testing

Compares null and alternative hypotheses

70

Type I error

Rejecting a true null (false positive)

71

Type II error

Failing to reject a false null (false negative)
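
Hypothesis testing and the two error types can be illustrated by simulation. Assumptions, all hypothetical: a two-sided one-sample z-test with known standard deviation, a null mean of 100, sigma of 15, samples of 30, and alpha of 0.05. When the null is true, the share of rejections estimates the Type I error rate (close to alpha); when it is false, the share of non-rejections estimates the Type II error rate.

```python
import math
import random

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test (known sigma); returns (z, p-value)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - normal_cdf(abs(z)))
    return z, p

def rejection_rate(true_mean, mu0=100.0, sigma=15.0, n=30, alpha=0.05,
                   trials=2000, seed=1):
    """Share of simulated samples in which H0: mean = mu0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        _, p = z_test(sample, mu0, sigma)
        if p < alpha:
            rejections += 1
    return rejections / trials

type_i = rejection_rate(true_mean=100.0)       # H0 true: rejecting is a Type I error
type_ii = 1 - rejection_rate(true_mean=105.0)  # H0 false: not rejecting is a Type II error
print(f"Type I rate ~ {type_i:.3f}, Type II rate ~ {type_ii:.3f}")
```

Lowering alpha reduces Type I errors but, for the same sample size, raises the Type II error rate; a larger sample reduces Type II errors without changing alpha.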