Data Analytics Exam 1

Last updated 4:49 PM on 1/20/26

82 Terms

1

Population

The set of all items of interest in a statistical problem

Ex: Houses in Roanoke

2

Parameter

A descriptive measure of a population

Ex: Mean (average) appraised value of all houses

3

Sample

A set of items drawn from a population

Ex: 100 randomly selected houses

4

Statistic

A descriptive measure of a sample

Ex: Mean appraised value of selected homes

5

Statistical Inference

The process of making an estimate, prediction, or decision based upon sample data

6

Qualitative

Categorical

Ex: Brand names

7

Quantitative

Numbers

Ex: Number of bedrooms

8

Cross-Sectional

Observations in a sample that are collected at the same time

9

Time Series

Data that is collected at different points in time

10

N

Size of population

11

n

Size of sample

12

μ

Population mean

13

x̄

Sample mean

14

σ²

Population variance

15

σ

Population standard deviation

16

s²

Sample variance

17

s

Sample standard deviation
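
The population and sample versions of variance and standard deviation differ only in the divisor (N for a population, n − 1 for a sample). A minimal sketch using Python's standard `statistics` module; the appraised values are hypothetical and chosen only for illustration:

```python
import statistics

# Hypothetical appraised values (in $1000s), made up for illustration.
values = [200, 220, 250, 270, 310]

# Treating the data as the whole population: divide by N.
pop_var = statistics.pvariance(values)   # population variance (sigma squared)
pop_sd = statistics.pstdev(values)       # population standard deviation (sigma)

# Treating the data as a sample: divide by n - 1.
samp_var = statistics.variance(values)   # sample variance (s squared)
samp_sd = statistics.stdev(values)       # sample standard deviation (s)

print(pop_var, samp_var)
```

The sample variance comes out larger than the population variance for the same numbers because it divides by a smaller value.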

18

Measures of Central Tendency

Mean, Median, Mode

19

Mean

The arithmetic average: the sum of the values divided by the number of values

20

Median

The middle value when the data are sorted (the average of the two middle values when n is even)

21

Percentiles

Values that divide the sorted data into 100 equal parts, counted by 1's (1st, 2nd, ..., 99th)

Ex: A score in the 95th percentile is ≥ 95% of the other scores in the group. It does not mean you scored 95%.
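
A small sketch of the idea that a percentile is a position in the group, not a raw score. The helper `percentile_rank` and the exam scores are hypothetical, and textbooks differ slightly on how ties and the score itself are counted:

```python
def percentile_rank(score, all_scores):
    """Percent of scores in all_scores that are less than or equal to score."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100 * at_or_below / len(all_scores)

# Hypothetical raw exam scores, made up for illustration.
exam_scores = [52, 60, 61, 67, 70, 72, 75, 78, 80, 88]

# A raw score of 78 is at or above 8 of the 10 scores.
rank = percentile_rank(78, exam_scores)
print(rank)  # 80.0
```

Here a raw score of 78 sits at the 80th percentile, which shows that the two numbers measure different things.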

22

Mode

Most frequently occurring number

CAN BE BIMODAL OR MULTIMODAL, DO NOT BE FOOLED
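
The three measures of central tendency can be computed with Python's standard `statistics` module. The scores below are made up so that the data has two modes:

```python
import statistics

# Hypothetical scores, chosen so that two values tie for most frequent.
scores = [70, 80, 80, 90, 95, 95, 120]

mean = statistics.mean(scores)        # sum of values / number of values
median = statistics.median(scores)    # middle value of the sorted data
modes = statistics.multimode(scores)  # every value tied for most frequent

print(mean, median, modes)  # 90 90 [80, 95]
```

`multimode` returns a list, so a bimodal or multimodal data set is not silently reduced to a single mode.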

23

Measures of Dispersion

Range, Standard Deviation, and Variance

24

Dispersion is also known as

The spread or range of variability

25

Range

High minus low

26

A normal bell curve means

Mean = Median = Mode, tails are asymptotic, excess kurtosis is 0 (kurtosis of 3)

27

Asymptotic

The tails get closer and closer to the horizontal axis (x-axis) but never reach it

28

Standard Deviation

A measure of the typical distance of observations from the mean; the positive square root of the variance

29

What percent of cases fall within one standard deviation from the mean?

About 68% (for a normal distribution)

30

What percent of cases fall within two standard deviations from the mean?

About 95% (for a normal distribution)
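
For a normal distribution, the share of cases within one and two standard deviations of the mean can be checked numerically. A minimal sketch using `statistics.NormalDist` from the standard library (the variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area under the curve between -1 and +1 standard deviations.
within_one = std_normal.cdf(1) - std_normal.cdf(-1)

# Area under the curve between -2 and +2 standard deviations.
within_two = std_normal.cdf(2) - std_normal.cdf(-2)

print(round(within_one, 4), round(within_two, 4))
```

The results are roughly 0.68 and 0.95. These figures apply to normally distributed data only.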

31

Shape of Data

Skewness and Kurtosis

32

Skewness

Measures the asymmetry of data

33

Positive Skew

Right skewed/longer right tail

34

Negative Skew

Left skewed/longer left tail

35

Kurtosis

Measures how heavy the tails of the distribution are (often described as peakedness)

36

High Kurtosis

Leptokurtic

Data has more outliers

37

Low Kurtosis

Platykurtic

Data has fewer outliers; values are spread out more evenly

38

Normal Kurtosis

Mesokurtic, data has normal distribution with a moderate number of outliers
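
Skewness and kurtosis can be sketched from the central moments of the data. The functions below use the simple population (divide by n) formulas, which is an assumption; statistical software often applies a bias correction, so its output may differ slightly. The data sets are hypothetical:

```python
def central_moment(data, k):
    """Average of (x - mean) ** k over the data."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    # Third central moment scaled by the standard deviation cubed.
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    # Fourth central moment scaled by the variance squared, minus 3
    # so that a normal distribution scores 0.
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]  # one large value: long right tail
symmetric = [1, 2, 3, 4, 5]               # evenly spread, no tails

print(skewness(right_skewed) > 0)   # True: positive skew
print(skewness(symmetric))          # 0.0: no skew
print(excess_kurtosis(symmetric))   # negative: platykurtic
```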

39

What is the goal of graphing?

  1. Presentation of descriptive statistics

  2. Presentation of evidence

  3. Some people understand better with visual aids

  4. Provides a sense of the underlying data generating process (scatter-plots)

40

Statistical Studies

Observational, experimental

41

Observational

No attempt is made to control or influence the variables of interest

42

Experimental

Conducted under controlled conditions, provides more information compared to data obtained from existing sources/observational studies

43

Considerations on Data Acquisition

  1. Time requirement

  2. Cost of acquisition

  3. Data errors

44

Analytics

The scientific process of transforming data into insight for making better decisions

45

Descriptive Analytics

Describes what has happened in the past

46

Predictive Analytics

Models constructed from past data to predict the future or to assess the impact of one variable on another

47

Prescriptive Analytics

Identifies the best course of action

48

The three V’s

Volume, Velocity, Variety

49

Volume

The amount of available data

50

Velocity

The speed at which data is collected and processed

51

Variety

Different data types

52

Unethical Behavior in Statistical Study

  1. Improper sampling

  2. Inappropriate analysis of the data

  3. Development of misleading graphs

  4. Use of inappropriate summary statistics

  5. Biased interpretation of the statistical results

53

Quartile

Q1 = 25th percentile

Q2 = 50th percentile

Q3 = 75th percentile

54

Interquartile Range

Q3 − Q1 (the spread of the middle 50% of the data)
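
Quartiles and the interquartile range can be sketched with `statistics.quantiles`. The data below is hypothetical. Note that several quartile conventions exist; this uses the function's default ("exclusive") method, so other tools or textbooks may give slightly different values on other data:

```python
import statistics

# Hypothetical data: 11 sorted values.
data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1  # interquartile range

print(q1, q2, q3, iqr)  # 6.0 12.0 18.0 12.0
```

Q2 matches the median of the data, as expected for the 50th percentile.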

55

Variance

The average squared difference between the value of each observation and the mean

56

Coefficient of Variation

Measures how large the standard deviation is relative to the mean, usually expressed as a percentage: (standard deviation / mean) × 100%
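
A minimal sketch of the coefficient of variation, assuming sample data (so the sample standard deviation is used). The function name and the values are hypothetical:

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100

# Hypothetical appraised values (in $1000s), made up for illustration.
values = [200, 220, 250, 270, 310]

cv = coefficient_of_variation(values)
print(round(cv, 1))  # 17.2
```

Because the result is a percentage, it can be compared across data sets measured in different units or on different scales.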

57

Covariance

Measures how two variables change together in a linear way; the sign indicates the direction of the relationship

58

Correlation Coefficient

A measure of the strength and direction of the linear relationship between x and y, ranging from −1 to +1 and not affected by the units of measurement. CORRELATION IS NOT CAUSATION
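
The difference between covariance and correlation shows up when the units change. The sketch below writes both out by hand using sample (n − 1) formulas; the house data is hypothetical:

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = covariance(xs, xs) ** 0.5
    sd_y = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical houses: square footage and price.
sqft = [1000, 1500, 2000, 2500]
price_usd = [150000, 200000, 260000, 310000]
price_k = [p / 1000 for p in price_usd]  # same prices in $1000s

# Covariance changes with the units; correlation does not.
print(covariance(sqft, price_usd), covariance(sqft, price_k))
print(correlation(sqft, price_usd), correlation(sqft, price_k))
```

Rescaling price from dollars to thousands of dollars divides the covariance by 1000 but leaves the correlation coefficient unchanged.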

59

The hypothesis test will be two tailed if

The null hypothesis uses = and the alternative hypothesis uses ≠

60

The hypothesis test will be one tailed if

The null hypothesis uses ≤ or ≥ and the alternative hypothesis uses > or <

61

Type I Error

Rejecting null when it is true, false positive

62

Level of significance

The probability of making a Type I Error

63

Type II Error

Failing to reject null when it is false, false negative

64

p-value

The probability of observing results at least as extreme as the sample, assuming null is true

65

p is less than or equal to α (p ≤ α)

Reject null

66

p is greater than α (p > α)

Fail to reject null
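
The decision rule for comparing a p-value with the level of significance can be written as a one-line check. The function name `decide` and the example p-values are hypothetical:

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject the null when p <= alpha."""
    if p_value <= alpha:
        return "reject null"
    return "fail to reject null"

print(decide(0.03))               # reject null
print(decide(0.20))               # fail to reject null
print(decide(0.03, alpha=0.01))   # fail to reject null
```

The same p-value of 0.03 leads to different decisions at the 0.05 and 0.01 levels, which is why the level of significance should be chosen before looking at the data.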

67

Confidence Interval

A range of values that is used to estimate an unknown population parameter

68

t-test

Compares the means of two sets of data (or one sample mean against a hypothesized value) and tests whether the difference is statistically significant

69

t is

an observed difference / standard error

70

α = 0.05

5% chance of making a Type I Error when null is true, 95% confidence level

71

α = 0.01

1% chance of making a Type I Error when null is true, 99% confidence level

72

The type of tail is determined by

Alternative hypothesis

73

t equation

t = (x̄ − μ) / (s / √n), where x̄ is the sample mean, μ is the hypothesized population mean, s is the sample standard deviation, and n is the sample size

74

Degrees of Freedom (df)

Measures how much independent information is available to estimate variability

75

df is generally

n - 1
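
The t equation and degrees of freedom can be sketched for a one-sample test. The function name, the sample, and the hypothesized mean of 230 are all hypothetical; the sketch computes only the t statistic and df, not the p-value:

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for testing a sample mean against mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)    # sample mean
    s = statistics.stdev(sample)       # sample standard deviation
    standard_error = s / math.sqrt(n)  # s divided by the square root of n
    t = (x_bar - mu0) / standard_error # observed difference / standard error
    return t, n - 1

# Hypothetical appraised values (in $1000s) and hypothesized mean.
sample = [200, 220, 250, 270, 310]
t, df = one_sample_t(sample, mu0=230)

print(round(t, 2), df)  # 1.04 4
```

The t value would then be compared against a t distribution with df degrees of freedom to obtain the p-value.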

76

Random/Simple Random Sampling

Equal chances for all, unbiased, most widely used

77

Stratified Sampling

Divides the population into groups (strata) and samples from each one, which ensures all groups are represented

78

Quota

Non-probability method that fills a preset quota for each group using convenient participants; quick and cheap, guarantees proportions, but non-random

79

Purposive

Participants with specific traits or expertise, informative but subjective (non-probability)

80

Cluster

Randomly selects whole groups (clusters) and samples within them; efficient for large, spread-out populations

81

Systematic

Regular interval selection, simple to apply, selecting every kth individual

82

Sampling

The process of selecting a subset of individuals or items from a population to estimate characteristics of the whole. Studying the entire population is usually impractical and costly.
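
Three of the probability sampling methods above (simple random, systematic, stratified) can be sketched with Python's `random` module. The population of 100 house IDs, the "north"/"south" strata, and the sample sizes are hypothetical:

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))   # hypothetical house IDs 1..100

# Simple random sampling: every item has an equal chance of selection.
simple = rng.sample(population, 10)

# Systematic sampling: random starting point, then every k-th item.
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: sample from each group (stratum) separately
# so that every group is represented.
strata = {"north": population[:40], "south": population[40:]}
stratified = {name: rng.sample(group, 5) for name, group in strata.items()}

print(simple)
print(systematic)
print(stratified)
```

Quota and purposive sampling are not shown because they depend on the researcher's judgment rather than on random selection.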