Stat Econ Module 2


Last updated 10:32 AM on 9/19/25

84 Terms

1
New cards

Two numerical ways of describing data

  1. Measures of Location

  2. Measures of Dispersion

2
New cards

Measures of Location (Central Tendency)

Pinpoint the center of a set of values.

Considering only measures of __________ can lead to erroneous conclusions; dispersion provides crucial additional context.

3
New cards

Measures of Dispersion (Variation/Spread)

Describe the spread or variability of the data.

4
New cards

Population

all the possible values or observations

5
New cards

Sample

a subset drawn from a population

6
New cards

Population Mean

The sum of all values divided by the number of values.

7
New cards

Population Mean Formula

μ = (ΣXᵢ) / N

where…

μ = __________

(ΣXᵢ) = sum of all X values in the population

N = number of values in the population

8
New cards

Parameter

any measurable characteristic of a population

ex: the mean of a population is an example of this

9
New cards

Parameter vs. Statistic

________ is any measurable characteristic of a population.

________ is any measurable characteristic of a sample.

10
New cards

Sample Mean Formula

x̄ = (ΣXᵢ) / n

where…

x̄ = ________

(ΣXᵢ) = sum of all x values in the sample

n = number of values in the sample

11
New cards

Statistic

Measure based on sample data

ex: mean of a sample

12
New cards

Properties of the Arithmetic Mean

  • A mean exists for every set of interval or ratio-level data.

  • Includes all values in computing the mean.

  • A mean is unique for any given data set.

  • The sum of deviations from the mean is always zero: Σ(X − x̄) = 0
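A quick check of the zero-sum property, as a minimal sketch. The data values are hypothetical and `mean` is an illustrative helper, not part of the course material; `Fraction` is used only so the arithmetic is exact.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Hypothetical data; Fractions keep every step exact (no rounding error).
data = [Fraction(v) for v in (3, 7, 8, 12, 20)]
x_bar = mean(data)
deviations = [x - x_bar for x in data]

print(x_bar)            # 10
print(sum(deviations))  # 0
```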

13
New cards

Summation Properties

[image: formula]
14
New cards

Weakness of the Mean

Unduly affected by unusually large or small values (outliers), making it potentially unrepresentative.

15
New cards

Weighted Mean

Used when some values are more important than others. Each value is multiplied by its corresponding weight, summed, and then divided by the sum of the weights.

Ex: calculating grades (where some components are weighted more than others)

16
New cards

Weighted Mean Formula

(w₁x₁ + w₂x₂ + ... + wₙxₙ) / (w₁ + w₂ + ... + wₙ) = (Σwᵢxᵢ) / (Σwᵢ)

where…

the weight (w₁) of x₁ times x₁, plus the weight (w₂) of x₂ times x₂, and so on, divided by the sum of the weights

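The weighted mean formula can be sketched in a few lines of Python. The scores and weights below are hypothetical, and `weighted_mean` is an illustrative name.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical grades: quizzes count 2 parts, homework 3 parts, exam 5 parts.
scores = [80, 90, 70]
weights = [2, 3, 5]
print(weighted_mean(scores, weights))  # 78.0
```

With equal weights the result reduces to the ordinary arithmetic mean.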
17
New cards

Median

The midpoint of the values after they have been ordered from smallest to largest or largest to smallest.

Odd number of observations: The middle observation.

Even number of observations: The mean of the two middle observations. (May not be one of the original values).


Advantage: Unaffected by outliers
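A minimal sketch of the odd/even rule, using hypothetical data. Python's standard library offers `statistics.median` for the same purpose; the helper below only spells out the steps.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle ones


print(median([7, 1, 5]))            # 5
print(median([7, 1, 5, 3]))         # 4.0 (not one of the original values)
print(median([1, 3, 5, 7, 1000]))   # 5 (the outlier 1000 has no effect)
```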

18
New cards

Mode

The value that appears most frequently in a data set.

Disadvantages

  • May not exist (no value appears more than once).

  • May have multiple modes (bimodal, multimodal).

  • Less frequently used than mean or median.
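The "may not exist" and "multiple modes" cases can be illustrated with a short sketch. The helper name `modes` and the data are hypothetical; returning an empty list when no value repeats is a convention chosen here to match the card, not a universal definition.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] when no value appears more than once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value appears once: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2] (bimodal)
print(modes([1, 2, 3]))        # []
```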

19
New cards

Mean = Median = Mode

for a symmetric mound-shape distribution

20
New cards

Skewed distribution

not symmetrical

21
New cards

positively skewed distribution

a data distribution where most values are concentrated on the left side of the graph; skewed to the right (looks like a p)

the arithmetic mean is the largest of the three measures because the mean is influenced more than the median/mode by a few extremely high values

the median is the next largest measure in a ___________

mode is the smallest

22
New cards

negatively skewed distribution

a data distribution where most values are concentrated on the right side of the graph (skewed to the left)

mean is the lowest of the three measures, influenced by a few extremely low observations

median is greater than mean

mode is the largest of the three

23
New cards

Mode is usually used for…

nominal-level data

24
New cards

Median is usually used for…

ordinal-level data

25
New cards

Mean is usually used for…

interval- or ratio-level data

26
New cards

Geometric Mean

useful for finding the average of percentages, ratios, indexes, or growth rates

ex: GDP growth rates, which compound (build on each other)

27
New cards

Geometric Mean Formula 1

GM = ⁿ√(X₁ · X₂ · … · Xₙ)

where n is the total number of terms (X) that are being multiplied

GM ≤ Arithmetic Mean

All data values must be positive

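As a sketch, the geometric mean is the n-th root of the product of n positive values. The growth factors below are hypothetical (1.05 stands for a 5% increase), and `geometric_mean` is an illustrative helper.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("all data values must be positive")
    return math.prod(values) ** (1 / len(values))


# Hypothetical growth factors: +5% in one year, +15% in the next.
growth_factors = [1.05, 1.15]
gm = geometric_mean(growth_factors)
am = sum(growth_factors) / len(growth_factors)

print(round(gm, 4))  # 1.0989
print(round(am, 4))  # 1.1
```

The geometric mean comes out slightly below the arithmetic mean, consistent with GM ≤ Arithmetic Mean.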
28
New cards

Geometric Mean vs. Arithmetic Mean

_________ uses addition and division to find the average of a set of numbers, while _________ uses multiplication and roots to find the average, particularly for growth rates or ratios.

The AM is suitable for additive data, while the GM is better for multiplicative relationships, such as investment returns or population growth, and requires positive numbers.

29
New cards

Geometric Mean Formula 2

Used to find an average percent increase over a period of time:

GM = ⁿ√(value at end of period / value at start of period) − 1

where n is the number of periods

ex: financial economics

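A minimal sketch of the average-rate-of-increase calculation, assuming the usual form: the n-th root of (ending value / starting value), minus 1. The figures and the function name are hypothetical.

```python
def average_percent_increase(start_value, end_value, periods):
    """Average rate of increase per period, as a decimal (0.10 means 10%)."""
    if start_value <= 0 or end_value <= 0 or periods <= 0:
        raise ValueError("values and number of periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Hypothetical: an index grows from 100 to 121 over 2 years.
rate = average_percent_increase(100, 121, 2)
print(round(rate, 4))  # 0.1
```

Growing 100 by 10% twice gives 121, which is why the average rate is 10% and not 10.5% (the simple 21% divided by 2).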
30
New cards

Why study dispersion?

  • Measures of location alone do not describe the spread of data.

  • Allows for comparison of spread between two or more distributions.

ex: a tour guide says the river averages 3 ft deep, but a measure of dispersion would show that you cannot wade across because one section is 5 ft deep

31
New cards

Small Dispersion Meaning

data are clustered around the mean, making the mean representative.

32
New cards

Large Dispersion meaning

data are widely scattered around the mean, making the mean less reliable as a representation of the data.

33
New cards

Range

Largest value - smallest value

34
New cards

Mean Deviation

Measures the average distance of all the values from the mean.

The arithmetic mean of the absolute values of the deviations from the arithmetic mean:

MD = Σ|x − x̄| / n

where…

x = the value of each observation

x̄ = arithmetic mean of the values

n = number of observations in the sample

| | = absolute value

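The definition translates directly into code: average the absolute deviations from the mean. A minimal sketch with hypothetical data:

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    n = len(values)
    if n == 0:
        raise ValueError("mean deviation of empty data")
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n


data = [2, 4, 6, 8, 10]      # hypothetical sample, mean = 6
print(mean_deviation(data))  # 2.4
```

The absolute deviations are 4, 2, 0, 2, 4, which sum to 12; dividing by n = 5 gives 2.4.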
35
New cards

Why absolute values in mean deviation

Without them, positive and negative deviations would cancel, resulting in a zero (useless) statistic.

36
New cards

Advantages of Mean Deviation

Uses all values, easy to understand.

37
New cards

Disadvantages of Mean Deviation

Use of absolute values makes it difficult to work with mathematically, less frequently used than standard deviation.

38
New cards

Variance

The arithmetic mean of the squared deviations from the mean.

  • population variance (parameter)

  • sample variance (statistic)

39
New cards

Standard Deviation

The square root of the variance. It is in the same units as the original data.

  • Population Standard Deviation (parameter)

  • Sample Standard Deviation (statistic)

40
New cards

Population Variance Formula

σ² = Σ(X − μ)² / N

41
New cards

Small variance means

populations whose values are near the mean

42
New cards

large variance means

population whose values are dispersed from the mean

43
New cards

Advantages of the variance

Uses all values, squaring deviations prevents cancellation (like absolute values but mathematically preferred). Always non-negative.

44
New cards

Disadvantages of the variance

Units are squared, making it difficult to interpret directly.

45
New cards

Sample Variance Formula

s² = Σ(X − x̄)² / (n − 1)

46
New cards

Sample standard deviation formula

s = √[Σ(X − x̄)² / (n − 1)]

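A minimal sketch of the sample variance and sample standard deviation, using hypothetical data. The helper names are illustrative; Python's `statistics.variance` and `statistics.stdev` use the same n − 1 divisor and are shown only as a cross-check.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Square root of the sample variance (same units as the data)."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 6, 8, 10]        # hypothetical sample, mean = 6
print(sample_variance(data))   # 10.0
print(round(sample_std(data), 4))

# Cross-check against the standard library.
print(math.isclose(sample_std(data), statistics.stdev(data)))  # True
```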
47
New cards

Interpretation and Uses of Standard Deviation

Commonly used to compare the spread in two or more sets of observations.

Ex: a high standard deviation for a 500 index fund means its returns are erratic, which signals high risk and the possibility of lower returns

a low standard deviation means the fund is safer: returns are smaller but less risky

48
New cards

Chebyshev's Theorem

determines the minimum proportion of the values that lie within a specified number of standard deviations of the mean

For any data set (any shape), the proportion of values within k standard deviations of the mean is at least 1 - (1/k²), where k is any constant greater than 1

Ex: for k = 2, 1 − 1/2² = 0.75, so at least 75% of the values lie within 2 standard deviations of the mean
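The bound 1 − 1/k² is easy to tabulate. A minimal sketch; the function name is illustrative.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


print(chebyshev_min_proportion(2))  # 0.75
print(chebyshev_min_proportion(3))  # about 0.889
```

These are lower bounds that hold for any distribution shape; a particular data set may have a larger share within the same interval.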

49
New cards

Empirical Rule (Normal Rule)

For a symmetrical, bell-shaped distribution:

  • Approximately 68% of observations are within ±1 standard deviation of the mean.

  • Approximately 95% of observations are within ±2 standard deviations of the mean.

  • Practically all (99.7%) observations are within ±3 standard deviations of the mean.
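The three percentages can be reproduced from the standard normal distribution. This sketch uses `statistics.NormalDist` from Python's standard library (available from Python 3.8); the helper name is illustrative.

```python
from statistics import NormalDist


def normal_share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(k, round(normal_share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```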

50
New cards

Population Standard Deviation

σ = √[Σ(X − μ)² / N]

51
New cards

why use n-1

Dividing by n − 1 provides the appropriate correction, because dividing by n tends to underestimate the population variance.
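The underestimation can be seen exactly on a tiny example. The sketch below assumes a hypothetical population {1, 2, 3} and lists every possible sample of size 2 drawn with replacement, then averages the variance estimates under each divisor. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population and its true variance (divisor N).
population = [Fraction(1), Fraction(2), Fraction(3)]
N = len(population)
mu = sum(population) / N
sigma_sq = sum((x - mu) ** 2 for x in population) / N


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    x_bar = sum(sample) / len(sample)
    return sum((x - x_bar) ** 2 for x in sample)


n = 2
samples = list(product(population, repeat=n))  # all 9 ordered samples, with replacement

avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma_sq)           # 2/3
print(avg_div_n)          # 1/3  (too small on average)
print(avg_div_n_minus_1)  # 2/3  (matches the population variance)
```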

52
New cards

Why discuss calculation of descriptive statistics from grouped data?

published data are often available only in the form of a frequency distribution (ungrouped data are sometimes hard to obtain)

53
New cards

Mean of Grouped Data

Population: μ = (ΣXf) / N

Sample: x̄ = (ΣXf) / n

ΣXf is the sum of the products obtained by multiplying each class mark (X) by the corresponding class frequency (f)

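A minimal sketch of the grouped mean. The frequency distribution is hypothetical (classes 0 to 10, 10 to 20, 20 to 30, 30 to 40, so the class marks are 5, 15, 25, 35), and `grouped_mean` is an illustrative name.

```python
def grouped_mean(class_marks, frequencies):
    """Sum of (class mark * frequency), divided by the total frequency."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    return sum(x * f for x, f in zip(class_marks, frequencies)) / n


class_marks = [5, 15, 25, 35]  # hypothetical class midpoints
frequencies = [3, 5, 8, 4]     # hypothetical class frequencies, n = 20
print(grouped_mean(class_marks, frequencies))  # 21.5
```

Here ΣXf = 15 + 75 + 200 + 140 = 430, and 430 / 20 = 21.5.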
54
New cards

Median for Grouped Data

Md = Lₘ + [(n/2 − Fₘ₋₁) / fₘ] × c

where…

Md = median for grouped data

Lₘ = lower class boundary of the median class

c = class interval or class width

n = sample size

Fₘ₋₁ = cumulative frequency of the class immediately preceding the median class

fₘ = frequency of the median class

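A minimal sketch of the grouped median, assuming the usual interpolation formula: lower boundary of the median class, plus (n/2 minus the cumulative frequency before that class) divided by the median-class frequency, times the class width. The frequency distribution and the function name are hypothetical.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Interpolated median from a frequency distribution with equal class widths."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= half:
            # This is the median class: interpolate inside it.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


lower_boundaries = [0, 10, 20, 30]  # hypothetical classes of width 10
frequencies = [3, 5, 8, 4]          # n = 20, so n/2 = 10
print(grouped_median(lower_boundaries, frequencies, 10))  # 22.5
```

The cumulative frequencies are 3, 8, 16, 20, so the median class starts at 20: 20 + ((10 − 8) / 8) × 10 = 22.5.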
55
New cards

Mode for Grouped Data

M₀ = l₁ + [Δ₁ / (Δ₁ + Δ₂)] × c

where…

M₀ = mode for grouped data

l₁ = lower class limit of the modal class

Δ₁ = difference between the frequency of the modal class and the frequency of the preceding class (ignore the sign and just take the absolute value).

Δ₂ = difference between the frequency of the modal class and the frequency of the succeeding class (ignore the sign and just take the absolute value).

c = class interval or class width

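A minimal sketch of the grouped mode, assuming the usual form: lower limit of the modal class, plus Δ1 / (Δ1 + Δ2) times the class width. The data and function name are hypothetical. Treating a missing neighbor class as frequency 0, and picking the first class when two tie for the highest frequency, are assumptions made here for the edge cases.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Interpolated mode from a frequency distribution with equal class widths."""
    if not frequencies:
        raise ValueError("no classes given")
    # Index of the modal class (first class with the highest frequency).
    i = max(range(len(frequencies)), key=frequencies.__getitem__)
    prev_f = frequencies[i - 1] if i > 0 else 0                     # assumed 0 if absent
    next_f = frequencies[i + 1] if i < len(frequencies) - 1 else 0  # assumed 0 if absent
    d1 = abs(frequencies[i] - prev_f)
    d2 = abs(frequencies[i] - next_f)
    if d1 + d2 == 0:
        raise ValueError("mode is undefined when neighboring frequencies are equal")
    return lower_limits[i] + d1 / (d1 + d2) * width


lower_limits = [0, 10, 20, 30]  # hypothetical classes of width 10
frequencies = [3, 5, 8, 4]
print(round(grouped_mode(lower_limits, frequencies, 10), 2))  # 24.29
```

The modal class starts at 20, with Δ1 = 8 − 5 = 3 and Δ2 = 8 − 4 = 4, so the mode is 20 + (3/7) × 10.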
56
New cards

Grouped Data Population Variance Formula

[image: formula]
57
New cards

Grouped Data Population Standard Deviation Formula

[image: formula]
58
New cards

Grouped Data Sample Variance Formula

[image: formula]
59
New cards

Grouped Data Sample Standard Deviation Formula

[image: formula]
60
New cards

To get from “ungrouped” to “grouped”…

substitute ΣX with ΣXf and ΣX² with ΣX²f
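The grouped variance cards show their formulas only as images, so the sketch below is an assumption: it applies the substitution above to the common computational form, variance = [ΣX²f − (ΣXf)²/n] / divisor, with divisor N for a population and n − 1 for a sample. The data and function name are hypothetical.

```python
import math


def grouped_variance(class_marks, frequencies, sample=True):
    """Variance from class marks and frequencies (computational form, assumed)."""
    n = sum(frequencies)
    divisor = n - 1 if sample else n
    if divisor <= 0:
        raise ValueError("not enough observations")
    sum_xf = sum(x * f for x, f in zip(class_marks, frequencies))        # sum of X*f
    sum_x2f = sum(x ** 2 * f for x, f in zip(class_marks, frequencies))  # sum of X^2*f
    return (sum_x2f - sum_xf ** 2 / n) / divisor


class_marks = [5, 15, 25, 35]  # hypothetical class midpoints
frequencies = [3, 5, 8, 4]     # n = 20

pop_var = grouped_variance(class_marks, frequencies, sample=False)
samp_var = grouped_variance(class_marks, frequencies, sample=True)
print(pop_var)                       # 92.75
print(round(math.sqrt(samp_var), 4)) # sample standard deviation
```

Here ΣXf = 430 and ΣX²f = 11100, so the numerator is 11100 − 430²/20 = 1855; dividing by 20 gives 92.75 and dividing by 19 gives the sample variance.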

61
New cards

Box Plot

a graphical display, based on quartiles, that helps us picture a set of data

62
New cards

Statistics needed for a box plot

  1. minimum value

  2. first quartile

  3. median

  4. third quartile

  5. maximum value

63
New cards

Outlier

  • a value that is inconsistent with the rest of the data; identifying one requires the raw data

Outlier > Q₃ + 1.5(Q₃ − Q₁)

Outlier < Q₁ − 1.5(Q₃ − Q₁)
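A minimal sketch of the 1.5 × IQR outlier rule, with hypothetical data. Quartile conventions differ between textbooks and software; this example uses `statistics.quantiles` with its default "exclusive" method, which is an assumption and may not match the method used in the course.

```python
import statistics


def find_outliers(values):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, plus the quartiles used."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # default method: "exclusive"
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return outliers, q1, q3


data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 40]  # hypothetical raw data
outliers, q1, q3 = find_outliers(data)
print(q1, q3)    # 5.0 11.0
print(outliers)  # [40]
```

With Q1 = 5 and Q3 = 11, the interquartile range is 6, so the fences are 5 − 9 = −4 and 11 + 9 = 20; only 40 falls outside.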

64
New cards

skewness

lack of symmetry in a set of values

65
New cards

Symmetric

mean = median

  • data values evenly spread around these values

  • data values below mean and median are mirror image of those above 

66
New cards

Positively Skewed

set of values is skewed to the right if there is a single peak and the values extend much further to the right of the peak than to the left

  • mean > median

67
New cards

Negatively skewed

there is a single peak, but the observations extend further to the left than to the right

  • mean < median

68
New cards

Bimodal

two peaks (two modes); a distribution with more than two peaks is multimodal

69
New cards

Pearson’s Coefficient of Skewness

based on the difference between the mean and median: sk = 3(x̄ − median) / s

  • ranges from -3 to 3

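A minimal sketch of Pearson's coefficient, assuming the common form sk = 3(mean − median) / s with s the sample standard deviation. The data sets and the function name are hypothetical.

```python
import statistics


def pearson_skewness(values):
    """3 * (mean - median) / sample standard deviation."""
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (statistics.mean(values) - statistics.median(values)) / s


symmetric = [2, 4, 6, 8, 10]     # mean = median = 6
right_skewed = [2, 4, 6, 8, 30]  # mean 10, median 6

print(pearson_skewness(symmetric))               # 0.0
print(round(pearson_skewness(right_skewed), 2))  # 1.05
```

The single large value pulls the mean above the median, so the coefficient is positive.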
70
New cards

value near -3 (Pearson)

considerable negative skewness

ex: -2.57

71
New cards

value near 3 (Pearson)

considerable positive skewness

72
New cards

value of 0 (Pearson)

mean = median

no skewness, symmetrical

73
New cards
74
New cards

Software Coefficient of Skewness

based on the difference between each value and the mean, divided by the standard deviation

  • if the difference is (+), the particular value is larger than the mean

  • if the difference is (-), it is smaller than the mean

  • cubing keeps the sign, so it retains the information on the direction of the difference

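A minimal sketch of the software coefficient, assuming the form commonly reported by spreadsheet packages: sk = n / ((n − 1)(n − 2)) × Σ((x − x̄) / s)³, where s is the sample standard deviation. The data and function name are hypothetical.

```python
import statistics


def software_skewness(values):
    """n / ((n - 1)(n - 2)) times the sum of cubed standardized values."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed_sum = sum(((x - x_bar) / s) ** 3 for x in values)  # cubing keeps the sign
    return n / ((n - 1) * (n - 2)) * cubed_sum


print(round(software_skewness([2, 4, 6, 8, 10]), 6))  # near zero: symmetric
print(software_skewness([2, 4, 6, 8, 30]) > 0)        # True: a few large values
print(software_skewness([1, 20, 22, 24, 26]) < 0)     # True: a few small values
```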
75
New cards

Standardization

reports the difference between each value and the mean in units of the standard deviation

76
New cards

symmetric (Software Coefficient)

if the standardized values are cubed and summed over all the values, the result is NEAR ZERO

77
New cards

positive skewness (Software Coefficient)

if there are several large values, clearly separate from the others, the sum of the cubed differences would be a LARGE POSITIVE VALUE

78
New cards

negative skewness (Software Coefficient)

if there are several values much smaller than the others, the sum of the cubed differences would be a NEGATIVE VALUE

79
New cards

Univariate Data

techniques to summarize the distribution of a single variable

80
New cards

Bivariate Data

two variables are measured for each individual or observation in the population or sample

  • used often by data analysts

81
New cards

Scatter Diagram

graphical technique to show the relationship between variables

  • one variable on the x-axis, the other on the y-axis

82
New cards

Positively Related (Scatter Diagram)

points move from lower left to upper right

83
New cards

Negatively Related (Scatter Diagram)

points move from upper left to lower right

84
New cards

Contingency Table

table used to classify observations according to two identifiable characteristics

  • for studying the relationship between two variables when one or both are nominal or ordinal scale
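A minimal sketch of building a contingency table by counting pairs of categories. The observations (work shift versus inspection result) are hypothetical and purely illustrative.

```python
from collections import Counter

# Hypothetical observations: (shift, inspection result), both nominal-scale.
observations = [
    ("day", "pass"), ("day", "fail"), ("night", "pass"),
    ("day", "pass"), ("night", "fail"), ("night", "fail"),
]

table = Counter(observations)                 # count each (row, column) pair
rows = sorted({r for r, _ in observations})   # ['day', 'night']
cols = sorted({c for _, c in observations})   # ['fail', 'pass']

print("shift", cols)
for r in rows:
    print(r, [table[(r, c)] for c in cols])
# day [1, 2]
# night [2, 1]
```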
