Res-Econ 212 Exam #1

Last updated 8:09 PM on 3/27/26
106 Terms

1

Descriptive Statistics

the method or procedure of

  • gathering

  • organizing and

  • summarizing data visually and numerically

  • interpreting data using visual displays and numerical statistics to characterize key features of the data

2

Inferential statistics

the method or procedure of using statistics computed from the sample data to draw conclusions about the population from which the data are drawn

3

population

the entire set of possible cases; an entire group of interest you want to draw conclusions about. it can be a group of individuals, objects, events, organizations, etc.

  • number in entire population denoted by “N”

4

sample

a portion or part of the population of interest — a smaller subset selected from the group to represent it

  • sample size denoted by “n”

5

statistic

a descriptive measure concerning a sample (e.g., sample mean). denoted by Roman letters, e.g., x bar

6

parameter

a descriptive measure concerning a population (e.g., population mean). denoted by Greek letters, e.g., μ

7

bias

  • sample statistic is not representative of population parameter

  • sources:

    • selection bias

    • confirmation bias

    • data manipulation/funding bias

    • outliers

    • nonresponses or incorrect response

    • omitted variable bias

    • leading questions in survey

8

sampling methods

  • not probability based

    • convenience sampling methods

  • probability based

    • simple random sampling method

    • systematic random sampling method

    • stratified sampling methods

    • cluster sampling methods

9

simple random sampling

select items from a population such that every possible sample of a specified size has an equal chance of being selected

  • our assumed sample design, unless specified otherwise

10

systematic random sampling

  1. get a list of population items (N)

  2. select a sample size (n)

  3. determine the interval frequency: select every kth item on the population list

    1. k = population size (N)/the desired sample size (n)

  4. select a starting point between 1 and k
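The four steps above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the function name `systematic_sample`, the 100-item population, and the `seed` argument are assumptions made for the example, and k is floored when N/n is not a whole number.

```python
import random

def systematic_sample(population, n, seed=None):
    """Take every k-th item, starting at a random position between 1 and k."""
    N = len(population)
    k = N // n                     # interval: population size / sample size (floored)
    rng = random.Random(seed)
    start = rng.randint(1, k)      # 1-based starting point between 1 and k
    # Positions start, start + k, start + 2k, ... converted to 0-based indices.
    return [population[pos - 1] for pos in range(start, N + 1, k)][:n]

population = list(range(1, 101))   # hypothetical list of 100 population items
sample = systematic_sample(population, n=10, seed=1)
print(sample)                      # 10 items spaced k = 10 apart
```

Because only the starting point is random, only k different samples are possible, which is why not every possible sample has an equal chance of being selected.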

11

systematic random sampling advantages and disadvantages

  • advantage: easier to conduct

  • disadvantage: not every possible sample has an equal chance of being selected

12

cluster sampling

  • divide the population into internally heterogeneous and externally homogeneous subpopulations known as clusters

  • the clusters are externally homogeneous because they are grouped together by a shared characteristic or criterion

  • choose any one cluster (section) at random

  • the clusters are internally heterogeneous because the subpopulations within the clusters have different compositions

13

stratified sampling

divide the entire population into groups with different characteristics called strata. random samples are then selected from each stratum to reflect the population characteristics

  • each stratum's share of the sample should be in proportion to its share of the population

  • form strata with similar characteristics so the heterogeneity across strata should correspond to the heterogeneity in the population
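A rough sketch of proportional stratified sampling follows. The strata names, their sizes, and the helper `stratified_sample` are all made up for illustration; rounding each stratum's share can make the total differ slightly from n in general, which this sketch does not correct for.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / N)   # stratum's proportional share of the sample
        sample[name] = rng.sample(members, n_h)
    return sample

# Hypothetical population of 1,000 student IDs split into four class-year strata.
strata = {
    "first-year": list(range(0, 400)),
    "sophomore": list(range(400, 700)),
    "junior": list(range(700, 900)),
    "senior": list(range(900, 1000)),
}
sample = stratified_sample(strata, n=50, seed=0)
print({name: len(ids) for name, ids in sample.items()})
# {'first-year': 20, 'sophomore': 15, 'junior': 10, 'senior': 5}
```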

14

convenience sampling methods

collect market research data from a conveniently available pool of respondents

can be biased

15

variable

something that can take on different values. it “varies”. characteristics of an individual

16

observations

a single member of the collection of items

17

data set

a collection of all the values of all variables for all observations we have chosen to observe

18

data characterization

categorical or qualitative (nominal & ordinal)

  • values that are not numerical but can be put into distinct groups — gender, race, grades etc.

numerical or quantitative (discrete & continuous)

  • variables where the measurement or numbers have a numerical meaning (height, weight, # of ppl, temp, etc.)

19

nominal

quality not rank (student/not student, color of hair or eyes, etc.)

20

ordinal

rank (grades, level of education, satisfaction level, etc.)

21

discrete

variable with distinct values and countable (# of students, # of employees, etc.)

22

continuous

variables which are measurable but not countable. can have any value within an interval (temp, weight, etc.)

23

interval data

has meaningful distances between values but no true zero or fixed beginning (temp. in °F or °C, grade levels in schools, etc.)

24

ratio data

has a defined (true) zero, so the ratio of two values is meaningful (ex. weight, income, market share, sales, etc.)

25

time series data

one thing of interest (one variable) across different points in time

  • ex. stock price of apple over the course of a year or 20 yrs. height of a child over the years

26
27

cross section data

one point in time with several variables or factors or things of interest

  • ex. height of children in a preschool today

28

panel data

combination of time series and cross sectional data

  • ex. height of children in a specific preschool every week in 2023

29

frequency

the number of times an event occurred in the experiment or study

  • a method of organizing data to observe or gain more insight about the properties of the data

30

relative frequency

the number of times an event occurs in an experiment or study divided by the total number of trials conducted. tells you how often something happens compared to all outcomes

31

data visualization used for…

  • exploring data structure (central tendency and spread/dispersion)

  • detecting outliers and unusual groups

  • identifying trends

  • spotting local patterns

32

data visualization guidelines

concentrate on the most important data to communicate the research question

  • include as much relevant info as possible, exclude irrelevant or unnecessary details

  • the more info a chart is able to convey without increasing complexity, the better

  • not all charts are suitable for all types of data

  • the number and types of variables should guide the visualization’s format

33

general rule for which charts to use

  • categorical; few categories → pie chart

  • categorical; many categories → bar chart

  • quantitative discrete; few values → pie chart

  • quantitative discrete; moderate number of values → bar chart

  • continuous or discrete; many values → histogram

34

cumulative relative frequency

number of observations falling in a given class in a frequency table plus all the observations falling in earlier classes, divided by the total number of observations
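As a small illustration, frequency, relative frequency, and cumulative relative frequency can be tabulated together. The letter grades below are made-up data and the variable names are only for this sketch.

```python
from collections import Counter

grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]  # made-up observations
classes = ["A", "B", "C", "D"]                                 # classes in table order

counts = Counter(grades)
n = len(grades)

table = []
running = 0                       # observations in this class plus all earlier classes
for cls in classes:
    freq = counts[cls]
    running += freq
    table.append((cls, freq, freq / n, running / n))

for cls, freq, rel, cum in table:
    print(f"{cls}: frequency={freq}, relative={rel:.2f}, cumulative relative={cum:.2f}")
```

The last class always has a cumulative relative frequency of 1.00, which is a quick check that no observation was left out.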

35

problems with data visualization

  • using wrong chart

  • too many variables in one graph

  • 3d graphs on a 2d space

  • manipulating the y-axis

  • using two y-axes

  • cherry picking

  • consistency in scale, especially for pictograms

  • choice of color

36

pie charts

good for comparing different parts of a whole but not good for comparing different data sets

not good when there are too many variables

37

what must you first construct for pie charts, histograms, and bar charts?

frequency table and relative frequency table

38

central tendency

where is the center of the distribution? where would I balance the distribution on my finger such that it wouldn’t fall?

39

bar chart

a visual display of the frequency or the relative frequency table constructed using discrete data

  • bars have same width and do not touch; labelled properly and concisely with clear title

  • used to display distribution of a qualitative variable

40

histogram

a visual display of the frequency or the relative frequency table constructed using continuous data

  • bars have same width, touch, and chart is properly labelled with clear title

  • effective way to represent distributions

  • used to display distribution of a quantitative variable

41

dispersion

an arrangement of the values of a variable showing their observed or theoretical frequency of occurrence

  • measures the level of variation in the data

  • can tell us central tendency

  • how far do the tails spread?

42

distribution shapes

[image: distribution shapes]
43

data characteristics

center

  • what seems to be the typical middle of the data?

variability

  • how much dispersion/spread is there in the data? any unusual values?

shape

  • are the data values distributed symmetrically? skewed? bimodal?

44

used to measure center of data?

mean, median and mode

  • comparing the three → shape of distribution

45

mean

the sum of data values divided by the number of data items

  • most commonly used, includes all data

  • influenced by extreme data points (outliers)

for population called mean or expected value or average

for sample, called sample mean or sample average

46

median

the midpoint of a set of sorted data

  • used in presence of outliers; does not include all data

separates the upper and lower half of sorted observations

  • denotes the 50th percentile

determining position of median

  • (number of observations + 1) ÷ 2

47

mode

the most frequently occurring data value

  • can be used for categorical data. can have multiple modes. not helpful for continuous data

  • frequency and relative frequency are the highest among all values

48

skewness if mean = median = mode?

symmetrical

49

skewness if mean < median < mode?

skewed left

50

skewness if mode < median < mean?

skewed right

51

skewness

a measure of the asymmetry of a distribution

52

outlier

a value that is higher or lower than the rest of the data values in an extreme way

effects:

  • mean is affected bc it is calculated into the mean

  • median is largely unaffected since it depends on the position of the values, not their size

  • mode typically unaffected since it would not be the most frequent observation
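The effects listed above can be checked on a tiny example. The numbers are made up, and Python's standard `statistics` module is used here only as a convenient calculator.

```python
import statistics

data = [2, 3, 3, 3, 5]            # made-up values
with_outlier = data + [100]       # same data plus one extreme value

summary = {
    "before": (statistics.mean(data),
               statistics.median(data),
               statistics.mode(data)),
    "after": (statistics.mean(with_outlier),
              statistics.median(with_outlier),
              statistics.mode(with_outlier)),
}

for label, (mean, median, mode) in summary.items():
    print(f"{label}: mean={mean:.2f}, median={median}, mode={mode}")
```

In this example the mean jumps from 3.2 to about 19.3 while the median and mode both stay at 3.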

53

variance

average of the squared distances between the data values and their mean

  • looking at dispersion around a measure of central tendency — the mean

54

formula for computing variance

  • σ² = population variance

  • s² = sample variance

  • N = total number of observations in the population

  • n = number of data points in the sample

  • xᵢ = each individual data point

  • μ = the population mean

  • x bar = sample mean

to calculate, subtract the mean from each data point, square the result, sum all squared values, and

  • for population divide by total number of observations

  • for sample divide by n - 1
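The calculation described above, written out step by step. The four data values are made up; the `statistics` functions in the last lines are only a cross-check of the hand calculation.

```python
import math
import statistics

data = [4, 8, 6, 2]                                 # made-up data points
n = len(data)
mean = sum(data) / n                                # 5.0

squared_devs = [(x - mean) ** 2 for x in data]      # subtract the mean, then square
pop_var = sum(squared_devs) / n                     # population: divide by N
samp_var = sum(squared_devs) / (n - 1)              # sample: divide by n - 1

pop_sd = math.sqrt(pop_var)                         # standard deviation = sqrt(variance)
samp_sd = math.sqrt(samp_var)

print(pop_var, samp_var)                            # 5.0 and about 6.67
# Cross-check against the standard library.
print(statistics.pvariance(data), statistics.variance(data))
```

Taking the square root puts the result back in the original units of the data, which is the "unit matching" role of the standard deviation.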

55

standard deviation

the square root of the variance

  • use for unit matching

56

standard deviation formula

  • remember to square each individual difference

57

range

the difference between the max and min values in a data set

  • easy to calculate, but use with caution

    • very sensitive to outliers

    • only considers two points and nothing in between

58

mean absolute deviation (MAD)

measures the average of the absolute deviations from the center (the mean)

  • absolute values must be used since otherwise the deviations around the mean would sum to zero. stated in the unit of measurement

  • using MAD instead of standard deviation to measure dispersion is justified when the data contain outliers that could distort the standard deviation
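A short check of why absolute values are needed, using made-up data: the raw deviations from the mean cancel out, while their absolute values do not.

```python
data = [2, 4, 6, 8, 10]                      # made-up values
mean = sum(data) / len(data)                 # 6.0

raw_sum = sum(x - mean for x in data)        # deviations cancel: 0.0
mad = sum(abs(x - mean) for x in data) / len(data)

print(raw_sum)   # 0.0
print(mad)       # 2.4, in the same units as the data
```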

59

mean absolute deviation MAD formula

MAD = Σ|xᵢ − x bar| / n, where x bar is the mean and n is the number of data points
60

coefficient of variation

  • useful for comparing variables measured in different units or with different means

  • a unit-free measure of dispersion

  • expressed as a percent of the mean

  • only appropriate for nonnegative data. undefined if the mean is zero or negative

formula

  • CV = 100 × (standard deviation / mean)
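A sketch of using the CV to compare two variables measured in different units. The height and weight values are made up, the sample standard deviation is assumed, and the function name is hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """CV as a percent of the mean; only defined here for a positive mean."""
    mean = statistics.mean(values)
    if mean <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * statistics.stdev(values) / mean   # sample standard deviation

heights_cm = [160, 170, 180]     # made-up heights
weights_kg = [50, 60, 70]        # made-up weights

print(round(coefficient_of_variation(heights_cm), 2))  # 5.88
print(round(coefficient_of_variation(weights_kg), 2))  # 16.67
```

Both variables have a standard deviation of 10 in their own units, but relative to their means the weights vary almost three times as much.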

61

dispersion vs central tendency

[image: dispersion vs central tendency]
62

percentile

a value below which a certain percentage of the data fall

  • ex. 55th percentile → the value below which 55% of the data fall

divide the data into equal chunks

63

quartiles

3 values that divide the data into 4 equal chunks

(deciles: 9 values that divide the data into 10 equal chunks)

  • used to characterize the distribution of the data

64

computing quartiles

sort the raw data in ascending order

  • Q1 at the position .25(n+1)

    • 25th percentile or P25

  • Q2 at the position .5(n+1) → the median, ½ (n+1)

    • 50th percentile or P50

  • Q3 at the position 0.75(n+1) → ¾ (n+1)

    • 75th percentile or P75

if the position is not an integer, linear interpolation is used (when the fractional part is 0.5, this is just the average of the two neighboring values)

65

linear interpolation

  • split the position into its integer portion and its fractional portion

    • ex. 2.25 → 2, 0.25

  • find the values at the integer position and at the next position

    • ex. positions 2 and 3

  • interpolate by multiplying the difference between those two values by the fractional portion, then add the result to the lower value

    • ex. (0.25)(value at position 3 − value at position 2) + (value at position 2)
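The interpolation steps can be sketched in code together with the quartile positions .25(n+1), .5(n+1), and .75(n+1). The function names and the eight data values are made up for illustration, and the sketch assumes at least three observations so that every position falls inside the data.

```python
def value_at_position(sorted_data, pos):
    """Value at a 1-based, possibly fractional position, by linear interpolation."""
    lower = int(pos)                 # integer portion of the position
    frac = pos - lower               # fractional portion of the position
    if frac == 0:
        return sorted_data[lower - 1]
    low_value = sorted_data[lower - 1]    # value at the integer position
    high_value = sorted_data[lower]       # value at the next position
    return low_value + frac * (high_value - low_value)

def quartiles(data):
    """Q1, Q2, Q3 at positions 0.25(n+1), 0.5(n+1), 0.75(n+1) of the sorted data."""
    s = sorted(data)
    n = len(s)
    return tuple(value_at_position(s, p * (n + 1)) for p in (0.25, 0.5, 0.75))

data = [7, 1, 3, 9, 5, 11, 13, 15]   # made-up data, n = 8
print(quartiles(data))                # (3.5, 8.0, 12.5)
```

Here Q1 sits at position 2.25, so it is the value at position 2 (which is 3) plus 0.25 of the gap to the value at position 3 (which is 5), giving 3.5.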

66

five number summary

an exploratory data analysis tool. each describes where a particular value falls in the distribution. can tell you the shape of the distribution

  1. minimum value

  2. Q1

  3. Q2 (median)

  4. Q3

  5. maximum value

to make this determination, compare the median to Q1 and Q3

  • when the median is

    • approx. halfway between Q1 and Q3, data are symmetrical

    • closer to Q1, data are right-skewed

    • closer to Q3, data are left-skewed

67

interquartile range

the range within which the middle 50% of the data lie

  • not affected by outliers

  • small IQR → data is tightly clustered around the median, indicating low variability

  • large IQR → data is more spread out, indicating high variability

68

importance of quartiles for distribution shapes

  • uniform and bell shaped → Q1, Q2, Q3 equally spaced

  • right skewed → distance between Q1 and Q2 < distance between Q2 and Q3

  • left skewed → distance between Q1 and Q2 > distance between Q2 and Q3

69

IQR vs standard deviation

  • std dev is generally preferred since it includes all observations, but it is easily influenced by extreme values

  • use IQR for skewed distributions or data with outliers

70

IQR and outliers

to find outliers

  • check histogram

  • use IQR formula

    • high outlier → x ≥ Q3 + (1.5 × IQR)

    • low outlier → x ≤ Q1 − (1.5 × IQR)
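A sketch of the IQR rule on made-up data, with IQR = Q3 − Q1. It relies on `statistics.quantiles`, whose default "exclusive" method uses (n+1)-based positions with linear interpolation; whether that matches a given textbook's quartile convention is an assumption worth checking.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 40]            # made-up data with one suspicious value

q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles via (n+1)-based positions
iqr = q3 - q1                                 # spread of the middle 50% of the data

low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x <= low_fence or x >= high_fence]

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)       # (1, 3.5, 8.0, 12.5, 40)
print(low_fence, high_fence)     # -10.0 26.0
print(outliers)                  # [40]
```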

71

Chebyshev’s rule uses and importance

  • helps us determine where most of the datapoints fall in reference to the mean

  • applies to all datasets and distributions of any shape

  • uses only the mean and std. deviation to estimate the percentage of values within 2 or 3 std. deviations of the mean

72

Chebyshev’s rule

for any population (sample) with mean μ (x bar) and standard deviation σ (s), the percentage of observations that lie within k standard deviations of the mean, k > 1, must be at least

100[1 − 1/k²]
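The bound is easy to evaluate for the usual values of k. The function name below is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule requires k > 1")
    return 100 * (1 - 1 / k ** 2)

for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 2))   # 2 -> 75.0, 3 -> 88.89
```

So for any distribution at least 75% of observations lie within 2 standard deviations of the mean, and at least about 88.89% lie within 3.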

73

normal distribution

a bell-shaped curve that is symmetrical

  • can be completely characterized by its mean and standard deviation

74

empirical rule

if the data is normally distributed, then the empirical rule states that we expect the interval [μ - kσ, μ + kσ] to contain a known percentage of data

k = 1; ±1 s.d. from the mean to contain 68.26% of the data

k = 2; ±2 s.d. from the mean to contain 95.44% of the data

k = 3; ±3 s.d. from the mean to contain 99.73% of the data
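These percentages come from the normal curve itself: the share of a normal distribution within k standard deviations of the mean equals erf(k/√2). A quick check with the standard library follows; the function name is hypothetical. The computed values round to 68.27%, 95.45%, and 99.73%, and table-based figures such as 68.26% and 95.44% differ in the last digit only because z-tables are rounded to four decimals.

```python
import math

def percent_within(k):
    """Percent of a normal distribution within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(percent_within(k), 2))   # 68.27, 95.45, 99.73
```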

75

Chebyshev’s theorem vs empirical rule

  • Chebyshev’s theorem gives lower bound percentages for any distribution

  • if the data is normal, the empirical rule gives exact percentages

76

central limit theorem

the distribution of the means of many samples independently drawn from the same distribution is nearly normal for a large enough sample size

  • why normal distribution is one of the most important distributions in natural and social sciences
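A small simulation can illustrate the idea. Everything here is an assumption chosen for the demo: the exponential population (strongly right-skewed, with mean 1 and standard deviation 1), the sample size of 30, the 2,000 repetitions, and the seed.

```python
import random
import statistics

rng = random.Random(42)          # fixed seed so the run is repeatable
sample_size = 30
num_samples = 2000

# Draw many independent samples from a skewed population and keep each sample's mean.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
    for _ in range(num_samples)
]

center = statistics.fmean(sample_means)     # should be close to the population mean, 1
spread = statistics.stdev(sample_means)     # should be close to 1 / sqrt(30), about 0.18
within_2sd = sum(abs(m - center) <= 2 * spread for m in sample_means) / num_samples

print(round(center, 3), round(spread, 3), round(within_2sd, 3))
```

Even though individual draws are heavily skewed, roughly 95% of the sample means land within 2 standard deviations of their center, which is what a normal distribution would predict.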

77

z-score

  • tells us the position of any observation relative to the mean: how far it is from the mean in terms of std. dev.

  • unit free

  • one-to-one relationship between z-score and data value; in other words, we can move between these two equations easily

    • z = (x − mean) / (s or σ) and x = z(s or σ) + mean
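Moving back and forth between a data value and its z-score, as a sketch. The exam score of 85, the mean of 70, and the standard deviation of 10 are made-up numbers.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def from_z(z, mean, sd):
    """Recover the data value that corresponds to a given z-score."""
    return z * sd + mean

z = z_score(85, mean=70, sd=10)
print(z)                             # 1.5 standard deviations above the mean
print(from_z(z, mean=70, sd=10))     # 85.0, back to the original value
```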

78

important z-scores

  • Z = -3: Three standard deviations below the mean.

  • Z = -2: Two standard deviations below the mean.

  • Z = -1: One standard deviation below the mean.

  • Z = 0: Exactly at the mean.

  • Z = +1: One standard deviation above the mean.

  • Z = +2: Two standard deviations above the mean.

  • Z = +3: Three standard deviations above the mean.

79

bivariate dataset

a data set consisting of two variables; goal to show some association between two variables

  • can be used to analyze the association between two variables visually or numerically

80

covariance

shows the direction of linear relationship between two random variables

measures how two variables change together

  • positive → direct positive relationship between two variables and they move in the same direction

  • negative → two variables move in the opposite direction, negative association

81

formula for covariance

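sample covariance: s_xy = Σ(xᵢ − x bar)(yᵢ − y bar) / (n − 1)

  • multiply each x observation minus the mean of the x values by the corresponding y observation minus the mean of the y values, sum the products, and divide by n − 1 (by N for a population)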
82

correlation coefficient

a measure of the degree of linear association between two variables

strength and direction of linear association between two variables

gives more info than the covariance:

  • how closely related are the two variables x and y

  • a unit free number with range [-1,1]

    • value of -1 means perfect negative linear association

    • value of 1 means perfect positive linear association

    • value of 0 means no linear association

83

formula for correlation coefficient for 2 variables x and y

r = s_xy / (s_x × s_y), the covariance of x and y divided by the product of their standard deviations
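A sketch of computing the sample covariance and the correlation coefficient by hand. The x and y values are made up, the sample (n − 1) versions are assumed, and the function names are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_covariance(x, y):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))   # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

x = [1, 2, 3, 4, 5]     # made-up paired observations
y = [2, 4, 5, 4, 5]

print(sample_covariance(x, y))         # 1.5, positive: x and y tend to move together
print(round(correlation(x, y), 4))     # 0.7746, unit free and between -1 and 1
```

The covariance only gives the direction of the relationship and depends on the units of x and y, while the correlation rescales it to a unit-free strength between −1 and 1.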
84

scatter diagram

best to visually examine the relationship between two quantitative variables

85

correlation vs causation

  • correlation does not mean causation

  • even if there’s a correlation between two variables, we cannot conclude that one variable causes a change in the other. could be a coincidental relationship or have another factor causing both variables to change

    • could be spurious relationship — meaningless

86

frequency vs relative frequency (bins)

frequency: number of observations in each bin

relative frequency: proportion of observations in each bin

87

all-inclusive classes

classes that together cover every observation, so each and every value in the distribution falls into some class

88

when looking at a sample, can you tell for sure if it was drawn without replacement?

no. only if the same sample point appears more than once can you tell it was drawn with replacement

89

value of k for systematic random sampling

k = population size / sample size; the starting point is chosen between 1 and k

90

the lowest form of data assigned to categories that have no order associated with them?

nominal data

91

data whose measurement is inherently categorized?

qualitative data

92

cumulative relative frequency

the proportion of observations with values less than or equal to the upper limit of the class

93

frequency distribution

a summary of a set of data that displays the number of observations in each of the distribution’s distinct categories or classes

94

the balancing point of your histogram?

the mean

95

the covariance of an asset with itself is equal to...

the asset’s variance

96

when visualizing bivariate data in XY space…

the means break up our data into 4 quadrants

97

sample mean

  • calculated using subset of population data

  • unlikely to equal population mean

  • an estimator of the population mean

98

calculate median

(number of observations + 1) / 2 = position of median

99

tells us the midpoint of the data

median

100

the larger the variance …

the wider the distribution
