Statistics for Economics Midterm

1

Statistics and Probability

  • Statistics: collection, analysis, and interpretation of data

  • two parts: descriptive and inferential

  • Probability: a mathematical tool to study randomness

  • Difference between statistics and probability:

    • suppose there is a jar with 10 balls, 3 red and 7 green

    • In probability: we know the jar's contents and therefore the true probabilities, 3/10 (red) and 7/10 (green). We ask questions such as: what is P(2 red in a row) with replacement?

    • In statistics: we do not know the jar's contents, but we take a sample of, say, n = 4 balls (with replacement) and use it to estimate the true probabilities.
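A minimal Python sketch of the jar example above (3 red, 7 green). The probability part works from the known contents; the statistics part pretends the contents are unknown and estimates P(red) from a simulated sample of n = 4 draws. The seed and variable names are illustrative assumptions.

```python
import random
from fractions import Fraction

# Known jar contents (the probability view): 3 red, 7 green.
jar = ["red"] * 3 + ["green"] * 7

# Probability: with replacement the two draws are independent,
# so P(2 red in a row) = P(red) x P(red).
p_red = Fraction(jar.count("red"), len(jar))
p_two_red = p_red * p_red
print("P(red) =", p_red, "  P(2 red in a row) =", p_two_red)   # 3/10 and 9/100

# Statistics: pretend the contents are unknown, draw n = 4 balls with
# replacement and estimate P(red) by the sample proportion.
rng = random.Random(42)          # fixed seed (arbitrary) so the run is reproducible
sample = rng.choices(jar, k=4)   # choices() samples with replacement
p_red_hat = Fraction(sample.count("red"), len(sample))
print("sample:", sample, "  estimated P(red) =", p_red_hat)
```

With only four draws the estimate can only be 0, 1/4, 1/2, 3/4 or 1, so it will usually miss 3/10; larger samples narrow that gap.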

2

Population vs sample

  • Population: collection of persons or things under study

  • Sample: a subset of the population that provides information about the population

3

Sampling

  • Sampling: selection of a portion of the population

  • We want an adequate sampling method such that the sample is representative of the population

  • If the sample is representative, sample statistics are meaningful with respect to the population

4

Parameter vs Statistic

  • Parameter: number that represents a property of the population

    • example: true population mean (mu)

  • Statistic: number that represents a property of the sample

    • example: sample mean (x bar)

5

Variable X and Data

  • Variable X: a characteristic of interest for each person (or thing) of the population (examples: hours of sleep, GDP)

  • Data: the actual values of the variable for each person or thing

6

Successful Sampling

  • Sampling: a sample should have the same characteristics as the population it represents

  • Simple random sampling (SRS): names in a hat (or generate random numbers). Most important/common. Any group of n people is equally likely to be drawn.

    • example: pick n professors from Fordham

    • cluster: randomly select departments, then include every professor in the selected departments

    • systematic: take every 20th name in the phone book

    • convenience (not random)

  • Sampling can be done with or without replacement

  • Sampling bias: when the sample is not representative, x-bar does not estimate mu well
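A sketch of the sampling methods above on a small hypothetical population (the department and professor names are made up). Python's `random.sample` draws without replacement, so every group of k names is equally likely; the random starting point for the systematic sample is a common convention, assumed here rather than stated in these notes.

```python
import random

# Hypothetical population: 12 professors in 3 departments (illustrative names only).
population = {
    "Economics": ["E1", "E2", "E3", "E4"],
    "History": ["H1", "H2", "H3", "H4"],
    "Physics": ["P1", "P2", "P3", "P4"],
}
everyone = [name for dept in population.values() for name in dept]
rng = random.Random(0)  # arbitrary seed for reproducibility

# Simple random sample: any group of 4 professors is equally likely.
srs = rng.sample(everyone, k=4)

# Cluster sample: randomly pick whole departments and keep everyone in them.
chosen_depts = rng.sample(sorted(population), k=1)
cluster = [name for d in chosen_depts for name in population[d]]

# Systematic sample: every k-th name on the list, after a random start (assumed convention).
k = 3
start = rng.randrange(k)
systematic = everyone[start::k]

# Convenience sample (not random): whoever is first on the list.
convenience = everyone[:4]

print("SRS:", srs)
print("cluster:", chosen_depts, cluster)
print("systematic:", systematic)
print("convenience:", convenience)
```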

7

Sampling Error: Variation in samples (key concept) 1.2

  • Sampling Error: the natural variation that results from selecting a sample to represent a larger population

    • this variation decreases as the sample size increases, so selecting larger samples reduces sampling error
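A small simulation, assuming a made-up population of 10,000 values: for each sample size it draws many simple random samples and measures how much x-bar varies from sample to sample. The population, seed and number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population: 10,000 values (e.g. hours of sleep), generated once.
population = [rng.gauss(7, 1.5) for _ in range(10_000)]
mu = statistics.fmean(population)

def spread_of_sample_means(n, reps=1_000):
    """Standard deviation of x-bar across many simple random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

print(f"population mean mu = {mu:.3f}")
for n in (5, 50, 500):
    print(f"n = {n:>3}: typical spread of x-bar around mu ~ {spread_of_sample_means(n):.3f}")
```

The spread shrinks as n grows (roughly by a factor of √10 for each tenfold increase in n here), which is the sense in which larger samples reduce sampling error.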

8

Sampling with Replacement and without Replacement

  • Sampling with Replacement: once a member of the population is selected for inclusion in a sample, that member is returned to the population for the selection of the next individual

  • Sampling without Replacement: a member of the population may be chosen for inclusion in a sample only once. If chosen, the member is not returned to the population before the next selection

9

Frequency, Relative Frequency, & Cumulative relative frequency

  • Frequency: the number of times the value of the variable occurs in the sample

  • Relative frequency: the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes. Relative frequencies can be written as fractions, percents, or decimals.

  • Cumulative relative frequency: the accumulation of the previous relative frequencies. Add all previous relative frequencies to the relative frequency for the current row.
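A sketch of a frequency table with relative and cumulative relative frequencies, using hypothetical data. `Fraction` keeps the relative frequencies exact, so the last cumulative value comes out exactly 1.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical sample: hours of sleep reported by 10 students.
data = [5, 6, 6, 7, 7, 7, 8, 8, 8, 9]

freq = Counter(data)
n = len(data)
cumulative = Fraction(0)
table = []
for value in sorted(freq):
    rel = Fraction(freq[value], n)   # relative frequency = frequency / n
    cumulative += rel                # running total of all relative frequencies so far
    table.append((value, freq[value], rel, cumulative))

print("value  freq  rel.freq  cum.rel.freq")
for value, f, rel, cum in table:
    print(f"{value:>5}  {f:>4}  {float(rel):>8.2f}  {float(cum):>12.2f}")
```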

10

Histograms

  1. To construct a histogram, first decide how many bars or intervals, also called classes, represent the data. Many histograms consist of 5 to 15 bars or classes for clarity

  2. Choose a starting point

    • Less than the smallest data value

    • A convenient starting point is a lower value carried out to one more decimal place than the value with the most decimal places

    • ex. if the value with the most decimal places is 2.23 and the lowest value is 1.5, a convenient starting point is 1.495 (1.5 - 0.005).

    • when the starting point and other boundaries are carried to one additional decimal place, no data value will fall on a boundary

  3. Calculate the width of each bar or class interval.

    • Subtract the starting point from the ending value (the largest data value plus the same small amount used for the starting point) and divide by the number of bars (you must choose the number of bars you desire)
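A sketch of the three histogram steps, assuming the ending value is the largest data value plus the same half-unit that is subtracted for the starting point. Values are passed as strings so Python's `Decimal` keeps their exact decimal places; the function name `histogram_setup` is hypothetical.

```python
from decimal import Decimal

def histogram_setup(values, num_bars):
    """Return (starting point, ending value, raw class width) for a histogram.

    Values are strings so Decimal preserves their exact number of decimal places.
    """
    decs = [Decimal(v) for v in values]
    # Most decimal places among the data values, e.g. "2.23" -> 2.
    max_places = max(-d.as_tuple().exponent for d in decs)
    # Half a unit in one extra decimal place: 0.5, 0.05, 0.005, ...
    half_unit = Decimal(5) * Decimal(10) ** -(max_places + 1)
    start = min(decs) - half_unit   # just below the smallest value
    end = max(decs) + half_unit     # just above the largest value (assumed convention)
    width = (end - start) / num_bars
    return start, end, width

start, end, width = histogram_setup(["1.5", "2.23", "1.8", "2.0", "1.95"], num_bars=5)
print(start, end, width)   # 1.495 2.235 0.148
```

The sketch returns the raw width; in practice the width is often rounded up to a convenient number so the bars cover all the data.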

11

Frequency polygons

  • Analogous to line graphs; like histograms, they make continuous data visually easy to interpret

  • Useful for comparing distributions

12

Levels of Measurement

  • the way a set of data is measured is called its level of measurement

  • levels of measurement (from lowest to highest level):

    1. Nominal scale level

      • data that is measured using a nominal scale is qualitative (categorical). categories, colors, names, labels, and favorite foods along with yes or no responses are examples of nominal level data.

      • not ordered

      • cannot be used in calculations

    2. Ordinal scale level

      • similar to nominal scale data but there is a big difference

      • the ordinal scale data can be ordered

      • ex. top five national parks in the US

      • can be ordered but the differences cannot be measured

      • cannot be used in calculations

    3. Interval scale level

      • similar to ordinal level data because it has a definite ordering, but the differences between data values are meaningful

      • the differences between interval scale data can be measured, though the data has no true zero (starting point)

      • temperature scales like celsius and fahrenheit are measured using the interval scale

      • can be used in calculations, but one type of comparison cannot be done: no meaning to ratios

    4. Ratio scale level

      • takes care of the ratio problem and gives you the most information

      • like interval scale data, but it has a true zero (starting point), so ratios can be calculated

      • the data can be put in order from lowest to highest

      • the differences have a meaning

      • ratios can be calculated
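A quick numeric check of the "no meaningful ratios" point for interval data, using the standard conversion F = C x 9/5 + 32 and an approximate kilograms-to-pounds factor (2.20462, an assumed rounding).

```python
def c_to_f(c):
    """Celsius to Fahrenheit."""
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    """Kilograms to pounds (approximate factor)."""
    return kg * 2.20462

# Interval scale: "20 °C is twice as hot as 10 °C" depends on the unit.
ratio_c = 20 / 10
ratio_f = c_to_f(20) / c_to_f(10)   # 68 °F / 50 °F
print(ratio_c, ratio_f)             # 2.0 1.36

# Differences stay meaningful: a 10 °C gap is always an 18 °F gap.
print(c_to_f(20) - c_to_f(10))      # 18.0

# Ratio scale: weight has a true zero, so the ratio is the same in any unit.
print(40 / 20, kg_to_lb(40) / kg_to_lb(20))   # 2.0 2.0
```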

13

Quantitative data: discrete or continuous

  • Discrete: take on only certain numerical values

    • ex. counting number of phone calls you receive for each day of the week

  • Continuous: not only made up of counting numbers, but may include fractions, decimals, irrational numbers, etc.

    • ex. lengths, weights, times, etc.

14

Key Components of Every Experiment (to produce reliable data)

  1. Subjects must be assigned randomly to different treatment groups to eliminate lurking variables

  2. One of the groups must act as a control group, demonstrating what happens when the active treatment is not applied

  3. Participants in the control group receive a placebo treatment that looks exactly like the active treatment but cannot influence the response variable

    • To preserve the integrity of the placebo, both researchers and subjects may be blinded
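A sketch of step 1, random assignment: shuffle the subjects and split them into a treatment group and a control group. The function name, subject labels, seed and the even split are illustrative assumptions.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle subjects, then split them into treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

groups = randomly_assign([f"subject_{i}" for i in range(1, 11)], seed=7)
print(groups)
```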

15

Measures of the Location of Data: Quartiles and Percentiles

  • Quartiles divide an ordered data set into four equal parts

    • about one-fourth of the data falls on or below the first quartile Q1

    • about one-half of the data falls on or below the second quartile Q2

    • about three-fourths of the data falls on or below the third quartile Q3

  • Percentiles divide ordered data into hundredths

    • To score in the 90th percentile on an exam does not necessarily mean that you received a 90% on the test. It means that 90% of test scores are the same as or less than your score, and 10% of the test scores are the same as or greater than your score

16

Finding Quartiles

  1. Find Q2 by finding the median, the value at position (n + 1)/2 in the ordered data

  2. Find Q1— the middle value of the lower half of the data

    • one fourth of the entire set of values are the same or less than Q1 and three fourths of the values are more than Q1

  3. Find Q3— the middle value, or median, of the upper half of the data

    • three fourths of the ordered data set are less than Q3 and one fourth of the ordered data set is greater than Q3
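A sketch of the median-of-halves method on a hypothetical 14-value data set. When n is odd, this version leaves the median out of both halves; that is one common convention, and others (such as the interpolation used by Python's `statistics.quantiles`) can give slightly different quartiles.

```python
def median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]                            # odd n: the middle value
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2   # even n: average of the two middle values

def quartiles(data):
    """(Q1, Q2, Q3) by the median-of-halves method (median excluded from the halves when n is odd)."""
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]         # lower half
    upper = s[(n + 1) // 2 :]   # upper half
    return median(lower), median(s), median(upper)

data = [1, 11.5, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)        # 2 7.0 9
print("IQR =", q3 - q1)  # IQR = 7
```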

17

Interquartile Range (IQR)

  • The interquartile range is a number that indicates the spread of the middle half or the middle 50% of data

  • It is the difference between the third quartile (Q3) and the first quartile (Q1)

  • IQR = Q3 - Q1

18

Interquartile Range (IQR) and Outliers

  • The IQR can help to determine potential outliers

  • A value is a potential outlier if it is more than (1.5)(IQR) below the first quartile or more than (1.5)(IQR) above the third quartile

    • lower fence: Q1 - (1.5)(IQR)

    • upper fence: Q3 + (1.5)(IQR)

    • values outside the interval [Q1 - (1.5)(IQR), Q3 + (1.5)(IQR)] are potential outliers
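A sketch of the 1.5 x IQR rule: any value below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR is flagged. The quartiles (Q1 = 2, Q3 = 9) and the test values are hypothetical.

```python
def outlier_fences(q1, q3):
    """Lower and upper fences of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(data, q1, q3):
    """Values strictly outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

# Hypothetical quartiles: Q1 = 2, Q3 = 9, so IQR = 7.
print(outlier_fences(2, 9))                                      # (-8.5, 19.5)
print(potential_outliers([-10, 1, 5, 9, 19.5, 25], q1=2, q3=9))  # [-10, 25]
```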

19

Interpreting percentiles: On a 20-question math test, the 70th percentile for correct answers was 16

What does this mean?

  • 70% of students had 16 or fewer correct answers; 30% had 16 or more correct answers

20

Percentiles & frequency tables

  • percentiles in a frequency table → use the cumulative relative frequency column

  • to find the value at, e.g., the 28th percentile, look for the first value whose cumulative relative frequency is at least 0.28
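One way to read a value off a frequency table at a given percentile, assuming the convention of taking the first value (in increasing order) whose cumulative relative frequency reaches the percentile. The table and the function name are hypothetical.

```python
from fractions import Fraction

def value_at_percentile(values, freqs, k):
    """Smallest value whose cumulative relative frequency reaches k percent (values sorted ascending)."""
    total = sum(freqs)
    target = Fraction(k, 100)
    cumulative = Fraction(0)
    for value, f in zip(values, freqs):
        cumulative += Fraction(f, total)   # running cumulative relative frequency
        if cumulative >= target:
            return value
    raise ValueError("k must be at most 100")

# Hypothetical table: books read (value) and number of students (frequency).
values = [1, 2, 3, 4, 5]
freqs = [2, 3, 5, 6, 4]   # cumulative relative frequencies: 0.10, 0.25, 0.50, 0.80, 1.00
print(value_at_percentile(values, freqs, 28))   # 3
```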

21

Resistant measure

  • A resistant measure is a statistical measurement that is not significantly affected by outliers

  • The mean is not robust to outliers

  • The median is robust to outliers

  • Another center statistic: the mode

    • the mode is the most frequent value in the sample

    • it also works for qualitative data
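A quick comparison with Python's `statistics` module on hypothetical data: adding one outlier moves the mean a lot and the median only slightly, and `statistics.mode` also works on qualitative (string) data.

```python
import statistics

data = [2, 3, 3, 4, 5]
with_outlier = data + [100]

print(statistics.mean(data), statistics.median(data))                  # 3.4 3
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 19.5 3.5

# The mode works for qualitative data too.
print(statistics.mode(["red", "blue", "red", "green"]))                # red
```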

22

Skew and Distribution

  • To understand skewness:

    • compare the mean with the median (mean - median)

    • outliers pull the mean toward the tail, away from the median

  • sample mean > median → right/positive skewed distribution

  • sample mean < median → left/negative skewed distribution

  • sample mean ≅ median → symmetrical distribution
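A sketch of the mean-versus-median rule above. The `tolerance` that decides when mean and median count as "about equal" is an arbitrary, illustrative cutoff (it depends on the units of the data), and the data sets are hypothetical.

```python
import statistics

def skew_direction(data, tolerance=0.1):
    """Classify skew from the sign of mean - median (tolerance is an illustrative cutoff)."""
    gap = statistics.mean(data) - statistics.median(data)
    if gap > tolerance:
        return "right (positive) skew"
    if gap < -tolerance:
        return "left (negative) skew"
    return "roughly symmetric"

print(skew_direction([1, 2, 2, 3, 3, 3, 4, 4, 20]))   # long right tail: mean > median
print(skew_direction([-20, 4, 4, 5, 5, 5, 6, 6, 7]))  # long left tail: mean < median
print(skew_direction([1, 2, 3, 4, 5]))                # mean = median
```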

23

Box Plots

  • Box plots give a good graphical image of the concentration of the data

  • Show how far the extreme values are from most of the data

  • Constructed from five values:

    1. Minimum value

    2. First quartile

    3. Median

    4. Third Quartile

    5. Maximum value

  • The middle 50 percent of the data falls inside the box

  • The first quartile marks one end of the box, and the third quartile marks the other end of the box

  • The median (second quartile) lies between the first and third quartiles and can coincide with either of them

  • The whiskers extend from the ends of the box to the smallest and largest data values

24

Standard Deviation

  • The most common measure of variation or spread

  • The standard deviation is a number that measures how far data values are from their mean

  • It provides a numerical measure of the overall amount of variation in a data set

  • It can be used to determine whether a particular data value is close to or far from the mean

  • Higher standard deviation → more variation

  • lower case letter s represents the sample standard deviation

  • Greek letter σ (sigma) represents the population standard deviation

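  • sample → divide by n - 1: s = square root of [sum of (x - x bar)² / (n - 1)]

  • population → divide by N: sigma = square root of [sum of (x - mu)² / N]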
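A sketch comparing the two divisors by hand and with Python's `statistics.stdev` (sample, n - 1) and `statistics.pstdev` (population, N), on a hypothetical data set.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
x_bar = sum(data) / n                                 # 5.0
squared_devs = sum((x - x_bar) ** 2 for x in data)    # sum of (x - mean)^2 = 32.0

s = math.sqrt(squared_devs / (n - 1))   # sample standard deviation: divide by n - 1
sigma = math.sqrt(squared_devs / n)     # population standard deviation: divide by N

print(round(s, 4), statistics.stdev(data))   # both about 2.1381
print(sigma, statistics.pstdev(data))        # 2.0 2.0
```

Dividing by n - 1 makes s slightly larger than the population formula would give for the same numbers.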

25

Experiment, sample space, event, etc. (Chapter 3: probability topics)

  • Experiment: planned operation with a random outcome carried out under controlled conditions

    • ex. flipping one coin

  • Sample space: the set of all possible outcomes

    • ex. S = {H,T}

  • Event: an event A is a subset of the sample space

  • Probability of an outcome: number between 0 and 1 that can be seen as the long-term relative frequency of that outcome

  • Probability of an event A when outcomes are equally likely: P(A) = (number of outcomes in A) / (number of outcomes in S)
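A sketch of the equally-likely rule for the experiment "flip two fair coins": list the sample space, then count the outcomes in the event. The event "at least one head" is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Sample space for flipping two fair coins; all four outcomes are equally likely.
sample_space = ["".join(p) for p in product("HT", repeat=2)]   # ['HH', 'HT', 'TH', 'TT']

def prob(event):
    """P(A) = (outcomes in A) / (outcomes in S), valid only for equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

A = [o for o in sample_space if "H" in o]   # event: at least one head
print(A, prob(A))   # ['HH', 'HT', 'TH'] 3/4
```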

26

Different types of probabilities and how they relate

  • Marginal probabilities: P(A), P(B)

  • OR events: P(A ∪ B) = P(A or B)

  • AND events: P(A and B) → aka joint probability

  • Conditional probability: P(A | B), read "P(A given B)"
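A sketch computing all four kinds of probability from a 2 x 2 contingency table. The counts and the labels for A and B are hypothetical.

```python
from fractions import Fraction

# Hypothetical contingency table: 100 students.
#                 B (works part time)   not B
# A (econ major)          20              30
# not A                   15              35
counts = {("A", "B"): 20, ("A", "notB"): 30, ("notA", "B"): 15, ("notA", "notB"): 35}
total = sum(counts.values())

p_A = Fraction(counts[("A", "B")] + counts[("A", "notB")], total)   # marginal: row total / grand total
p_B = Fraction(counts[("A", "B")] + counts[("notA", "B")], total)   # marginal: column total / grand total
p_A_and_B = Fraction(counts[("A", "B")], total)                     # joint: one cell / grand total
p_A_or_B = Fraction(total - counts[("notA", "notB")], total)        # OR: everyone except "neither"
# Conditional: restrict attention to column B, then take the share that is also A.
p_A_given_B = Fraction(counts[("A", "B")], counts[("A", "B")] + counts[("notA", "B")])

print(p_A, p_B, p_A_and_B, p_A_or_B, p_A_given_B)   # 1/2 7/20 1/5 13/20 4/7
```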

27

Bayes theorem

  • Conditional probability = joint/marginal

  • P(A|B) = P(A and B) / P(B)

  • P(A and B) = P(A|B) x P(B)

  • P(A and B) = P(B|A) x P(A)

  • So: P(A|B) x P(B) = P(B|A) x P(A), which rearranges to P(A|B) = P(B|A) x P(A) / P(B)
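A sketch of rearranging P(A|B) x P(B) = P(B|A) x P(A) to solve for P(A|B). The input probabilities are illustrative numbers, not from the notes.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) x P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Illustrative numbers: P(B|A) = 0.8, P(A) = 0.3, P(B) = 0.4.
p_b_given_a, p_a, p_b = Fraction(4, 5), Fraction(3, 10), Fraction(2, 5)
p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(p_a_given_b)   # 3/5

# Both ways of writing the joint probability agree: 6/25 (= 0.24).
print(p_a_given_b * p_b, p_b_given_a * p_a)
```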

28

Independence and mutual exclusion

  • Independence: A and B are independent if P(A|B) = P(A) (equivalently, P(B|A) = P(B), or P(A and B) = P(A) x P(B)), i.e. conditioning on the other event is useless

    • one event occurring does not affect the chance the other occurs

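    • intuition: successive roulette spins are independent, while in blackjack the cards already dealt change the odds for the next card (dependent)

    • note: if A and B are independent, the rule from Bayes' theorem, P(A and B) = P(A|B) x P(B), becomes the special case P(A and B) = P(A) x P(B)

    • under independence, the joint is the product of the marginals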

  • Mutual exclusion: A and B are mutually exclusive when the joint is 0 → P(A and B) = 0

    • events that cannot occur at the same time
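A sketch checking both definitions on one roll of a fair die. The events "even", "low" (a 1 or a 2) and "odd" are illustrative choices.

```python
from fractions import Fraction

die = set(range(1, 7))   # one roll of a fair die, all faces equally likely

def P(event):
    return Fraction(len(event), len(die))

even = {2, 4, 6}
low = {1, 2}
odd = {1, 3, 5}

# Independent: the joint equals the product of the marginals (1/6 = 1/2 x 1/3).
print(P(even & low), P(even) * P(low))    # 1/6 1/6

# Mutually exclusive: the joint is 0, the events cannot happen together.
print(P(even & odd))                      # 0

# So mutually exclusive events with positive probabilities are not independent.
print(P(even & odd) == P(even) * P(odd))  # False
```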

29

Two basic rules of probability

  1. P(A and B) = P(A|B) x P(B)

    • Reduces to P(A and B) = P(A) x P(B) under independence

    • AND → product

  2. P(A or B) = P(A) + P(B) - P(A and B)

    • Reduces to P(A or B) = P(A) + P(B) under mutual exclusion

    • OR → sum
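A sketch verifying the addition rule on a standard 52-card deck, with A = "heart" and B = "king" as illustrative events. Counting P(A or B) directly gives the same answer as P(A) + P(B) - P(A and B), because the king of hearts would otherwise be counted twice.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))   # 52 equally likely cards as (rank, suit)

def P(event):
    """Probability of an event given as a predicate on a card."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def heart(card):
    return card[1] == "hearts"

def king(card):
    return card[0] == "K"

both = P(lambda c: heart(c) and king(c))     # 1/52
either = P(lambda c: heart(c) or king(c))    # counted directly: 16/52
print(either, P(heart) + P(king) - both)     # 4/13 4/13
```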

30

Sampling with replacement or without replacement

  • With replacement: the events are considered to be independent, meaning the result of the first pick will not change the probabilities for the second pick

  • Without replacement: the events are considered to be dependent or not independent
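A sketch comparing P(2 red in a row) for the 3 red / 7 green jar from the first card, with and without replacement.

```python
from fractions import Fraction

red, green = 3, 7
total = red + green

# With replacement: the ball goes back, so the second pick has the same odds (independent).
p_with = Fraction(red, total) * Fraction(red, total)              # 3/10 x 3/10
# Without replacement: one red ball is gone before the second pick (dependent).
p_without = Fraction(red, total) * Fraction(red - 1, total - 1)   # 3/10 x 2/9
print(p_with, p_without)   # 9/100 1/15
```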

31

Contingency tables

32

Discrete Random Variable

  • Discrete data are data that you can count

  • A random variable describes the outcomes of a statistical experiment in words

  • The values of a random variable can vary with each repetition of an experiment

33

Random Variable Notation

  • Upper case letters such as X or Y denote a random variable

  • Lower case letters like x or y denote the value of a random variable

  • If X is a random variable, then X is written in words, and x is given as a number

  • For example, let X= the number of heads you get when you toss three fair coins. The sample space for the toss of three fair coins is TTT;THH;HTH;HHT;HTT;THT;TTH;HHH

  • Then, x = 0,1,2,3

  • Because you can count the possible values that X can take on and the outcomes are random (the x values 0,1,2,3), X is a discrete random variable

34

Random Variables

  • A random variable X (or Y) takes different values with different probabilities

  • Example: experiment: flip 2 coins

  • S= {HH,HT,TH,TT}

  • Define X= count of heads from flipping 2 coins

  • Possible values for X: 0,1,2

  • But X will realize those values with different probabilities (1/4,2/4,1/4)

  • We want to describe the probability with which X takes on different values → We use a PDF
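A sketch that builds the PDF of X = number of heads by listing the sample space for n fair coins, then checks the two characteristics of a discrete PDF (each probability between 0 and 1, probabilities summing to one). The function name `heads_pdf` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_pdf(n_coins):
    """PDF of X = number of heads when flipping n fair coins, built from the sample space."""
    outcomes = list(product("HT", repeat=n_coins))   # all equally likely outcomes
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

pdf2 = heads_pdf(2)
print(pdf2)   # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# The two characteristics of a discrete PDF:
assert all(0 <= p <= 1 for p in pdf2.values())   # each probability is between 0 and 1
assert sum(pdf2.values()) == 1                   # the probabilities sum to one
```

Calling `heads_pdf(3)` gives the three-coin case: 1/8, 3/8, 3/8, 1/8 for x = 0, 1, 2, 3.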

35

Is the sample mean (x bar) a RV?

  • Yes, because its value depends on which specific random sample is drawn from a population, meaning it can vary depending on the sample selected, and therefore has a probability distribution associated with it.

36

Probability Distribution Function (PDF) for a Discrete Random Variable

  • A discrete probability distribution function has two characteristics:

    1. Each probability is between zero and one, inclusive.

    2. The sum of the probabilities is one

37

Mean or Expected Value

  • The expected value is often referred to as the “long-term” average or mean. This means that over the long term of doing an experiment over and over, you would expect this average.

  • Law of large numbers → as the number of trials in a probability experiment increases, the difference between the theoretical probability of an event and the relative frequency approaches zero (the theoretical probability and the relative frequency get closer and closer together)

  • The mean (mu) of a discrete probability function is the expected value E(X)

    • mu = E(X) = sum over all x of x · P(x)

      • P(x): probability that X takes on the value x
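A sketch using a fair six-sided die (an illustrative choice): E(X) is computed exactly from the PDF, then simulated averages over more and more rolls are printed to show the law of large numbers. The seed is arbitrary.

```python
import random
from fractions import Fraction

# Fair die: each face x has P(x) = 1/6.
pdf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pdf.items())   # E(X) = sum of x · P(x)
print(mu)   # 7/2

# Law of large numbers: the average of many rolls gets close to E(X) = 3.5.
rng = random.Random(3)
for n in (10, 1_000, 100_000):
    rolls = [rng.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)
```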

38

Standard deviation of a RV/PDF

  • square root of the variance

  • square root of the sum of (x - mu)² x P(x)
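A sketch of the formula above for a fair six-sided die (an illustrative choice): variance = sum of (x - mu)² x P(x), and the standard deviation is its square root.

```python
import math
from fractions import Fraction

pdf = {x: Fraction(1, 6) for x in range(1, 7)}              # fair die
mu = sum(x * p for x, p in pdf.items())                     # 7/2
variance = sum((x - mu) ** 2 * p for x, p in pdf.items())   # sum of (x - mu)^2 x P(x)
sigma = math.sqrt(variance)                                 # standard deviation
print(variance, round(sigma, 4))   # 35/12 1.7078
```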

39

Binomial Experiment and Binomial Probability Distribution

  • There are three characteristics of a binomial experiment

    1. There are a fixed number of trials.

    2. There are only two possible outcomes, called “success” and “failure,” for each trial. The letter p denotes the probability of a success on one trial, and q denotes the probability of a failure on one trial. p + q= 1

    3. The n trials are independent and are repeated using identical conditions. Because the n trials are independent, the outcome of one trial does not help predict the outcome of another. Chance of success vs. failure remains the same for each individual trial.

  • the outcomes of a binomial experiment fit a binomial probability distribution

  • the random variable X= the number of successes obtained in the n independent trials

40

The binomial distribution (from slides)

  • 1st theoretical distribution that underlies all others

  • A distribution can be theoretical or empirical

  • The binomial distribution describes the probability of x successes in n trials of a Bernoulli process

  • Bernoulli process:

    • 2 or more successive trials

    • 2 possible outcomes

    • Trials are independent

    • Probability of success remains constant

41

Binomial Probability Distribution: Mean and Variance

  • E(X) = np

  • V(X)= npq

    • standard deviation → square root of npq
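A sketch checking E(X) = np and V(X) = npq numerically: build the binomial PDF with `math.comb` using the standard formula C(n, x) p^x q^(n - x), then compute the mean and variance directly from it. The values n = 10 and p = 0.3 are illustrative.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """P(X = x) for a binomial(n, p) random variable: C(n, x) * p^x * q^(n - x)."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)

n, p = 10, 0.3
q = 1 - p
probs = [binom_pmf(x, n, p) for x in range(n + 1)]

mean_from_pdf = sum(x * px for x, px in enumerate(probs))
var_from_pdf = sum((x - mean_from_pdf) ** 2 * px for x, px in enumerate(probs))

print(mean_from_pdf, n * p)       # both about 3.0
print(var_from_pdf, n * p * q)    # both about 2.1
print(sqrt(var_from_pdf))         # standard deviation, about 1.449
```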

42

PDF & CDF: the binomial distribution

  • Probability Distribution Function or PDF: the probability of a random variable taking a specific value

  • Cumulative Distribution Function or CDF: the probability that the random variable X is less than or equal to x

  • PDF → P(X = x)

  • CDF → P(X ≤ x)

  • P(X > x) = 1 - P(X ≤ x) → 1 - binomial CDF evaluated at x
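A plain-Python sketch of the PDF and CDF of a binomial distribution. The function names mirror the TI calculator's binompdf and binomcdf for familiarity, but these are hypothetical helpers, not a calculator or library API; n = 20 and p = 0.5 are illustrative.

```python
from math import comb

def binompdf(n, p, x):
    """PDF: P(X = x) for a binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """CDF: P(X <= x), the PDF summed from 0 through x."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

n, p = 20, 0.5
print(binompdf(n, p, 10))       # P(X = 10), about 0.1762
print(binomcdf(n, p, 10))       # P(X <= 10)
print(1 - binomcdf(n, p, 12))   # P(X > 12) = 1 - P(X <= 12)
```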
