Statistics and Probability
Statistics: collection, analysis, and interpretation of data
two parts: descriptive and inferential
Probability: a mathematical tool to study randomness
Difference between statistics and probability:
suppose there is a jar with 10 balls, 3 red and 7 green
in probability: we know the jar content and therefore the true probabilities 3/10 and 7/10. we ask questions such as what is P(2 red in a row) with replacement
in statistics: we do not know the jar content. but we take a sample of say n=4 balls (with replacement). with this sample, we estimate the true probabilities.
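A minimal Python sketch of the two viewpoints, using the jar above (random.choices draws with replacement; the printed estimate will vary from run to run):

```python
import random

# The jar from the example: 3 red, 7 green (known in the probability
# view; the statistician pretends not to know this composition).
jar = ["red"] * 3 + ["green"] * 7

# Probability view: composition known, so P(2 red in a row, with
# replacement) is just (3/10) * (3/10).
print("theoretical P(2 red):", (3 / 10) ** 2)

# Statistics view: composition unknown; draw n=4 balls WITH replacement
# and use the sample proportion of red to estimate the true 3/10.
sample = random.choices(jar, k=4)
p_hat = sample.count("red") / len(sample)
print("sample:", sample, "-> estimated P(red):", p_hat)
```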
Population vs sample
Population: collection of persons or things under study
Sample: a subset of the population that provides information about the population
Sampling
Sampling: selection of a portion of the population
We want an adequate sampling method such that the sample is representative of the population
If the sample is representative, sample statistics are meaningful with respect to the population
Parameter vs Statistic
Parameter: number that represents a property of the population
example: true population mean (mu)
Statistic: number that represents a property of the sample
example: sample mean (x bar)
Variable X and Data
Variable X: a characteristic of interest for each person (or thing) of the population (examples: hours of sleep, GDP)
Data: actual values for the variables (persons or things)
Successful Sampling
Sampling: a sample should have the same characteristics as the population it is representing
Simple random sampling (SRS): names in a hat (or generate random numbers). Most important/common. Any group of n people is equally likely to be drawn (see the sketch after this list).
example: pick n professors from Fordham
cluster: select departments (groups) randomly, then sample within the chosen departments
systematic: every 20th name in the phonebook
convenience (not random): whoever is easiest to reach
sampling can be done with or without replacement (see below)
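A short sketch of two of these methods in Python, with a hypothetical roster standing in for the names in a hat:

```python
import random

# Hypothetical roster of 200 professors (stands in for "names in a hat").
population = [f"prof_{i}" for i in range(1, 201)]

# Simple random sample WITHOUT replacement: every group of 10 names is
# equally likely to be the sample drawn.
srs = random.sample(population, k=10)

# Systematic sample: every 20th name after a random starting point.
start = random.randrange(20)
systematic = population[start::20]

print("SRS:       ", srs)
print("systematic:", systematic)
```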
Sampling bias: when the sampling method is flawed or non-representative, x-bar systematically fails to estimate mu (distinct from the natural sampling error below)
Sampling Error: Variation in samples (key concept)
Sampling Error: the natural variation that results from selecting a sample to represent a larger population
this variation decreases as the sample size increases, so selecting larger samples reduces sampling error
Sampling with Replacement and without Replacement
Sampling with Replacement: once a member of the population is selected for inclusion in a sample, that member is returned to the population for the selection of the next individual
Sampling without Replacement: a member of the population may be chosen for inclusion in a sample only once. if chosen, the member is not returned to the population before the next selection
Frequency, Relative Frequency, & Cumulative relative frequency
Frequency: the number of times the value of the variable occurs in the sample
Relative frequency: the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes. Relative frequencies can be written as fractions, percents, or decimals.
Cumulative relative frequency: the accumulation of the previous relative frequencies. Add all previous relative frequencies to the relative frequency for the current row.
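A small sketch computing all three columns from a hypothetical sample (hours of sleep for 10 people):

```python
from collections import Counter

# Hypothetical sample: hours of sleep reported by 10 people.
data = [7, 8, 6, 7, 9, 7, 8, 6, 7, 8]
n = len(data)

counts = Counter(sorted(data))   # frequency of each distinct value

cumulative = 0.0
print("value  freq  rel freq  cum rel freq")
for value, freq in counts.items():
    rel = freq / n               # relative frequency (a proportion)
    cumulative += rel            # running sum of relative frequencies
    print(f"{value:5}  {freq:4}  {rel:8.2f}  {cumulative:12.2f}")
```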
Histograms
To construct a histogram, first decide how many bars or intervals, also called classes, represent the data. Many histograms consist of 5 to 15 bars or classes for clarity
Choose a starting point
Less than the smallest data value
A convenient starting point is a lower value carried out to one more decimal place than the value with the most decimal places
ex. if the value with the most decimal places is 2.23 and the lowest value is 1.5, a convenient starting point is 1.495 (1.5 - 0.005).
when the starting point and other boundaries are carried to one additional decimal place, no data value will fall on a boundary
Calculate the width of each bar or class interval.
Subtract the starting point from the ending value and divide by the number of bars (you must choose the number of bars you desire)
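A sketch of that calculation in Python, reusing the note's numbers (lowest value 1.5, most decimal places 2, so the starting point gets a third decimal):

```python
# Hypothetical data whose most precise value has 2 decimal places.
data = [1.5, 2.23, 1.8, 2.0, 1.95, 2.1, 1.75, 2.2]
num_bars = 5                      # you choose the number of classes

start = min(data) - 0.005         # 1.495: one extra decimal place
width = (max(data) - start) / num_bars

boundaries = [round(start + i * width, 3) for i in range(num_bars + 1)]
print("class width:", width)
print("class boundaries:", boundaries)
```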
Frequency polygons
Analogous to line graphs and histograms; they make continuous data visually easy to interpret
Useful for comparing distributions
Levels of Measurement
the way a set of data is measured is called its level of measurement
levels of measurement (from lowest to highest level):
Nominal scale level
data that is measured using a nominal scale is qualitative (categorical). categories, colors, names, labels, and favorite foods along with yes or no responses are examples of nominal level data.
not ordered
cannot be used in calculations
Ordinal scale level
similar to nominal scale data but there is a big difference
the ordinal scale data can be ordered
ex. top five national parks in the US
can be ordered but the differences cannot be measured
cannot be used in calculations
Interval scale level
similar to ordinal level data because it has a definite ordering, but with one key difference:
the differences between interval scale data can be measured though the data does not have a starting point
temperature scales like celsius and fahrenheit are measured using the interval scale
can be used in calculations, but one type of comparison cannot be done: no meaning to ratios
Ratio scale level
takes care of the ratio problem and gives you the most information
like interval scale data, but it has a starting point and ratios can be calculated
the data can be put in order from lowest to highest
the differences have a meaning
ratios can be calculated
Quantitative data: discrete or continuous
Discrete: take on only certain numerical values
ex. counting number of phone calls you receive for each day of the week
Continuous: result from measuring rather than counting; values may include fractions, decimals, irrational numbers, etc.
ex. lengths, weights, times, etc.
Key Components of Every Experiment (to produce reliable data)
Subjects must be assigned randomly to different treatment groups to eliminate lurking variables
One of the groups must act as a control group, demonstrating what happens when the active treatment is not applied
Participants in the control group receive a placebo treatment that looks exactly like the active treatments but cannot influence the response variable
To preserve the integrity of the placebo, both researchers and subjects may be blinded
Measures of the Location of Data: Quartiles and Percentiles
Quartiles divide an ordered data set into four equal parts
about one-fourth of the data falls on or below the first quartile Q1
about one-half of the data falls on or below the second quartile Q2
about three-fourths of the data falls on or below the third quartile Q3
Percentiles divide ordered data into hundredths
To score in the 90th percentile of an exam does not necessarily mean that you received a 90% on the test. It means that 90% of test scores are the same as or less than your score, and 10% of the test scores are the same as or greater than your test score
Finding Quartiles
Find Q2 by finding the median (the value at position (n+1)/2 in the ordered data)
Find Q1: the middle value, or median, of the lower half of the data
one fourth of the entire set of values is the same as or less than Q1 and three fourths of the values are more than Q1
Find Q3: the middle value, or median, of the upper half of the data
three fourths of the ordered data set is less than Q3 and one fourth of the ordered data set is greater than Q3
Interquartile Range (IQR)
The interquartile range is a number that indicates the spread of the middle half or the middle 50% of data
It is the difference between the third quartile (Q3) and the first quartile (Q1)
IQR = Q3 - Q1
Interquartile Range (IQR) and Outliers
The IQR can help to determine potential outliers
A value is suspected to be a potential outlier if it lies more than (1.5)(IQR) below the first quartile or more than (1.5)(IQR) above the third quartile
lower fence: Q1 - (1.5)(IQR)
upper fence: Q3 + (1.5)(IQR)
values outside the interval [Q1 - (1.5)(IQR), Q3 + (1.5)(IQR)] are flagged as potential outliers
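A sketch putting the quartile and fence rules together, with hypothetical data; statistics.quantiles with its default "exclusive" method follows the (n+1)-position convention used above:

```python
import statistics

# Hypothetical ordered data set with one suspiciously large value.
data = sorted([10, 12, 13, 15, 18, 20, 21, 24, 25, 60])

q1, q2, q3 = statistics.quantiles(data, n=4)   # [Q1, median, Q3]

iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
print("potential outliers:",
      [x for x in data if x < low_fence or x > high_fence])   # [60]
```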
Interpreting percentiles: On a 20-question math test, the 70th percentile for correct answers was 16
What does this mean?
70% had 16 correct answers or less. 30% had 16 correct answers or more
Percentiles & frequency tables
percentile in a frequency table —> read the cumulative relative frequency column
ex. the value at the 28th percentile is the first data value whose cumulative relative frequency reaches 0.28
Resistant measure
A resistant measure is a statistical measurement that is not significantly affected by outliers
The mean is not robust to outliers
The median is robust to outliers
Another center statistic: the mode
the mode is the most frequent value in the sample
it also works for qualitative data
Skew and Distribution
To understand skewness:
look at the sign of (mean - median)
outliers pull the mean away from the median
sample mean > median → right/positive skewed distribution
sample mean < median → left/negative skewed distribution
sample mean ≅ median → symmetrical distribution
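A quick check of the mean-vs-median rule on a hypothetical right-skewed sample:

```python
import statistics

# One large value pulls the mean up but barely moves the median.
data = [2, 3, 3, 4, 4, 5, 30]

mean = statistics.mean(data)       # about 7.29, dragged by the outlier
median = statistics.median(data)   # 4, resistant to the outlier

# mean > median suggests a right/positively skewed distribution
print(f"mean={mean:.2f}, median={median}")
```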
Box Plots
Box plots give a good graphical image of the concentration of the data
Show how far the extreme values are from most of the data
Constructed from five values:
Minimum value
First quartile
Median
Third Quartile
Maximum value
The middle 50 percent of the data falls inside the box
The first quartile marks one end of the box, and the third quartile marks the other end of the box
The median or second quartile lies between the first and third quartiles and is drawn as a line inside the box
Whiskers extend from the ends of the box to the smallest and largest data values, which mark the endpoints of the axis
Standard Deviation
The most common measure of variation or spread
The standard deviation is a number that measures how far data values are from their mean
It provides a numerical measure of the overall amount of variation in a data set
It can be used to determine whether a particular data value is close to or far from the mean
Higher standard deviation → more variation
lower case letter s represents the sample standard deviation
Greek letter σ (sigma) represents the population standard deviation
sample → divide by n-1
population → divide by N
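Python's statistics module implements both conventions, so a quick sketch can show the n-1 vs N difference on hypothetical data:

```python
import statistics

data = [4, 7, 9, 10, 15]

s = statistics.stdev(data)        # sample SD: divides by n - 1
sigma = statistics.pstdev(data)   # population SD: divides by N

print(f"sample s = {s:.3f}")      # slightly larger
print(f"population sigma = {sigma:.3f}")
```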
Experiment, sample space, event, etc. (Chapter 3: probability topics)
Experiment: planned operation with a random outcome carried out under controlled conditions
ex. flipping one coin
Sample space: the set of all possible outcomes
ex. S = {H,T}
Event: an event A is a subset of the sample space
Probability of an outcome: number between 0 and 1 that can be seen as the long-term relative frequency of that outcome
Probability of an event A when outcomes are equally likely: P(A) = (number of outcomes in A) / (total number of outcomes in S)
Different types of probabilities and how they relate
Marginal probabilities: P(A), P(B)
OR events: P(A U B) = P(A or B)
AND events: P(A and B) → aka joint probability
Conditional probability: P(A | B), read "P(A given B)"
Bayes theorem
Conditional probability = joint/marginal
P(A|B) = P(A and B) / P(B)
P(A and B) = P(A|B) x P(B)
P(A and B) = P(B|A) x P(A)
So: P(A|B) x P(B) = P(B|A) x P(A)
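A numeric sanity check of the relation above, with hypothetical marginal and joint probabilities:

```python
# Hypothetical probabilities for two events A and B.
p_a, p_b = 0.30, 0.40
p_a_and_b = 0.10                   # the joint probability

p_a_given_b = p_a_and_b / p_b      # conditional = joint / marginal = 0.25
p_b_given_a = p_a_and_b / p_a      # = 1/3

# Both routes recover the same joint: P(A|B)P(B) == P(B|A)P(A)
print(f"{p_a_given_b * p_b:.2f}")  # 0.10
print(f"{p_b_given_a * p_a:.2f}")  # 0.10
```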
Independence and mutual exclusion
Independence: A and B are independent if P(A|B) = P(A) (equivalently, P(B|A) = P(B), or P(A and B) = P(A) x P(B)), i.e. the conditioning set is useless
one event occurring does not affect the chance the other occurs
intuition: roulette vs. black jack
note: if A and B are independent, the multiplication rule P(A and B) = P(A|B) x P(B) reduces to P(A and B) = P(A) x P(B)
under independence, the joint is the product of marginal
Mutual exclusion: A and B are mutually exclusive when the joint is 0 → P(A and B) = 0
events that cannot occur at the same time
Two basic rules of probability
P(A and B) = P(A|B) x P(B)
Reduces to P(A and B) = P(A) x P(B) under independence
AND → product
P(A or B) = P(A) + P(B) - P(A and B)
Reduces to P(A or B) = P(A) + P(B) under mutual exclusion
OR → sum
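A worked check of both rules using a standard deck of cards (A = draw a heart, B = draw a face card):

```python
p_a = 13 / 52                      # hearts
p_b = 12 / 52                      # face cards
p_a_and_b = 3 / 52                 # jack, queen, king of hearts

# AND -> product: P(A and B) = P(A|B) * P(B)
p_a_given_b = p_a_and_b / p_b      # 3/12 = 0.25
print(f"P(A and B) = {p_a_given_b * p_b:.4f}")        # 0.0577

# OR -> sum: P(A or B) = P(A) + P(B) - P(A and B)
print(f"P(A or B)  = {p_a + p_b - p_a_and_b:.4f}")    # 22/52, about 0.4231
```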
Sampling with replacement or without replacement
With replacement: the events are considered to be independent, meaning the result of the first pick will not change the probabilities for the second pick
Without replacement: the events are considered to be dependent or not independent
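The jar from the start of the notes makes the difference concrete; a sketch with exact fractions:

```python
from fractions import Fraction

# Jar: 3 red, 7 green. Probability of red on both of two picks.
with_repl = Fraction(3, 10) * Fraction(3, 10)     # picks are independent
without_repl = Fraction(3, 10) * Fraction(2, 9)   # second pick depends on first

print("with replacement:   ", with_repl)     # 9/100
print("without replacement:", without_repl)  # 1/15
```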
Contingency tables
a contingency table displays sample values for two variables at once; its cell, row, and column totals can be used to compute marginal, joint, and conditional probabilities
Discrete Random Variable
Discrete data are data that you can count
A random variable describes the outcomes of a statistical experiment in words
The values of a random variable can vary with each repetition of an experiment
Random Variable Notation
Upper case letters such as X or Y denote a random variable
Lower case letters like x or y denote the value of a random variable
If X is a random variable, then X is written in words, and x is given as a number
For example, let X = the number of heads you get when you toss three fair coins. The sample space for the toss of three fair coins is {TTT, THH, HTH, HHT, HTT, THT, TTH, HHH}
Then, x = 0,1,2,3
Because you can count the possible values that X can take on and the outcomes are random (the x values 0,1,2,3), X is a discrete random variable
Random Variables
A random variable X (or Y) takes different values with different probabilities
Example: experiment: flip 2 coins
S= {HH,HT,TH,TT}
Define X= count of heads from flipping 2 coins
Possible values for X: 0,1,2
But X will realize those values with different probabilities (1/4,2/4,1/4)
We want to describe the probability with which X takes on different values → We use a PDF
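A sketch that builds this PDF by brute-force enumeration of the sample space:

```python
from collections import Counter
from itertools import product

# Enumerate the sample space for flipping 2 fair coins.
sample_space = list(product("HT", repeat=2))   # HH, HT, TH, TT

# X = number of heads in each outcome.
counts = Counter(outcome.count("H") for outcome in sample_space)

pdf = {x: c / len(sample_space) for x, c in sorted(counts.items())}
print(pdf)   # {0: 0.25, 1: 0.5, 2: 0.25}
```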
Is the sample mean (x bar) a RV?
Yes: its value depends on which specific random sample is drawn from the population, so it varies from sample to sample and has a probability distribution associated with it
Probability Distribution Function (PDF) for a Discrete Random Variable
A discrete probability distribution function has two characteristics:
Each probability is between zero and one, inclusive.
The sum of the probabilities is one
Mean or Expected Value
The expected value is often referred to as the “long-term” average or mean. This means that over the long term of doing an experiment over and over, you would expect this average.
Law of large numbers → as the number of trials in a probability experiment increases, the difference between the theoretical probability of an event and the relative frequency approaches zero (the theoretical probability and the relative frequency get closer and closer together)
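A quick simulation sketch of the law of large numbers for a fair coin (results vary run to run, but the drift toward 0.5 is visible):

```python
import random

# Relative frequency of heads should approach the theoretical 0.5.
for n in (10, 100, 1_000, 10_000, 100_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"n={n:>7}: relative frequency = {heads / n:.4f}")
```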
The mean (mu) of a discrete probability function is the expected value E(X)
mu = E(X) = Σ x · P(x)
P(x): probability that X takes on the value x
Standard deviation of a RV/PDF
the standard deviation is the square root of the variance:
sigma = √( Σ (x - mu)² · P(x) )
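Both formulas in one sketch, applied to a hypothetical discrete PDF:

```python
import math

# Hypothetical discrete PDF given as {x: P(x)} pairs.
pdf = {0: 0.2, 1: 0.5, 2: 0.3}
assert abs(sum(pdf.values()) - 1) < 1e-12   # probabilities sum to one

mu = sum(x * p for x, p in pdf.items())     # E(X) = sum of x * P(x)
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in pdf.items()))

print(f"mu = {mu}, sigma = {sigma:.4f}")    # mu = 1.1
```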
Binomial Experiment and Binomial Probability Distribution
There are three characteristics of a binomial experiment
There are a fixed number of trials.
There are only two possible outcomes, called “success” and “failure,” for each trial. The letter p denotes the probability of a success on one trial, and q denotes the probability of a failure on one trial. p + q= 1
The n trials are independent and are repeated using identical conditions. Because the n trials are independent, the outcome of one trial does not help predict the outcome of another. Chance of success vs. failure remains the same for each individual trial.
the outcomes of a binomial experiment fit a binomial probability distribution
the random variable X= the number of successes obtained in the n independent trials
The binomial distribution (from slides)
1st theoretical distribution that underlies all others
A distribution can be theoretical or empirical
The binomial distribution describes the probability of x successes in n trials of a Bernoulli process
Bernoulli process:
2 or more successive trials
2 possible outcomes on each trial
Trials are independent
Probability of success remains constant
Binomial Probability Distribution: Mean and Variance
E(X) = np
V(X)= npq
standard deviation → square root of npq
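The formulas, plus an empirical check by simulating many hypothetical binomial experiments:

```python
import math
import random

n, p = 20, 0.3
q = 1 - p

print("E(X) =", n * p)                    # 6.0
print("V(X) =", n * p * q)                # 4.2
print("sd   =", math.sqrt(n * p * q))     # about 2.05

# Simulate 50,000 binomial experiments: each is n Bernoulli trials.
sims = [sum(random.random() < p for _ in range(n)) for _ in range(50_000)]
print("simulated mean:", sum(sims) / len(sims))   # close to 6.0
```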
PDF & CDF: the binomial distribution
Probability Density Function or PDF: the probability of a random variable taking a specific value
Cumulative Distribution Function or CDF: the probability that the random variable X is less than or equal to x
PDF → P(X = x)
CDF → P(X ≤ x)
P(X > x) → 1 - binomialcdf (complement rule: P(X > x) = 1 - P(X ≤ x))
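A self-contained sketch of all three quantities using only the binomial formula (math.comb counts the ways to place x successes among n trials):

```python
from math import comb

def binom_pdf(n, p, x):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(n, p, x):
    """P(X <= x): sum the PDF over 0..x."""
    return sum(binom_pdf(n, p, k) for k in range(x + 1))

n, p = 20, 0.3
print("P(X = 5) :", binom_pdf(n, p, 5))
print("P(X <= 5):", binom_cdf(n, p, 5))
print("P(X > 5) :", 1 - binom_cdf(n, p, 5))   # complement rule
```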