Flashcards covering statistical concepts, probability, hypothesis testing, regression models, and data management based on the Economic Data Analytics lecture notes.
Inferential Statistics
Drawing conclusions about the population based on the sample
Population
The entire group we are studying
Sample
A smaller subgroup drawn from the population
Parameter
A numerical measurement of the population
Statistic
A numerical measurement of the sample
Constant
The true value of a population parameter
Sampling Distribution
The probability distribution of a statistic that would be found if one selected all random samples of size n from a population.
Standard Error
The standard deviation of a statistic or the sampling distribution.
Sampling Error
The difference between the values of the sample statistic and the population parameter.
Simple Random Sampling
Picking items at random where each item has an equal chance of getting picked
Stratified Sampling
A method where the population is broken into homogenous subgroups called strata which are mutually exclusive and exhaustive.
Cluster Sampling
A method where the population is broken into natural groupings called clusters, such as towns within a state, and clusters are selected at random.
Normal Curve
Symmetric
Not Skewed
Its tails come close to but never touch the axis (asymptotic)
Central Limit Theorem
The principle that the sampling distribution of the mean will be approximately normal if the sample size is large enough, regardless of the shape of the population distribution. Almost all values fall between -3 and +3 standard deviations.
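The theorem can be illustrated with a small simulation: even when the population is skewed (here an exponential distribution, chosen as an illustrative example), the means of repeated samples cluster symmetrically around the population mean.

```python
import random
import statistics

# Illustrative sketch: draw many samples from a skewed (exponential) population
# and watch the distribution of the sample means concentrate around the
# population mean, as the Central Limit Theorem predicts.
random.seed(0)

sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(50))  # n = 50 per sample
    for _ in range(2000)
]

# Population mean is 1.0; the SD of the sample means should be near
# sigma / sqrt(n) = 1 / sqrt(50) ≈ 0.14.
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```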
Z-score
A standard measure that expresses scores in units of standard deviations, making it possible to compare different distributions: z = (X − x̄) / s
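A minimal sketch of the z-score formula, using made-up illustration numbers:

```python
# z-score: how many standard deviations a score sits from the mean.
def z_score(x, mean, sd):
    return (x - mean) / sd

# A score of 85 in a class with mean 70 and SD 10 is 1.5 SDs above the mean.
print(z_score(85, 70, 10))  # -> 1.5
```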
Null Hypothesis (H0)
A statement of equality that assumes no difference between groups or variables until evidence proves otherwise.
Research Hypothesis (H1)
A definite statement that a difference exists between variables, which can be directional or nondirectional.
One-tailed Test
A statistical test that reflects a directional hypothesis and posits a difference in a particular direction.
Two-tailed Test
A statistical test that reflects a nondirectional hypothesis and does not specify the direction of the difference.
Statistically Significant
A finding that a difference is not due to chance, but rather due to some systematic influence.
Difference b/n null hypothesis and research hypothesis
Null hypothesis: 1. statement of equality 2. related to the population 3. indirectly related to the sample 4. uses μ (mu)
Research hypothesis: 1. statement of inequality 2. related to the sample 3. uses x̄ (x-bar)
Features of a good hypothesis
Declaration not question
Expected relationship between variables
Reflects theory
Brief
Testable
Statistically Significant
A difference is not due to chance but actually has a systematic reason
Type 1 Error (Alpha)
Incorrectly rejecting a true null hypothesis, also known as a "false positive."
Type 2 Error (Beta)
Incorrectly accepting a false null hypothesis, also known as a "false negative."
Power
The probability of detecting an effect if one is present, or the probability of avoiding a Type 2 error. 1-beta
Factors that affect power
Alpha - increase in alpha, decreases beta and increases power
Sample size - larger sample size increases power
Variability - greater variability decreases power
Magnitude of the effect of a variable - higher magnitude makes detection easier which increases the power
Difference b/n significant and meaningful
Significant - something happens for a reason not by chance. Determined by p value and alpha
Meaningful - the result actually matters. Determined by effect size, impact etc.
Confidence Interval
An estimated range of values that is likely to include the unknown population value.
Z statistic
z statistic = (x̄ − μ) / SEM
The z test checks whether the population mean is equal to some specified value.
If |z statistic| > z critical value, we reject H0; otherwise we fail to reject.
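A sketch of the one-sample z test with hypothetical numbers (the data values and the 1.96 two-tailed critical value at alpha = 0.05 are standard illustration choices, not from the notes):

```python
import math

# H0: population mean = 100. Sample of n = 36 with x-bar = 104, known sigma = 12.
x_bar, mu0, sigma, n = 104, 100, 12, 36

sem = sigma / math.sqrt(n)      # standard error of the mean: 12 / 6 = 2
z_stat = (x_bar - mu0) / sem    # (104 - 100) / 2 = 2.0

z_critical = 1.96               # two-tailed critical value at alpha = 0.05
reject_h0 = abs(z_stat) > z_critical
print(z_stat, reject_h0)        # -> 2.0 True
```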
SEM (Standard Error of the Mean)
Measure of variability between sample means, calculated as SEM = σ / √n; it estimates how accurately the sample mean represents the population mean.
Effect Size
A measure of how different two groups are from one another that helps determine the "meaningfulness" of a result.
Simple effect size = (x̄1 − x̄2) / s
Pooled effect size = (x̄1 − x̄2) / √((s1² + s2²) / 2)
0.0 - 0.2 effect size is small, 0.2 - 0.5 is medium, 0.5 and above is large.
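The pooled effect size formula can be sketched on two made-up groups:

```python
import math
import statistics

# Pooled effect size: mean difference divided by the root-mean-square of the
# two sample SDs (made-up illustration data).
group1 = [12, 14, 15, 13, 16, 14]
group2 = [10, 11, 12, 10, 13, 11]

s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
pooled_sd = math.sqrt((s1**2 + s2**2) / 2)
d = (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd
print(round(d, 2))  # above 0.5, so a large effect by the rule of thumb
```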
T test for independent groups
t statistic = (x̄1 − x̄2) / √[ (1/n1 + 1/n2) × ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]
df= n1+n2 -2
This checks whether the population mean of group 1 = population mean of group 2
If |t statistic| > t critical value we reject; otherwise we fail to reject.
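A sketch of the pooled independent-groups t statistic on made-up data (the 2.306 critical value is the standard two-tailed table value at alpha = 0.05, df = 8):

```python
import math
import statistics

# Independent-groups t test with pooled variance (illustration data).
g1 = [23, 25, 28, 30, 26]
g2 = [20, 19, 24, 21, 22]
n1, n2 = len(g1), len(g2)

pooled_var = ((n1 - 1) * statistics.variance(g1) +
              (n2 - 1) * statistics.variance(g2)) / (n1 + n2 - 2)
t_stat = (statistics.mean(g1) - statistics.mean(g2)) / math.sqrt(
    pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

t_critical = 2.306  # two-tailed, alpha = 0.05, df = 8
print(round(t_stat, 2), df, abs(t_stat) > t_critical)
```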
T test for dependent groups
t statistic = ΣD / √[ (nΣD² − (ΣD)²) / (n − 1) ]
df = n-1
This checks whether the average difference = 0
Same decision rule as the independent-groups test for rejecting or failing to reject.
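The dependent-groups formula above can be sketched with made-up before/after scores for the same five subjects:

```python
import math

# Paired (dependent) t test: work directly with the difference scores D.
before = [80, 75, 90, 85, 70]
after  = [85, 79, 92, 88, 75]

D = [a - b for a, b in zip(after, before)]    # difference scores
n = len(D)
sum_d  = sum(D)
sum_d2 = sum(d * d for d in D)

t_stat = sum_d / math.sqrt((n * sum_d2 - sum_d**2) / (n - 1))
df = n - 1
print(round(t_stat, 2), df)
```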
Degrees of Freedom (df)
The number of values in the final calculation of a statistic that are free to vary; for an independent t-test, it is (N1−1)+(N2−1), and for a one-way ANOVA, the numerator (k−1) and denominator (N−k).
Correlation Coefficient
Measures the strength and direction of the linear relationship between two variables. Ranges from -1 to +1; a coefficient of 0 means no relationship.
r = [nΣxy − (Σx)(Σy)] / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]
Strength: 0.0-0.2 is very weak, 0.2-0.4 is weak, 0.4-0.6 is moderate, 0.6-0.8 is strong, 0.8-1.0 is very strong.
Direction: x and y both increase or both decrease (direct, positive); x increases while y decreases, or x decreases while y increases (indirect, negative).
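The computational formula for r can be sketched on made-up (x, y) pairs:

```python
import math

# Pearson correlation coefficient via the computational formula
# (illustration data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
print(round(r, 2))  # positive: as x increases, y tends to increase
```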
Coefficient of Determination (r2)
The percentage of variance in one variable that is shared with or explained by the variance of another variable.
Coefficient of Nondetermination
The proportion of the variance in variable Y that is not explained by variable X; equal to 1 − r².
Correlation Coefficient Test
H0: ρ = 0 and H1: ρ ≠ 0
The test statistic is either the correlation coefficient value itself or the t statistic.
Find the t critical value and compare.
df = n - 2
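One common t form of this test is t = r√(n − 2) / √(1 − r²); a sketch with hypothetical numbers (the 2.060 critical value is the standard two-tailed table value at alpha = 0.05, df = 25):

```python
import math

# Significance test for a correlation coefficient (hypothetical r and n).
r, n = 0.60, 27

t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)  # 0.6 * 5 / 0.8 = 3.75
df = n - 2

t_critical = 2.060  # two-tailed, alpha = 0.05, df = 25
print(round(t_stat, 2), abs(t_stat) > t_critical)
```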
Simple Linear Regression
The line of best fit that shows the linear relationship between two variables.
Independent variable
Variable that helps determine another variable (X)
Dependent Variable
Variable that is the outcome which depends on another variable for its values
Prediction error
The difference between actual and predicted values. Total prediction error, Σ(y actual − y predicted)², is lowest at the line of best fit.
Limits of R2
Only focuses on the linear relationship and neglects all others
Influenced by outliers
Analysis of Variance (ANOVA)
A statistical test used to determine if there are significant differences between the means of more than two groups.
F-statistic
The ratio of the variability between groups to the variability within groups, calculated as F = Mean Squares Between / Mean Squares Within.
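The F ratio can be sketched by hand on three made-up groups:

```python
import statistics

# One-way ANOVA: F = MS between / MS within (illustration data).
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = statistics.mean(x for g in groups for x in g)

ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

ms_between = ss_between / (k - 1)   # numerator df = k - 1
ms_within = ss_within / (N - k)     # denominator df = N - k
f_stat = ms_between / ms_within
print(round(f_stat, 2))
```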
Factorial Analysis of Variance
A variation of ANOVA that explores more than one treatment factor and identifies main effects and interaction effects.
Nonparametric Tests
Statistical tests used when assumptions like normal distribution or homogeneity of variance are violated.
Chi-square Goodness-of-fit Test
A nonparametric test used to determine if the observed frequency of occurrences in categories matches what is expected by chance.
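The goodness-of-fit statistic is Σ(observed − expected)² / expected, summed over categories; a sketch with made-up counts (the 7.815 critical value is the standard chi-square table value at alpha = 0.05, df = 3):

```python
# Chi-square goodness of fit: do the observed category counts match the
# frequencies expected by chance? (Illustration data: 100 responses, 4 options.)
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]   # equal frequencies expected under H0

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

critical = 7.815  # alpha = 0.05, df = 3
print(round(chi_sq, 2), chi_sq > critical)
```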
Linear Regression
A predictive tool that estimates a statistical relationship between a continuous independent variable (X) and a continuous dependent variable (Y) using a line of best fit.
R-squared (R2)
A measure of how much variation in the dependent variable is explained by the variation in the independent variable(s).
Multiple Regression
A regression model with more than one predictor (independent) variable to explain the outcome of a dependent variable.
Logistic Regression
A predictive tool used when the dependent variable is binary rather than continuous, utilizing maximum likelihood estimation (MLE).
Analysis File
A static version of the data that is fully cleaned and prepped before starting the formal analysis.
Appending
The process of stacking one data file on top of another to combine observations.
Merging
The process of combining files by adding variables to observations using a unique identifier.
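Appending and merging can be sketched with hypothetical pandas DataFrames (the column names here are made up for illustration):

```python
import pandas as pd

# Two quarterly files with the same variables, plus a lookup table.
q1 = pd.DataFrame({"id": [1, 2], "sales": [100, 150]})
q2 = pd.DataFrame({"id": [3, 4], "sales": [120, 130]})
regions = pd.DataFrame({"id": [1, 2, 3, 4],
                        "region": ["East", "West", "East", "West"]})

# Appending: stack one file on top of the other to combine observations.
appended = pd.concat([q1, q2], ignore_index=True)

# Merging: add variables to observations via the unique identifier "id".
merged = appended.merge(regions, on="id")
print(len(appended), list(merged.columns))
```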
Data Science
A field combining statistics, computer science, and programming used to extract information from big or unstructured data.
Natural Language Processing (NLP)
A data science technique focused on extracting the fuller meaning from free text, including grammar and parts of speech.
Pivot Tables
Excel tools used to summarize and analyze large data files through functions like Sum, Count, Average, Max, and Min.
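A pandas analogue of the Excel pivot table described above, on made-up data:

```python
import pandas as pd

# Summarize a long data file by category, as an Excel pivot table would.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "sales":  [100, 150, 120, 130],
})

pivot = pd.pivot_table(df, index="region", values="sales", aggfunc="sum")
print(pivot)
# Other aggfuncs ("mean", "count", "max", "min") mirror the Excel functions.
```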