Economic Data Analytics Study Flashcards

Flashcards covering statistical concepts, probability, hypothesis testing, regression models, and data management based on the Economic Data Analytics lecture notes.

Last updated 2:15 AM on 4/29/26

59 Terms

1
Inferential Statistics

Drawing conclusions about the population based on the sample

2
Population

The entire group being studied

3
Sample

A smaller subgroup drawn from the population

4
Parameter

A numerical measurement of the population

5
Statistic

A numerical measurement of the sample

6
Constant

The true value of a population parameter

7
Sampling Distribution

The probability distribution of a statistic that would be found if one selected all possible random samples of size $n$ from a population.

8
Standard Error

The standard deviation of a statistic's sampling distribution.

9
Sampling Error

The difference between the values of the sample statistic and the population parameter.
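
The ideas on cards 7 to 9 can be seen together on a toy example. This is a minimal sketch, assuming a made-up five-value population and samples drawn without replacement; the variable names are illustrative only. It lists every possible sample of size 2, so the full sampling distribution of the mean is available rather than estimated.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical toy population (values invented for illustration).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

mu = mean(population)  # the population parameter

# Every possible sample of size n (without replacement) and its mean.
sample_means = [mean(s) for s in combinations(population, n)]

# Standard error: the standard deviation of the sampling distribution.
standard_error = pstdev(sample_means)

# Sampling error of each sample: sample statistic minus population parameter.
sampling_errors = [m - mu for m in sample_means]

print(f"population mean: {mu}")
print(f"mean of the sample means: {mean(sample_means)}")
print(f"standard error: {standard_error:.3f}")
```

The sample means average out to the population mean, while any single sample can miss it; that miss is the sampling error.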

10
Simple Random Sampling

Picking items at random where each item has an equal chance of getting picked

11
Stratified Sampling

A method where the population is broken into homogeneous subgroups called strata, which are mutually exclusive and exhaustive, and a random sample is drawn from each stratum.

12
Cluster Sampling

A method where the population is broken into natural groupings called clusters, such as towns within a state, and one or more whole clusters are chosen at random to be sampled.
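
The three sampling methods on cards 10 to 12 differ mainly in what gets drawn at random. The sketch below assumes a hypothetical population of 30 people spread evenly over three towns; the town names, sample sizes, and variable names are invented for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: (person_id, town) pairs.
towns = ["Ayr", "Bath", "Cork"]
population = [(i, towns[i % 3]) for i in range(30)]

# Simple random sampling: every unit has the same chance of being picked.
simple = rng.sample(population, 6)

# Group units by town; the groups act as strata or as clusters.
groups = {t: [p for p in population if p[1] == t] for t in towns}

# Stratified sampling: draw a random sample from every stratum.
stratified = [unit for members in groups.values() for unit in rng.sample(members, 2)]

# Cluster sampling: pick a whole cluster at random and keep all of its units.
chosen_town = rng.choice(towns)
cluster = groups[chosen_town]

print(len(simple), len(stratified), len(cluster))
```

Stratified sampling guarantees every town is represented, whereas the cluster sample contains people from the chosen town only.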

13
Normal Curve

  1. Symmetric

  2. Not Skewed

  3. Its tails come close to but never touch the axis (asymptotic)

14
Central Limit Theorem

The principle that the sampling distribution of the mean of independent random observations will be approximately normal if the sample size is large enough. In a normal distribution, almost all values (about 99.7%) fall within 3 standard deviations of the mean.
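
A small simulation can make the theorem concrete. This sketch assumes a right-skewed exponential population with mean 1 and standard deviation 1, samples of size 50, and 2000 repetitions; all of these choices are illustrative. Even though the population is far from normal, the sample means cluster symmetrically around 1 with a spread close to $1/\sqrt{50}$.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n draws from a skewed exponential population (mean 1, sd 1)."""
    return mean([rng.expovariate(1.0) for _ in range(n)])

n = 50
means = [sample_mean(n) for _ in range(2000)]

center = mean(means)    # should sit close to the population mean, 1
spread = pstdev(means)  # should sit close to 1 / sqrt(n)

# Share of sample means within 3 standard deviations of their center.
share_within_3 = sum(abs(m - center) <= 3 * spread for m in means) / len(means)

print(f"center={center:.3f} spread={spread:.3f} within 3 sd={share_within_3:.3f}")
```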

15
Z-score

A standard measure that expresses scores in units of standard deviations, making it possible to compare scores from different distributions: $z = \frac{X - \bar{X}}{s}$
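
To illustrate how z-scores put different scales on a common footing, here is a minimal sketch that subtracts the mean and divides by the sample standard deviation. The two score lists are hypothetical: the same pattern of results recorded on a 100-point and a 10-point scale.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return each value's distance from the mean in standard-deviation units."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(x - x_bar) / s for x in values]

# Hypothetical scores on two tests with different scales.
test_a = [60, 70, 80, 90, 100]
test_b = [6, 7, 8, 9, 10]

print([round(z, 2) for z in z_scores(test_a)])
print([round(z, 2) for z in z_scores(test_b)])
```

Both lists produce the same z-scores, which is why a score on one test can be compared directly with a score on the other.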

16
Null Hypothesis ($H_0$)

A statement of equality that assumes no difference between groups or variables until the evidence indicates otherwise.

17
Research Hypothesis ($H_1$)

A definite statement that a difference exists between variables, which can be directional or nondirectional.

18
One-tailed Test

A statistical test that reflects a directional hypothesis and posits a difference in a particular direction.

19
Two-tailed Test

A statistical test that reflects a nondirectional hypothesis and does not specify the direction of the difference.

20
Statistically Significant

A finding that a difference is unlikely to be due to chance and is instead attributed to some systematic influence.

21
Difference between the null hypothesis and the research hypothesis

Null hypothesis:

  1. Statement of equality

  2. Related to the population

  3. Indirectly related to the sample

  4. Uses $\mu$

Research hypothesis:

  1. Statement of inequality

  2. Related to the sample

  3. Uses $\bar{X}$

22
Features of a good hypothesis

  1. Declaration not question

  2. Expected relationship between variables

  3. Reflects theory

  4. Brief

  5. Testable

23
Statistically Significant

A difference that is not due to chance but actually has a systematic reason

24
Type 1 Error (Alpha)

Incorrectly rejecting a true null hypothesis, also known as a "false positive."

25
Type 2 Error (Beta)

Failing to reject a false null hypothesis, also known as a "false negative."

26
Power

The probability of detecting an effect if one is present, or the probability of avoiding a Type 2 error: power $= 1 - \beta$

27
Factors that affect power

  1. Alpha: increasing alpha decreases beta and increases power

  2. Sample size: a larger sample size increases power

  3. Variability: greater variability decreases power

  4. Magnitude of the effect: a larger effect is easier to detect, which increases power
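
The four factors can be checked numerically. The sketch below assumes one particular setting that the card does not specify: an upper-tailed one-sample z test with known population standard deviation, for which power is $1 - \Phi\left(z_{1-\alpha} - \frac{\delta\sqrt{n}}{\sigma}\right)$, where $\delta$ is the true shift in the mean. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_tailed_z(effect, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample z test for a true mean shift `effect`."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha)
    return 1 - z.cdf(z_crit - effect * sqrt(n) / sigma)

base = power_one_tailed_z(effect=2, sigma=10, n=50, alpha=0.05)

print(f"baseline power:     {base:.3f}")
print(f"larger alpha:       {power_one_tailed_z(2, 10, 50, alpha=0.10):.3f}")
print(f"larger sample:      {power_one_tailed_z(2, 10, 100):.3f}")
print(f"more variability:   {power_one_tailed_z(2, 20, 50):.3f}")
print(f"larger effect:      {power_one_tailed_z(4, 10, 50):.3f}")
```

Each change moves power in the direction the list describes; with no true effect at all, the rejection rate falls back to alpha.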

28
Difference between significant and meaningful

Significant: the result happens for a systematic reason, not by chance. Determined by the p value and alpha.

Meaningful: the result actually matters. Determined by effect size, practical impact, etc.

29
Confidence Interval

An estimated range of values that is likely to include the unknown population value.
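
The card gives no formula, so the sketch below assumes one common case: a confidence interval for a mean when the population standard deviation is known, computed as $\bar{X} \pm z^{*} \cdot \frac{\sigma}{\sqrt{n}}$. The function name, the sample, and the value of sigma are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_confidence_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean, assuming a known population sd `sigma`."""
    n = len(sample)
    sem = sigma / sqrt(n)                               # standard error of the mean
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    x_bar = mean(sample)
    return x_bar - z_star * sem, x_bar + z_star * sem

# Hypothetical sample of nine measurements.
sample = [98, 102, 100, 104, 96, 100, 103, 97, 100]
low, high = z_confidence_interval(sample, sigma=6)

print(f"95% CI: ({low:.2f}, {high:.2f})")
```

A higher confidence level or a smaller sample widens the interval; a larger sample narrows it.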

30
Z statistic

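$z = \frac{\bar{X} - \mu}{\text{SEM}}$, where $\bar{X}$ is the sample mean, $\mu$ is the hypothesized population mean, and SEM is the standard error of the mean.

The z test checks whether the population mean is equal to some specific value.

If the absolute value of the z statistic is greater than the critical z value, we reject the null hypothesis. If not, we fail to reject it.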
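
As a worked example of this decision rule, the sketch below divides the gap between the sample mean and the hypothesized mean by the standard error $\frac{\sigma}{\sqrt{n}}$ and compares the result with a two-tailed critical value. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Return (z statistic, two-tailed critical value, reject H0?)."""
    sem = sigma / sqrt(n)  # standard error of the mean
    z = (x_bar - mu_0) / sem
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit

# Hypothetical study: sample mean 103 against a hypothesized mean of 100.
z, z_crit, reject = one_sample_z_test(x_bar=103, mu_0=100, sigma=15, n=100)

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

With the same means but only 25 observations, the standard error doubles and the same 3-point gap is no longer enough to reject.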

31
SEM (Standard Error of the Mean)

A measure of the variability between sample means, calculated as $\text{SEM} = \frac{\sigma}{\sqrt{n}}$; it estimates how accurately the sample mean represents the population mean.

32
Effect Size

A measure of how different two groups are from one another that helps determine the "meaningfulness" of a result.

Simple effect size: $ES = \frac{\bar{X}_1 - \bar{X}_2}{s}$, where $s$ is the standard deviation.

A more complicated version uses the pooled standard deviation: $ES = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2 + s_2^2}{2}}}$

An effect size of 0.0 to 0.2 is small, 0.2 to 0.5 is medium, and 0.5 and above is large.
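
The pooled version of the effect size (difference in means divided by the square root of the average of the two variances) can be computed as in this minimal sketch. The function names and the two groups of scores are hypothetical; the size labels follow the cutoffs on this card.

```python
from math import sqrt
from statistics import mean, variance

def pooled_effect_size(group_1, group_2):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(group_1) + variance(group_2)) / 2)
    return (mean(group_1) - mean(group_2)) / pooled_sd

def label(effect_size):
    """Classify an effect size using the cutoffs 0.2 and 0.5."""
    size = abs(effect_size)
    if size < 0.2:
        return "small"
    if size < 0.5:
        return "medium"
    return "large"

# Hypothetical scores for two groups.
group_1 = [7, 8, 9, 10, 11]
group_2 = [6, 7, 8, 9, 10]

es = pooled_effect_size(group_1, group_2)
print(f"effect size = {es:.2f} ({label(es)})")
```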

33
T test for independent groups

$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$

$df = n_1 + n_2 - 2$

This checks whether the population mean of group 1 equals the population mean of group 2.

If the absolute value of the t statistic is greater than the critical t value, we reject the null hypothesis; otherwise, we fail to reject it.
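
The independent-groups t statistic divides the difference in means by the square root of the pooled variance times $\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$. A minimal sketch with hypothetical data follows; the critical value is taken from a t table (2.306 for a two-tailed test at alpha = 0.05 with 8 degrees of freedom) rather than computed.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_1, group_2):
    """Return (t statistic, degrees of freedom) for two independent groups."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its n - 1.
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / df
    t = (mean(group_1) - mean(group_2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical scores for two unrelated groups.
group_1 = [7, 8, 9, 10, 11]
group_2 = [4, 5, 6, 7, 8]

t, df = independent_t(group_1, group_2)
t_critical = 2.306  # from a t table: two-tailed, alpha = 0.05, df = 8

print(f"t = {t:.2f}, df = {df}, reject H0: {abs(t) > t_critical}")
```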

34
T test for dependent groups

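$t = \frac{\sum D}{\sqrt{\frac{n\sum D^2 - \left(\sum D\right)^2}{n - 1}}}$, where $D$ is the difference between each pair of scores and $n$ is the number of pairs.

$df = n - 1$

This checks whether the average difference equals 0.

The decision rule is the same as for the independent-groups t test: reject the null hypothesis if the absolute value of the t statistic is greater than the critical t value; otherwise, fail to reject it.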
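
The dependent-groups statistic works on the paired differences $D$: the sum of the differences divided by $\sqrt{\frac{n\sum D^2 - (\sum D)^2}{n - 1}}$. A minimal sketch with hypothetical before and after scores for five people:

```python
from math import sqrt

def dependent_t(before, after):
    """Return (t statistic, degrees of freedom) for paired scores."""
    d = [a - b for a, b in zip(after, before)]  # difference for each pair
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(x * x for x in d)
    t = sum_d / sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1

# Hypothetical scores for the same five people measured twice.
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 15, 13]

t, df = dependent_t(before, after)
print(f"t = {t:.2f}, df = {df}")
```

The same value results from dividing the mean difference by its standard error, which is a useful cross-check on the summation form.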

35
Degrees of Freedom ($df$)

The number of values in the final calculation of a statistic that are free to vary; for an independent t-test, it is $(N_1 - 1) + (N_2 - 1)$, and for a one-way ANOVA, the numerator degrees of freedom are $k - 1$ and the denominator degrees of freedom are $N - k$.

36
Correlation Coefficient

Measures the strength and direction of the linear relationship between two variables. It ranges from -1 to 1, and a correlation coefficient of 0 means no linear relationship.

$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$

Strength (absolute value of $r$): 0.0-0.2 is very weak, 0.2-0.4 is weak, 0.4-0.6 is moderate, 0.6-0.8 is strong, 0.8-1.0 is very strong.

Direction: when X and Y move in the same direction (both increase or both decrease), the relationship is direct and positive; when they move in opposite directions (one increases while the other decreases), the relationship is indirect and negative.
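
The computational formula for $r$ needs only five running sums: $\sum X$, $\sum Y$, $\sum XY$, $\sum X^2$, and $\sum Y^2$. A minimal sketch with hypothetical paired data:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient from the raw-score (summation) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

On the strength scale from this card, the result for these data falls in the strong, positive band.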

37
Coefficient of Determination ($r^2$)

The percentage of variance in one variable that is shared with or explained by the variance of another variable.

38
Coefficient of Non Determination

The proportion of the variance in variable Y that is not explained by variable X, calculated as $1 - r^2$
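
The coefficient of determination and the coefficient of non determination always sum to 1, as this short sketch shows. The function name and the value of $r$ are hypothetical.

```python
def shared_variance(r):
    """Return (r squared, 1 - r squared) for a correlation coefficient r."""
    r_squared = r ** 2
    return r_squared, 1 - r_squared

explained, unexplained = shared_variance(0.7)  # hypothetical r

print(f"explained: {explained:.0%}, unexplained: {unexplained:.0%}")
```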

39
Correlation coefficient test

  1. The hypotheses are $H_0: \rho = 0$ and $H_1: \rho \neq 0$

  2. The test statistic is either the correlation coefficient itself or the t statistic

  3. Find the critical value and compare it with the test statistic

$df = n - 2$
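
When the t statistic is used for this test, the standard conversion is $t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$; the card does not state this formula, so treat it as a supplied assumption. The sketch uses a hypothetical $r$ and sample size, with the critical value read from a t table (2.060 for a two-tailed test at alpha = 0.05 with 25 degrees of freedom).

```python
from math import sqrt

def correlation_t(r, n):
    """Return (t statistic, degrees of freedom) for testing rho = 0."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return t, df

# Hypothetical result: r = 0.6 from 27 paired observations.
t, df = correlation_t(r=0.6, n=27)
t_critical = 2.060  # from a t table: two-tailed, alpha = 0.05, df = 25

print(f"t = {t:.2f}, df = {df}, reject H0: {abs(t) > t_critical}")
```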

40
Simple Linear Regression

The line of best fit that shows the linear relationship between two variables.

41
Independent variable

Variable that helps determine another variable (X)

42
Dependent Variable

The outcome variable, whose values depend on another variable (Y)

43
Prediction error

The difference between the actual and predicted values. The total squared prediction error, $\sum (Y - \hat{Y})^2$, is lowest at the line of best fit.
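
To see why the line of best fit minimizes total prediction error, the sketch below fits a least-squares line (slope equal to the sum of cross-deviations over the sum of squared X deviations) and then shows that nudging the slope or intercept makes the sum of squared errors larger. The data and function names are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for a simple linear regression."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    intercept = y_bar - slope * x_bar
    return slope, intercept

def sse(x, y, slope, intercept):
    """Sum of squared prediction errors: sum of (actual - predicted) squared."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
best = sse(x, y, slope, intercept)

print(f"Y-hat = {intercept:.2f} + {slope:.2f} * X, SSE = {best:.2f}")
print(f"SSE with a steeper slope: {sse(x, y, slope + 0.1, intercept):.2f}")
```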

44
Limits of $R^2$

  1. Only focuses on the linear relationship and neglects all others

  2. Influenced by outliers

45
Analysis of Variance (ANOVA)

A statistical test used to determine if there are significant differences between the means of more than two groups.

46
F-statistic

The ratio of the variability between groups to the variability within groups, calculated as $F = \frac{\text{Mean Squares Between}}{\text{Mean Squares Within}}$.
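
The F ratio for a one-way ANOVA can be built up from sums of squares: the between-groups sum of squares divided by $k - 1$ gives the mean squares between, and the within-groups sum of squares divided by $N - k$ gives the mean squares within. A minimal sketch with three hypothetical groups:

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, numerator df, denominator df) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each score sits from its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k

# Hypothetical scores for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f, df_num, df_den = one_way_f(groups)
print(f"F({df_num}, {df_den}) = {f:.2f}")
```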

47
Factorial Analysis of Variance

A variation of ANOVA that explores more than one treatment factor and identifies main effects and interaction effects.

48
Nonparametric Tests

Statistical tests used when assumptions like normal distribution or homogeneity of variance are violated.

49
Chi-square Goodness-of-fit Test

A nonparametric test used to determine if the observed frequency of occurrences in categories matches what is expected by chance.
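
The card gives no formula; the standard goodness-of-fit statistic is $\chi^2 = \sum \frac{(O - E)^2}{E}$ with $df$ equal to the number of categories minus 1. The sketch below assumes 60 hypothetical responses across three categories that chance would split evenly, and uses a critical value read from a chi-square table (5.991 at alpha = 0.05 with 2 degrees of freedom).

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 60 responses across three categories.
observed = [30, 20, 10]
expected = [20, 20, 20]  # equal split expected by chance

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1
chi2_critical = 5.991  # from a chi-square table: alpha = 0.05, df = 2

print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {chi2 > chi2_critical}")
```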

50
Linear Regression

A predictive tool that estimates a statistical relationship between a continuous independent variable ($X$) and a continuous dependent variable ($Y$) using a line of best fit.

51
R-squared ($R^2$)

A measure of how much variation in the dependent variable is explained by the variation in the independent variable(s).

52
Multiple Regression

A regression model with more than one predictor (independent) variable to explain the outcome of a dependent variable.

53
Logistic Regression

A predictive tool used when the dependent variable is binary rather than continuous, utilizing maximum likelihood estimation (MLE).

54
Analysis File

A static version of the data that is fully cleaned and prepped before starting the formal analysis.

55
Appending

The process of stacking one data file on top of another to combine observations.

56
Merging

The process of combining files by adding variables to observations using a unique identifier.
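
The contrast between appending (card 55) and merging can be sketched with plain Python lists of records standing in for data files. The field names and values are invented for illustration, and `id` plays the role of the unique identifier.

```python
# Hypothetical records standing in for two survey files.
survey_2023 = [{"id": 1, "income": 40}, {"id": 2, "income": 55}]
survey_2024 = [{"id": 3, "income": 48}]

# Appending: stack the observations of one file under another (more rows).
appended = survey_2023 + survey_2024

# Merging: add variables to each observation by matching a unique identifier
# (more columns).
regions = {1: "North", 2: "South", 3: "North"}
merged = [{**row, "region": regions[row["id"]]} for row in appended]

for row in merged:
    print(row)
```

Appending changes the number of observations; merging changes the number of variables attached to each observation.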

57
Data Science

A field combining statistics, computer science, and programming used to extract information from big or unstructured data.

58
Natural Language Processing (NLP)

A data science technique focused on extracting the fuller meaning from free text, including grammar and parts of speech.

59
Pivot Tables

Excel tools used to summarize and analyze large data files through functions like Sum, Count, Average, Max, and Min.
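
Outside Excel, the same kind of summary can be imitated by grouping rows and applying each function per group. This is only a rough Python analogue of what a pivot table does, not how Excel implements it; the rows and field layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (region, product, amount).
rows = [
    ("North", "A", 10),
    ("North", "B", 30),
    ("South", "A", 20),
    ("South", "B", 40),
    ("South", "A", 60),
]

# Group the amounts by region, as a pivot table does with a row field.
amounts_by_region = defaultdict(list)
for region, _product, amount in rows:
    amounts_by_region[region].append(amount)

# Apply the summary functions to each group.
pivot = {
    region: {
        "Sum": sum(values),
        "Count": len(values),
        "Average": sum(values) / len(values),
        "Max": max(values),
        "Min": min(values),
    }
    for region, values in amounts_by_region.items()
}

for region, summary in pivot.items():
    print(region, summary)
```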