Lecture Notes on Hypothesis Testing and Data Analysis

Flashcards covering key vocabulary and concepts from lecture notes on statistics, hypothesis testing, and data analysis.

47 Terms

1

Two-sided test

A test in which the null hypothesis fixes the parameter at a specific value (for example, mu = 3) and the alternative is that the parameter differs from that value in either direction (mu ≠ 3).

2

P-value

The probability of obtaining a test statistic as extreme as, or more extreme than, the one actually observed, assuming the null hypothesis is true.
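
As a minimal sketch of this definition, the snippet below computes a two-sided p-value under the assumption that the test statistic is standard normal when the null hypothesis is true (a z-test). The function name `two_sided_p_value` is illustrative and does not come from the lecture notes.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Probability that a standard normal variable is at least as far from 0 as z."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail  # both tails count in a two-sided test

p = two_sided_p_value(1.96)
print(round(p, 3))
```

A statistic of about 1.96 gives a p-value close to 0.05, which is why 1.96 is the familiar cutoff for a two-sided test at the 5% level.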

3

Standard Error

The standard deviation of the sampling distribution of the sample mean, calculated as the population standard deviation (estimated in practice by the sample standard deviation) divided by the square root of the sample size.
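
A short sketch of the calculation, assuming the population standard deviation is unknown and is estimated by the sample standard deviation. The data values are made up for illustration.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
se = standard_error(sample)
print(round(se, 4))
```

Because the sample size sits under a square root, quadrupling the number of observations only halves the standard error.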

4

One-tailed test

A test where the alternative hypothesis specifies that the population parameter is either strictly greater than or strictly less than a certain value.

5

Null Hypothesis

The default claim that a hypothesis test evaluates, often stating no effect or no difference; it is rejected only if the data provide sufficient evidence against it.

6

Alternative Hypothesis

The hypothesis that contradicts the null hypothesis; it's what the researcher is trying to find evidence for.

7

T-test

A test of a hypothesis about a population mean that uses the t distribution. In statistical software it is run with a command that specifies the variable to be tested and the null hypothesis value.
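
The arithmetic behind a one-sample t-test can be sketched by hand as follows. This is an illustration in Python, not the statistical software command the card refers to; the function name `one_sample_t` and the data are assumptions, and the null value 3 is borrowed from the two-sided test card.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    t_stat = (mean(sample) - mu0) / standard_error
    return t_stat, n - 1

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
t_stat, df = one_sample_t(sample, mu0=3)
print(round(t_stat, 3), df)
```

The statistic is then compared with a t distribution with n - 1 degrees of freedom to obtain the p-value; that lookup is left out here because the Python standard library has no t distribution.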

8

Reshape command

A command that changes the structure of a dataset, for example from a wide format (where each row represents a single observation with multiple variables) to a long format (where each row represents a single measurement of a variable for a particular observation).

9

Wide Form Data

A data format where each row represents a single observation, with each measurement stored in its own variable (column).

10

Long Form Data

A data format where each row represents a single measurement of a variable for a particular observation.

11

i observation

Cross-sectional observation, often denoted as 'i' in data manipulation commands; represents individual entities.

12

j variable

A new variable created during data transformation, often denoted as 'j'; represents a specific attribute or time period.
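
To make the wide-to-long idea and the roles of i and j concrete, here is a rough sketch in plain Python. It is not the reshape command itself: the helper `wide_to_long`, the column names `id`, `income2020`, `income2021`, and the new variable name `year` are all hypothetical.

```python
def wide_to_long(rows, i, stub, j):
    """Turn columns named <stub><suffix> into one row per (i, j) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(stub):
                suffix = int(column[len(stub):])  # "income2020" -> 2020
                long_rows.append({i: row[i], j: suffix, stub: value})
    return long_rows

# Wide form: one row per entity, one column per year.
wide = [
    {"id": 1, "income2020": 50, "income2021": 55},
    {"id": 2, "income2020": 40, "income2021": 42},
]

# Long form: one row per entity-year measurement.
long_rows = wide_to_long(wide, i="id", stub="income", j="year")
for row in long_rows:
    print(row)
```

Here `id` plays the role of i (the cross-sectional unit) and `year` plays the role of j (the new variable built from the column suffixes), so two wide rows become four long rows.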

13

Pie Chart

A graph representing categorical data, where the area of each slice is proportional to the frequency of the category.

14

Bar Graph

A graph representing categorical data, where the height of each bar is proportional to the frequency of the category.

15

Histogram

A graph representing quantitative data, where the data is grouped into bins, and the height of each bar represents the frequency of the bin.

16

Box and Whisker Plot

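A graph representing quantitative data, where the box spans the interquartile range (IQR) from the first to the third quartile, a line inside the box marks the median, the whiskers extend to the farthest data points within 1.5 times the IQR of the box edges, and outliers beyond the whiskers are plotted as individual points.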
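
The numbers behind a box plot can be sketched as below. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" interpolation method of Python's `statistics.quantiles`, and the data are made up.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Quartiles, whisker ends, and outliers using the 1.5 * IQR rule."""
    values = sorted(data)
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whiskers": (inside[0], inside[-1]),  # farthest points inside the fences
        "outliers": outliers,
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this data the value 30 lies beyond the upper fence, so it is drawn as an individual point and the upper whisker stops at 9.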

17

Mean

A measure of the center of a dataset, calculated by adding all the observations and dividing by the number of observations.

18

Median

The middle value in an ordered sequence of data.

19

Mode

The most frequently occurring value in a dataset.

20

Mid-range

A measure of the center of a dataset, calculated by adding the largest and smallest values and dividing by two.
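
The four measures of center defined so far (mean, median, mode, mid-range) can be compared on one small, made-up dataset:

```python
from statistics import mean, median, mode

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data

center_mean = mean(data)                  # sum of observations / count
center_median = median(data)              # middle of the ordered values
center_mode = mode(data)                  # most frequent value
center_midrange = (max(data) + min(data)) / 2

print(center_mean, center_median, center_mode, center_midrange)
```

The four measures need not agree, and the mid-range in particular depends only on the two most extreme observations.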

21

Skewness

The extent to which a distribution is not symmetric. The direction of the skew is given by the longer tail: a long right tail means positive (right) skew, and a long left tail means negative (left) skew.

22

Kurtosis

A measure of whether the data is heavy-tailed or light-tailed relative to a normal distribution.

23

Range

The difference between the largest and the smallest observations in a dataset.

24

Variance

A measure of dispersion, calculated as the average of the squared deviations from the mean. For a sample, the sum of squared deviations is usually divided by n - 1 rather than n.

25

Standard Deviation

A measure of dispersion, calculated as the square root of the variance.

26

Coefficient of Variation (CV)

A measure of relative variability, calculated as the standard deviation divided by the mean.
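
Variance, standard deviation, and the coefficient of variation build on one another, which the following sketch shows on made-up data. It assumes the data are a whole population (divide by n); the sample variance (divide by n - 1) is printed alongside for comparison.

```python
from statistics import mean, pstdev, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data

pop_variance = pvariance(data)      # average squared deviation (divide by n)
sample_variance = variance(data)    # divide by n - 1 instead
pop_sd = pstdev(data)               # square root of the population variance
cv = pop_sd / mean(data)            # unit-free measure of relative spread

print(pop_variance, round(sample_variance, 3), pop_sd, cv)
```

Because the coefficient of variation divides by the mean, it is only meaningful for data measured on a scale with a true zero and a nonzero mean.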

27

Interquartile Range (IQR)

The difference between the 75th and 25th percentiles of a dataset; because it ignores the extremes, it is resistant to outliers.

28

Average Absolute Deviation

The sum of the absolute values of the differences between each observation and the mean, divided by the number of observations.

29

Chebyshev's Theorem

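A theorem that provides a lower bound on the proportion of data within a given number of standard deviations of the mean: at least 1 - 1/k^2 of the data lie within k standard deviations, for any k > 1. It applies to any dataset, regardless of the shape of its distribution.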
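
The bound in Chebyshev's theorem is 1 - 1/k^2 for k > 1 standard deviations. The sketch below checks it against a deliberately skewed, made-up dataset; the helper names are illustrative.

```python
from statistics import mean, pstdev

def chebyshev_lower_bound(k):
    """Minimum share of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def share_within(data, k):
    """Observed share of data within k standard deviations of the mean."""
    m, s = mean(data), pstdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)

data = [1, 1, 1, 1, 2, 2, 3, 5, 8, 40]  # skewed, illustrative data
bound = chebyshev_lower_bound(2)
observed = share_within(data, 2)
print(bound, observed)
```

The observed share is never below the bound, but the bound is often far from tight, which is why the Empirical Rule gives sharper figures when the data are roughly normal.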

30

Empirical Rule

A rule stating that for a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.

31

Percentiles

Values that divide a dataset into 100 equal parts.

32

Quartiles

Values that divide a dataset into four equal parts.

33

Z-score

A measure of how many standard deviations an observation is from the mean.
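
A z-score is the observation minus the mean, divided by the standard deviation. A minimal sketch on made-up data, assuming the population standard deviation is used:

```python
from statistics import mean, pstdev

def z_score(x, data):
    """Number of standard deviations that x lies from the mean of data."""
    return (x - mean(data)) / pstdev(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data: mean 5, sd 2
z_high = z_score(9, data)
z_low = z_score(2, data)
print(z_high, z_low)
```

A positive z-score means the observation is above the mean and a negative one means it is below.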

34

Outliers

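Observations that lie unusually far from the rest of the data. A common rule of thumb flags observations that are more than three standard deviations away from the mean (a z-score beyond ±3).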

35

Log Transformation

The process of transforming data by taking the logarithm of each value. It is common because differences in logs approximate percentage changes and because it reduces right skewness.
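
The link between logs and percentage change can be checked numerically. This is a small sketch with made-up values, using natural logarithms:

```python
from math import log

old_value, new_value = 100.0, 105.0  # illustrative values

pct_change = (new_value - old_value) / old_value   # exact proportional change
log_diff = log(new_value) - log(old_value)         # difference in natural logs

print(round(pct_change, 4), round(log_diff, 4))
```

The two figures are close for small changes and drift apart as the change grows, so the approximation is best kept to changes of a few percent.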

36

Rule of 72

An approximation for the number of years it takes an investment to double in value, calculated by dividing 72 by the annual interest rate expressed in percent.
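
To see how good the approximation is, the sketch below compares the Rule of 72 with the exact doubling time under annual compounding (an assumption; the card does not state the compounding convention). The function names are illustrative.

```python
from math import log

def doubling_time_rule_of_72(rate_percent):
    """Approximate years to double: 72 divided by the rate in percent."""
    return 72 / rate_percent

def doubling_time_exact(rate_percent):
    """Exact years to double with annual compounding: ln 2 / ln(1 + r)."""
    return log(2) / log(1 + rate_percent / 100)

approx_years = doubling_time_rule_of_72(6)
exact_years = doubling_time_exact(6)
print(approx_years, round(exact_years, 2))
```

At a 6% rate the rule gives 12 years against an exact figure a little under 12, so the shortcut is accurate enough for mental arithmetic at typical interest rates.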

37

Gross Domestic Product (GDP)

Measures the monetary value of all finished goods and services made within a country during a specific period.

38

Gross National Product (GNP)

The total value of all final goods and services produced by a country's factors of production and sold on the market in a given time period.

39

Price Index

A normalized average of price relatives for a given class of goods or services in a given region, during a given interval of time.

40

Labor Force Participation Rate

The percentage of the civilian noninstitutional population that is in the labor force.

41

Stock Indices

Values or levels determined at a single point in time.

42

Real Data

Data adjusted to remove the effects of inflation.

43

Nominal Data

Data not adjusted for inflation.
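
Converting nominal data to real data usually means dividing by a price index. A minimal sketch, assuming an index that equals 100 in the base period; the figures are made up.

```python
def to_real(nominal_value, price_index, base=100.0):
    """Express a nominal value in base-period prices."""
    return nominal_value * base / price_index

# Nominal value rose from 100 to 110 while the price index rose from 100 to 105.
real_before = to_real(100.0, 100.0)
real_after = to_real(110.0, 105.0)
print(real_before, round(real_after, 2))
```

Nominal growth here is 10%, but once the 5% rise in prices is removed the real value has grown by less than 5%.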

44

Per Capita

Per person.

45

Sampling Distribution

The distribution of a statistic (like the sample mean) across all possible samples of a given size drawn from the same population.

46

Central Limit Theorem (CLT)

States that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution (provided the population has a finite variance).
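
A quick simulation can illustrate the theorem. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (chosen only because it is clearly skewed), a sample size of 50, and 2000 repeated samples; all of these settings are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

n = 50       # observations per sample
reps = 2000  # number of repeated samples

# Draw many samples from a skewed population and keep each sample mean.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

center = mean(sample_means)      # theory: population mean, 1.0
spread = pstdev(sample_means)    # theory: sigma / sqrt(n)
share_within_one_sd = sum(
    abs(m - center) <= spread for m in sample_means
) / reps                         # roughly 0.68 if the means are near normal

print(round(center, 3), round(spread, 3), round(1 / sqrt(n), 3))
print(round(share_within_one_sd, 3))
```

Even though individual draws are strongly right-skewed, the sample means cluster around the population mean with a spread close to the standard error, and about two thirds of them fall within one standard deviation of the center.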

47

Confidence Interval

An interval, computed from sample data, that estimates the range within which a population parameter is likely to fall, at a stated level of confidence (for example, 95%).
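
As a rough sketch, a confidence interval for a mean is the sample mean plus or minus a critical value times the standard error. The example assumes a normal critical value (about 1.96 for 95%); for a sample this small a t critical value would normally be used and would give a wider interval. The data are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * stdev(sample) / sqrt(len(sample))
    center = mean(sample)
    return center - margin, center + margin

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
lower, upper = mean_confidence_interval(sample)
print(round(lower, 3), round(upper, 3))
```

Raising the confidence level widens the interval, while a larger sample narrows it through the smaller standard error.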