Flashcards covering key vocabulary and concepts from lecture notes on statistics, hypothesis testing, and data analysis.
Two-sided test
A test in which the null hypothesis states that the parameter equals a specific value (for example, mu = 3) and the alternative states that it differs from that value (mu ≠ 3), so departures in either direction count as evidence against the null.
P-value
The probability of obtaining a test statistic as extreme as, or more extreme than, the one actually observed, assuming the null hypothesis is true.
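As a rough illustration of the definition above, the sketch below computes a two-sided p-value from a z statistic. It assumes the test statistic is standard normal under the null hypothesis; the function name `two_sided_p_value` is made up for this example.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Probability of a standard normal value at least as extreme as z, in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A z statistic of 1.96 sits right at the conventional 5% threshold.
p = two_sided_p_value(1.96)
print(round(p, 3))
```

A one-tailed p-value would use only one of the two tails, so it is half of this value when the statistic falls on the side named by the alternative.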
Standard Error
The standard deviation of the sample mean, calculated as the standard deviation divided by the square root of the sample size.
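The formula on this card can be sketched in a few lines. The sample values are invented for illustration, and the sample standard deviation (dividing by n - 1) is assumed.

```python
from math import sqrt
from statistics import stdev

# Made-up sample, used only to illustrate the formula.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Standard error of the mean: sample standard deviation over sqrt(n).
standard_error = stdev(sample) / sqrt(len(sample))
print(round(standard_error, 4))
```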
One-tailed test
A test where the alternative hypothesis specifies that the population parameter is either strictly greater than or strictly less than a certain value.
Null Hypothesis
The default claim that a hypothesis test evaluates, often stating no effect or no difference; it is rejected only when the data provide sufficient evidence against it.
Alternative Hypothesis
The hypothesis that contradicts the null hypothesis; it's what the researcher is trying to find evidence for.
T-test
A command in statistical software that performs a t-test; the user specifies the variable to be tested and the null hypothesis value.
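The statistic such a command reports can be sketched by hand. This is not the software's own implementation, only the textbook one-sample formula; the helper name `one_sample_t` and the data are assumptions for the example.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    t_stat = (mean(sample) - mu0) / standard_error
    return t_stat, n - 1

# Made-up data; the null hypothesis value is mu = 3.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
t_stat, dof = one_sample_t(sample, mu0=3.0)
print(round(t_stat, 4), dof)
```

Turning the statistic into a p-value requires the t distribution with `dof` degrees of freedom, which the Python standard library does not provide.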
Reshape command
A command that changes the structure of a dataset from a wide format (where each row represents a single observation with multiple variables) to a long format (where each row represents a single measurement of a variable for a particular observation).
Wide Form Data
A data format where each row represents a single observation with multiple variables.
Long Form Data
A data format where each row represents a single measurement of a variable for a particular observation.
i observation
Cross-sectional observation, often denoted as 'i' in data manipulation commands; represents individual entities.
j variable
A new variable created during data transformation, often denoted as 'j'; represents a specific attribute or time period.
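The wide-to-long idea in the cards above can be sketched in plain Python. This is not the statistical package's reshape command; the function `wide_to_long`, the column names, and the numbers are all hypothetical.

```python
# Wide form: one row per entity (i), one column per year.
wide = [
    {"id": 1, "income2020": 50, "income2021": 55},
    {"id": 2, "income2020": 40, "income2021": 42},
]

def wide_to_long(rows, i, stub, j):
    """Turn columns named stub+suffix into one row per (i, j) pair."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(stub):
                suffix = int(key[len(stub):])  # e.g. "income2020" -> 2020
                long_rows.append({i: row[i], j: suffix, stub: value})
    return long_rows

# Long form: one row per entity-year measurement.
long_rows = wide_to_long(wide, i="id", stub="income", j="year")
for row in long_rows:
    print(row)
```

Here `id` plays the role of i (the entity), and `year` is the new j variable created from the column suffixes.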
Pie Chart
A graph representing categorical data, where the area of each slice is proportional to the frequency of the category.
Bar Graph
A graph representing categorical data, where the height of each bar is proportional to the frequency of the category.
Histogram
A graph representing quantitative data, where the data is grouped into bins, and the height of each bar represents the frequency of the bin.
Box and Whisker Plot
A graph representing quantitative data, where the box shows the interquartile range (IQR), the whiskers extend to the farthest data point within 1.5 times the IQR, and outliers are plotted as individual points.
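The pieces of a box plot can be computed directly, as in the sketch below. The data are invented, and quartile conventions differ between software packages, so the exact quartile values may not match other tools.

```python
from statistics import quantiles

# Made-up data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

q1, _, q3 = quantiles(data, n=4)  # first and third quartiles
iqr = q3 - q1

# Fences sit 1.5 * IQR beyond the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the farthest data points still inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Points outside the fences are drawn individually as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(whisker_low, whisker_high, outliers)
```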
Mean
A measure of the center of a dataset, calculated by adding all the observations and dividing by the number of observations.
Median
The middle value in an ordered sequence of data.
Mode
The most frequently occurring value in a dataset.
Mid-range
A measure of the center of a dataset, calculated by adding the largest and smallest values and dividing by two.
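The four measures of center above can be compared on one small dataset. The values are made up for illustration.

```python
from statistics import mean, median, mode

# Made-up observations.
values = [2, 3, 3, 5, 7, 10]

center_mean = mean(values)                     # sum divided by count
center_median = median(values)                 # middle of the ordered data
center_mode = mode(values)                     # most frequent value
mid_range = (max(values) + min(values)) / 2    # halfway between the extremes

print(center_mean, center_median, center_mode, mid_range)
```

The mid-range depends only on the two extreme values, so it is the most sensitive of the four to a single unusual observation.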
Skewness
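The extent to which a distribution is not symmetric. The direction of the skew is determined by the longer tail: a long right tail means positive (right) skew, and a long left tail means negative (left) skew.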
Kurtosis
A measure of whether the data is heavy-tailed or light-tailed relative to a normal distribution.
Range
The difference between the largest and the smallest observations in a dataset.
Variance
A measure of dispersion, calculated as the average of the squared deviations from the mean (the sample variance divides by n - 1 rather than n).
Standard Deviation
A measure of dispersion, calculated as the square root of the variance.
Coefficient of Variation (CV)
A measure of relative variability, calculated as the standard deviation divided by the mean.
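A short sketch of the three dispersion measures above, using invented data. The population forms (dividing by n) are shown alongside the sample variance (dividing by n - 1) for comparison.

```python
from statistics import mean, pstdev, pvariance, variance

# Made-up observations.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

pop_variance = pvariance(values)      # average squared deviation from the mean
sample_variance = variance(values)    # same sum of squares, divided by n - 1
std_dev = pstdev(values)              # square root of the population variance
coef_variation = std_dev / mean(values)

print(pop_variance, round(sample_variance, 3), std_dev, coef_variation)
```

Because the coefficient of variation divides by the mean, it is unit-free, which makes it useful for comparing variability across datasets measured on different scales.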
Interquartile Range (IQR)
The difference between the 75th and 25th percentiles of a dataset; resistant to outliers.
Average Absolute Deviation
The sum of the absolute values of the differences between each observation and the mean, divided by the number of observations.
Chebyshev's Theorem
A theorem that provides a lower bound on the proportion of data within a given number of standard deviations from the mean; applies to any data.
Empirical Rule
A rule stating that for a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
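Chebyshev's bound and the Empirical Rule can be set side by side. The sketch below uses the standard form of Chebyshev's bound, 1 - 1/k^2 for k > 1 standard deviations, and the standard normal distribution for the Empirical Rule percentages; the function names are made up for this example.

```python
from statistics import NormalDist

def chebyshev_lower_bound(k: float) -> float:
    """Minimum share of any dataset within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def normal_share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)

for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 4), round(normal_share_within(k), 4))
```

Chebyshev's guarantee is weaker (at least 75% within two standard deviations) because it must hold for every distribution, not just the normal one.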
Percentiles
Values that divide a dataset into 100 equal parts.
Quartiles
Values that divide a dataset into four equal parts.
Z-score
A measure of how many standard deviations an observation is from the mean.
Outliers
Observations that lie unusually far from the rest of the data; one common rule flags observations more than three standard deviations away from the mean.
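The z-score and the three-standard-deviation rule can be combined in a few lines. The data are invented, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev

# Made-up data: nineteen typical values and one extreme value.
data = [10.0] * 19 + [100.0]

center = mean(data)
spread = stdev(data)

# z-score: distance from the mean, measured in standard deviations.
z_scores = [(x - center) / spread for x in data]

# Flag observations more than three standard deviations from the mean.
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 3]
print(outliers)
```

In small samples no observation can reach a z-score of 3, because an extreme value also inflates the standard deviation; that is one reason the 1.5 × IQR rule is often used instead.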
Log Transformation
The process of transforming data by taking the logarithm of each value. Commonly used because differences in logs approximate percentage changes, and because it reduces right skewness.
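The link between logs and percentage change can be checked numerically. The two values below are made up; natural logarithms are assumed.

```python
from math import log

# Made-up values for two consecutive periods.
old_value, new_value = 100.0, 105.0

pct_change = (new_value - old_value) / old_value   # exact proportional change
log_change = log(new_value) - log(old_value)       # difference in natural logs

print(pct_change, round(log_change, 5))
```

The approximation is close for small changes and gets worse as the change grows.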
Rule of 72
An approximation of the number of years it takes an investment to double in value, calculated by dividing 72 by the annual interest rate in percent.
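The rule can be compared with the exact doubling time under annual compounding, which is ln(2) / ln(1 + r). The function names are made up for this sketch.

```python
from math import log

def doubling_time_rule_of_72(rate_percent: float) -> float:
    """Approximate years to double: 72 divided by the rate in percent."""
    return 72 / rate_percent

def doubling_time_exact(rate_percent: float) -> float:
    """Exact years to double with annual compounding."""
    return log(2) / log(1 + rate_percent / 100)

# At 6% the rule gives 12 years; the exact figure is slightly lower.
print(doubling_time_rule_of_72(6), round(doubling_time_exact(6), 2))
```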
Gross Domestic Product (GDP)
Measures the monetary value of all finished goods and services made within a country during a specific period.
Gross National Product (GNP)
The total value of all final goods and services produced by a country's factors of production and sold on the market in a given time period.
Price Index
A normalized average of price relatives for a given class of goods or services in a given region, during a given interval of time.
Labor Force Participation Rate
The percentage of the civilian noninstitutional population that is in the labor force.
Stock Indices
Values or levels measured at a single point in time.
Real Data
Data adjusted to remove the effects of inflation.
Nominal Data
Data not adjusted for inflation.
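Converting nominal figures to real ones is usually done by deflating with a price index. The sketch below assumes an index whose base period equals 100; the function name `to_real` and the numbers are hypothetical.

```python
def to_real(nominal: float, price_index: float, base: float = 100.0) -> float:
    """Express a nominal amount in base-period prices."""
    return nominal * base / price_index

# A nominal value of 1200 when the price index is 120 (base period = 100).
real_value = to_real(1200.0, 120.0)
print(real_value)
```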
Per Capita
Per person.
Sampling Distribution
The distribution of a statistic (like the sample mean) across multiple samples from the same population.
Central Limit Theorem (CLT)
States that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution (provided the population has finite variance).
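A small simulation can make the theorem concrete. The sketch below draws many samples from a strongly skewed population (exponential with mean 1 and standard deviation 1) and looks at the resulting sample means; the sample size, number of samples, and seed are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

sample_size = 50
num_samples = 2000

# Each entry is the mean of one sample drawn from the skewed population.
sample_means = [
    mean(random.expovariate(1.0) for _ in range(sample_size))
    for _ in range(num_samples)
]

center = mean(sample_means)    # should be near the population mean, 1
spread = stdev(sample_means)   # should be near 1 / sqrt(sample_size)
print(round(center, 3), round(spread, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the underlying population is not.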
Confidence Interval
An interval that estimates the range within which a population parameter is likely to fall, with a certain level of confidence.
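A 95% confidence interval for a mean can be sketched as the sample mean plus or minus a critical value times the standard error. This example uses the normal critical value for simplicity; with a sample this small, a t critical value would normally be used and would give a wider interval. The data are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Made-up sample.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

confidence = 0.95
critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96

standard_error = stdev(sample) / sqrt(len(sample))
margin = critical * standard_error

low, high = mean(sample) - margin, mean(sample) + margin
print(round(low, 3), round(high, 3))
```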