Quantitative Data
Continuous & Discrete Data
Categorical Data
Ordinal, Nominal, Binary Data
Descriptive Statistics
Purpose: Summarize & describe data
Focus: Present data in a meaningful manner
Methods: Measures of central tendency, dispersion, frequency distributions
Examples: Calculating averages, creating charts
Inferential Statistics
Purpose: Make inferences about a population
Focus: Draw conclusions beyond the data
Methods: Hypothesis testing, confidence intervals, correlation & regression analysis
Examples: Determining significance, predicting future outcomes
Measures of Central Tendency
Mean, median, mode
Measures of Dispersion
Range, Standard Deviation (SD), Quartile Deviation (QD), Variance, Absolute Deviation (AD)
Summary
Measures of Central Tendency are statistical values that represent a typical or central value of a dataset. They provide a ______ of the data distribution.
Mean
Function: The arithmetic average of a dataset.
When to use: When the data is normally distributed (no significant skewness) and there are no outliers (extreme values).
Example: Calculating the average grade in a class.
Median
Function: The middle value in a dataset when the values are arranged in ascending order.
When to use: When the data is skewed (not normally distributed) or has outliers. The median is less affected by these factors.
Example: Determining the median income in a community.
Mode
Function: The most frequently occurring value in a dataset.
When to use: When identifying the most common category or value in a dataset.
Example: Finding the most popular color of car sold in a dealership.
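As a quick illustration, all three measures can be computed with Python's built-in statistics module; the exam scores below are made up for the sketch.

```python
import statistics

scores = [72, 85, 85, 90, 68, 85, 77]   # hypothetical class grades

print(statistics.mean(scores))    # arithmetic average (about 80.3)
print(statistics.median(scores))  # middle value of the sorted list (85)
print(statistics.mode(scores))    # most frequent value (85)
```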
Symmetrical
Mean is often the best choice when the distribution is ____.
Skewed
Median is often preferred (it is less affected by outliers) if the distribution is ____.
Nominal
Mode is the only appropriate measure if the data is _____.
Ordinal
Median or mode can be used (depending on the research question) when the data is _____.
Consistency
Measures of dispersion quantify how spread out or scattered data points are from a central value (like the mean or median). This helps us understand the variability, or ______, within a dataset.
Range
Function: Measures the difference between the largest and smallest values in a dataset.
When to use: When you need a quick and simple measure of the overall spread of data.
Example: To determine the range of temperatures in a city over a week, you would subtract the lowest temperature from the highest.
Standard Deviation
Function: Measures the average distance of each data point from the mean.
When to use: When you want a more precise and commonly used measure of dispersion, especially when dealing with normally distributed data.
Example: To assess the variability in test scores of a class, you would calculate the standard deviation.
Variance
Function: The square of the standard deviation.
When to use: Primarily used in statistical calculations and formulas. It's often a precursor to calculating standard deviation.
Example: In regression analysis, the variance of the residuals is used to assess the model's fit.
Absolute Deviation
Function: Measures the average absolute difference between each data point and the mean.
When to use: When you want a measure of dispersion that is less sensitive to outliers (extreme values) than standard deviation.
Example: To analyze the variability in income levels of a population, absolute deviation might be preferred if there are a few very high-income individuals.
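A minimal sketch of the four dispersion measures using NumPy; the income figures (with one deliberate outlier) are made up.

```python
import numpy as np

incomes = np.array([28, 31, 35, 30, 29, 120], dtype=float)  # hypothetical incomes, one outlier

data_range = incomes.max() - incomes.min()             # range
std_dev = incomes.std(ddof=1)                          # sample standard deviation
variance = incomes.var(ddof=1)                         # sample variance (SD squared)
abs_dev = np.mean(np.abs(incomes - incomes.mean()))    # mean absolute deviation

print(data_range, std_dev, variance, abs_dev)
```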
Simple & quick
Using range is _____, but it is sensitive to outliers.
Normally distributed data
The SD is common & precise for ________, but sensitive to outliers.
Statistical calculations
The variance is primarily used in ______.
Less sensitive
The AD is ____ to outliers than the SD but is less commonly used.
MOD: Range
Use this to quickly get an idea of the overall height variation.
MOD: SD
Use this to analyze how much individual heights deviate from the average height.
MOD: AD
If you suspect there might be a few very tall or short students, this could provide a more robust measure of dispersion.
MOD: Variance
This is used for statistical calculations like hypothesis testing.
Hypothesis Testing
This is a statistical method that uses sample data to decide whether there is enough evidence to reject a claim (hypothesis) about a population.
H0
The null hypothesis is denoted by this symbol.
H1
The alternative hypothesis is denoted by this symbol.
Z-Test
For Hypothesis Testing
When to use: When the sample size is large (typically n ≥ 30) and the population standard deviation is known.
Function: Compares a sample mean to a known population mean.
Example: Testing whether the average height of students in a school is significantly different from the national average.
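A minimal sketch of a one-sample z-test for the height example, assuming the population standard deviation is known; all numbers are made up.

```python
import math
from scipy.stats import norm

sample_mean, pop_mean, pop_sd, n = 171.2, 170.0, 6.0, 100  # hypothetical heights (cm)

z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-tailed p-value

print(z, p_value)
```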
T-test
For Hypothesis Testing
When to use: When the population standard deviation is unknown, especially for small samples (typically n < 30).
Function: Compares a sample mean to a known population mean or compares the means of two independent samples.
Example: Testing whether there is a significant difference in the average test scores of two different classes.
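A minimal sketch of an independent two-sample t-test for the two-classes example, using SciPy; the scores are made up.

```python
from scipy import stats

class_a = [78, 85, 69, 91, 74, 82, 88]  # hypothetical test scores
class_b = [71, 65, 80, 68, 75, 70, 73]

t_stat, p_value = stats.ttest_ind(class_a, class_b)  # compares the two sample means
print(t_stat, p_value)
```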
F-Test
For Hypothesis Testing
When to use: To compare the variances of two or more samples
Function: Determines if there is a significant difference in the variability of two or more groups.
Example: Testing whether the variability in the weights of apples from two different orchards is significantly different.
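A minimal sketch of an F-test comparing the variances of two samples (the apple-weight example); the weights are made up.

```python
import numpy as np
from scipy.stats import f

orchard_1 = np.array([150, 162, 148, 155, 170, 159], dtype=float)  # hypothetical weights (g)
orchard_2 = np.array([140, 175, 132, 168, 181, 150], dtype=float)

f_stat = orchard_1.var(ddof=1) / orchard_2.var(ddof=1)
df1, df2 = len(orchard_1) - 1, len(orchard_2) - 1
p_value = 2 * min(f.cdf(f_stat, df1, df2), f.sf(f_stat, df1, df2))  # two-tailed

print(f_stat, p_value)
```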
ANOVA
For Hypothesis Testing
When to use: To compare the means of three or more independent groups
Function: Determines if there is a significant difference in the means of multiple groups.
Example: Testing whether there is a significant difference in the average sales of a product in three different regions.
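A minimal sketch of a one-way ANOVA for the three-regions example with SciPy; the sales figures are made up.

```python
from scipy import stats

region_a = [20, 22, 19, 24, 25]  # hypothetical monthly sales
region_b = [28, 30, 27, 26, 29]
region_c = [21, 23, 22, 20, 24]

f_stat, p_value = stats.f_oneway(region_a, region_b, region_c)
print(f_stat, p_value)
```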
Wilcoxon Signed Rank Test
For Hypothesis Testing
When to use: When the data is paired and the assumptions of the t-test are not met (e.g., non-normality).
Function: Compares the medians of two paired samples.
Example: Testing whether there is a significant difference in the pre- and posttest scores of a group of students
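A minimal sketch of the Wilcoxon signed-rank test on made-up paired pre/post scores, using SciPy.

```python
from scipy import stats

pre  = [60, 55, 70, 62, 58, 66, 64]   # hypothetical pre-test scores
post = [65, 59, 71, 70, 57, 72, 69]   # matching post-test scores

w_stat, p_value = stats.wilcoxon(pre, post)  # tests the paired differences
print(w_stat, p_value)
```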
Mann-Whitney U test
For Hypothesis Testing
When to use: When the data is independent and the assumptions of the t-test are not met (e.g., non-normality).
Function: Compares the medians of two independent samples.
Example: Testing whether there is a significant difference in the salaries of male and female employees in a company.
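A minimal sketch of the Mann-Whitney U test on two made-up independent salary samples, using SciPy.

```python
from scipy import stats

salaries_group_1 = [48, 52, 55, 60, 47, 53]  # hypothetical salaries (thousands)
salaries_group_2 = [45, 44, 50, 46, 49, 43]

u_stat, p_value = stats.mannwhitneyu(salaries_group_1, salaries_group_2,
                                     alternative="two-sided")
print(u_stat, p_value)
```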
HT: Z & T-Test
These are used to compare means.
HT: F-Test
This is used to compare variances
HT: ANOVA
This is used to compare multiple means
HT: Wilcoxon & Mann-Whitney U
These are non-parametric alternatives to the t-test.
One or more independent
Regression analysis is a statistical technique used to model the relationship between a dependent variable and ________ variables. It helps us understand how changes in the independent variables affect the dependent variable.
Linear Regression
For Regression Analysis
Function: Models a linear relationship between the dependent and independent variables.
When to use: When you believe there's a straight-line relationship between the variables.
Example: Predicting house prices based on square footage and number of bedrooms.
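A minimal sketch of the house-price example, simplified to a single predictor (square footage) and fitted with statsmodels OLS; the data are made up.

```python
import numpy as np
import statsmodels.api as sm

sqft  = np.array([800, 950, 1100, 1300, 1500, 1700], dtype=float)  # hypothetical sizes
price = np.array([150, 170, 200, 240, 270, 310], dtype=float)      # prices in thousands

X = sm.add_constant(sqft)        # adds the intercept column
model = sm.OLS(price, X).fit()

print(model.params)                    # [intercept, slope]
print(model.predict([[1.0, 1200.0]]))  # predicted price for a 1200 sq ft house
```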
Nominal Regression
For Regression Analysis
Function: Models a relationship between a categorical dependent variable and one or more independent variables.
When to use: When the dependent variable has mutually exclusive categories (e.g., male/female, yes/no).
Example: Predicting whether a customer will churn based on their demographics and usage patterns.
Logistic Regression
For Regression Analysis
Function: Models the probability of a binary outcome (0 or 1) based on one or more independent variables.
When to use: When the dependent variable is binary (e.g., success/failure, presence/absence).
Example: Predicting whether a loan applicant will default based on their credit score and income.
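A minimal sketch of the loan-default example with scikit-learn's LogisticRegression; the credit scores, incomes, and outcomes are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: credit score, income (thousands); target: 1 = default, 0 = no default
X = np.array([[580, 30], [620, 35], [700, 55], [750, 80], [640, 40], [710, 60]])
y = np.array([1, 1, 0, 0, 1, 0])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba([[660, 45]]))  # probabilities of [no default, default] for a new applicant
```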
Ordinal Regression
For Regression Analysis
Function: Models a relationship between an ordinal dependent variable (with ordered categories) and one or more independent variables.
When to use: When the dependent variable has ordered categories (e.g., low/medium/high, strongly disagree/disagree/neutral/agree/strongly agree).
Example: Predicting a customer's satisfaction rating based on product features and price.
Regression Model
Choose this based on the type of dependent variable you have:
RA: Linear Regression
Use this when you have: Continuous data
RA: Nominal Regression
Use this when you have: Categorical (nominal) data
RA: Logistic Regression
Use this when you have: Binary data (0/1)
RA: Ordinal Regression
Use this when you have: Ordinal data
Parametric Tests
These are statistical tests that make assumptions about the underlying population distribution, typically assuming it follows a normal distribution. These assumptions allow for more powerful and precise analyses.
T-tests
Parametric Test (PT)
Function: Compares the means of two groups
When to use: When you have two independent groups. When the data is normally distributed or the sample size is large (n > 30).
Example: Comparing the average test scores of students who received a new teaching method versus those who received the traditional method
ANOVA
Parametric Test (PT)
Function: Compares the means of more than two groups.
When to use: When you have more than two independent groups. When the data is normally distributed or the sample size is large.
Example: Comparing the average sales of a product in three different regions.
Regression Analysis
Parametric Test (PT)
Function: Examines the relationship between a dependent variable and one or more independent variables
When to use: When you want to predict a dependent variable based on independent variables. When the data is normally distributed.
Example: Predicting house prices based on factors like square footage, number of bedrooms, and location.
Pearson Correlation Coefficient
Parametric Test (PT)
Function: Measures the strength and direction of the linear relationship between two variables.
When to use: When you want to assess the association between two continuous variables. When the data is normally distributed.
Example: Examining the relationship between education level and income.
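A minimal sketch of the Pearson correlation for the education-income example, using SciPy; the values are made up.

```python
from scipy import stats

years_of_education = [10, 12, 12, 14, 16, 16, 18]
income = [25, 30, 32, 40, 48, 52, 60]   # hypothetical incomes (thousands)

r, p_value = stats.pearsonr(years_of_education, income)
print(r, p_value)  # r near +1 would indicate a strong positive linear relationship
```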
Chi-Square Test
Often grouped with parametric tests (PT), though it is technically a non-parametric test
Function: Tests for independence between categorical variables.
When to use: When you have categorical data. When you want to determine if there is a relationship between two or more categorical variables.
Example: Investigating if there is a relationship between gender and preference for a particular brand of car.
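A minimal sketch of the gender-by-brand example using SciPy's chi2_contingency; the observed counts are made up.

```python
from scipy.stats import chi2_contingency

# rows: gender; columns: preferred car brand (observed counts)
observed = [[30, 10, 20],
            [20, 25, 15]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value, dof)
```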
Normality Assumption
Parametric tests rely on the ______. If the data are not normally distributed, non-parametric tests might be more appropriate.
Homogeneity of Variance
Some parametric tests also assume that the variances of the groups being compared are equal.
Power
Parametric tests generally have more ____ (ability to detect differences) than non-parametric tests when the assumptions are met.
Non-parametric alternative tests
If the assumptions of a parametric test are violated, consider using ____.
Non-parametric tests (NPT)
_____ are statistical tests that do not assume a specific distribution for the data. This makes them versatile and applicable to a wider range of scenarios, especially when data is not normally distributed or when the underlying assumptions of parametric tests are violated.
Spearman Rank Correlation Coefficient
NPT under correlation tests
Measures the monotonic relationship between two variables.
Use when: When data is ordinal or when the relationship between variables is not linear.
Example: Assessing the relationship between shoe size and height.
Kendall’s Tau
NPT under correlation tests
Another measure of monotonic relationship, often used when there are ties in the data.
Use when: Similar to Spearman's rank correlation, but more robust to ties.
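A minimal sketch of both rank-based correlation coefficients for the shoe-size/height example, using SciPy; the data are made up.

```python
from scipy import stats

shoe_size = [6, 7, 7, 8, 9, 10, 11]
height = [160, 165, 166, 170, 175, 178, 183]  # hypothetical heights (cm)

rho, p_rho = stats.spearmanr(shoe_size, height)   # Spearman rank correlation
tau, p_tau = stats.kendalltau(shoe_size, height)  # Kendall's tau (robust to ties)

print(rho, p_rho)
print(tau, p_tau)
```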
Mann-Whitney U
NPT under hypothesis testing
Compares the medians of two independent samples
Use when: When data is not normally distributed and the assumption of equal variances is violated.
Example: Comparing the test scores of two different teaching methods.
Wilcoxon Signed-Rank Test
NPT under hypothesis testing
Compares the medians of two dependent samples.
Use when: When data is paired or matched, and the assumption of normality is violated
Example: Comparing pre- and post-treatment scores for a group of patients.
Kruskal-Wallis Test
NPT under hypothesis testing
Compares the medians of three or more independent samples
Use when: When data is not normally distributed and the assumption of equal variances is violated.
Example: Comparing the sales performance of three different marketing campaigns
Friedman Test
NPT under hypothesis testing
Compares the medians of three or more dependent samples
Use when: When data is paired or matched
Example: Assessing the effectiveness of four different training programs on employee performance.
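A minimal sketch of the Kruskal-Wallis test (independent groups) and the Friedman test (repeated measures) with SciPy; all figures are made up.

```python
from scipy import stats

# Kruskal-Wallis: sales under three independent marketing campaigns
campaign_a = [12, 15, 14, 10, 13]
campaign_b = [22, 25, 20, 27, 24]
campaign_c = [16, 14, 18, 15, 17]
h_stat, p_kw = stats.kruskal(campaign_a, campaign_b, campaign_c)

# Friedman: the same five employees rated under four training programs (paired)
program_1 = [70, 65, 80, 75, 72]
program_2 = [72, 68, 82, 77, 75]
program_3 = [68, 63, 78, 73, 70]
program_4 = [75, 70, 85, 80, 78]
chi2_stat, p_fr = stats.friedmanchisquare(program_1, program_2, program_3, program_4)

print(h_stat, p_kw)
print(chi2_stat, p_fr)
```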
Chi-square Test of Independence
NPT under tests of association
Tests for independence between two categorical variables.
Use when: When both variables are categorical.
Example: Determining if there is a relationship between gender and smoking habits.
Fisher’s Exact Test
NPT under tests of association
An alternative to the chi-square test, especially when sample sizes are small.
Use when: Similar to the chi-square test, but more accurate for small samples.
Kolmogorov-Smirnov Test
NPT (a goodness-of-fit test, grouped here with the tests of association)
Tests if a sample distribution fits a theoretical distribution.
Use when: To assess if data follows a specific distribution (e.g., normal, uniform).
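A minimal sketch of Fisher's exact test on a small 2x2 table and a Kolmogorov-Smirnov check against the normal distribution, using SciPy; the counts and sample are made up.

```python
import numpy as np
from scipy import stats

# Fisher's exact test: small 2x2 table of observed counts
table = [[8, 2],
         [1, 9]]
odds_ratio, p_fisher = stats.fisher_exact(table)

# Kolmogorov-Smirnov test: does this sample plausibly follow a standard normal distribution?
rng = np.random.default_rng(0)
sample = rng.normal(loc=0, scale=1, size=50)
ks_stat, p_ks = stats.kstest(sample, "norm")

print(odds_ratio, p_fisher)
print(ks_stat, p_ks)
```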