Quantitative Data
Continuous & Discrete Data
Categorical Data
Ordinal, Nominal, Binary Data
Descriptive Statistics
Purpose: Summarize & describe data
Focus: Present data in a meaningful manner
Methods: Measures of central tendency, dispersion, frequency distributions
Examples: Calculating averages, creating charts
Inferential Statistics
Purpose: Make inferences about a population
Focus: Draw conclusions beyond the data
Methods: Hypothesis testing, confidence intervals, correlation & regression analysis
Examples: Determining significance, predicting future outcomes
Measures of Central Tendency
Mean, median, mode
Measures of Dispersion
Range, Standard Deviation (SD), Quartile Deviation (QD), Variance, Absolute Deviation (AD)
Summary
Measures of Central Tendency are statistical values that represent a typical or central value of a dataset. They provide a ______ of the data distribution.
Mean
Function: The arithmetic average of a dataset.
When to use: When the data is normally distributed (no significant skewness) and there are no outliers (extreme values).
Example: Calculating the average grade in a class.
Median
Function: The middle value in a dataset when the values are arranged in ascending order.
When to use: When the data is skewed (not normally distributed) or has outliers. The median is less affected by these factors.
Example: Determining the median income in a community.
Mode
Function: The most frequently occurring value in a dataset.
When to use: When identifying the most common category or value in a dataset.
Example: Finding the most popular color of car sold in a dealership.
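As a quick illustration, all three measures can be computed with Python's built-in statistics module; the exam scores below are made up for the sketch.

```python
import statistics

scores = [72, 85, 85, 90, 68, 85, 77]   # hypothetical class grades

print(statistics.mean(scores))    # arithmetic average (about 80.3)
print(statistics.median(scores))  # middle value of the sorted list (85)
print(statistics.mode(scores))    # most frequent value (85)
```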
Symmetrical
Mean is often the best choice when the distribution is ____.
Skewed
Median is often preferred (it is less affected by outliers) if the distribution is ____.
Nominal
Mode is the only appropriate measure if the data is _____.
Ordinal
Median or mode can be used (depending on the research question) when the data is _____.
Consistency
Measures of dispersion quantify how spread out or scattered data points are from a central value (like the mean or median). This helps us understand the variability, or ______, within a dataset.
Range
Function: Measures the difference between the largest and smallest values in a dataset.
When to use: When you need a quick and simple measure of the overall spread of data.
Example: To determine the range of temperatures in a city over a week, you would subtract the lowest temperature from the highest.
Standard Deviation
Function: Measures the average distance of each data point from the mean.
When to use: When you want a more precise and commonly used measure of dispersion, especially when dealing with normally distributed data.
Example: To assess the variability in test scores of a class, you would calculate the standard deviation.
Variance
Function: The square of the standard deviation.
When to use: Primarily used in statistical calculations and formulas. It's often a precursor to calculating standard deviation.
Example: In regression analysis, the variance of the residuals is used to assess the model's fit.
Absolute Deviation
Function: Measures the average absolute difference between each data point and the mean.
When to use: When you want a measure of dispersion that is less sensitive to outliers (extreme values) than standard deviation.
Example: To analyze the variability in income levels of a population, absolute deviation might be preferred if there are a few very high-income individuals.
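A minimal sketch of the four dispersion measures using NumPy; the income figures (with one deliberate outlier) are made up.

```python
import numpy as np

incomes = np.array([28, 31, 35, 30, 29, 120], dtype=float)  # hypothetical incomes, one outlier

data_range = incomes.max() - incomes.min()             # range
std_dev = incomes.std(ddof=1)                          # sample standard deviation
variance = incomes.var(ddof=1)                         # sample variance (SD squared)
abs_dev = np.mean(np.abs(incomes - incomes.mean()))    # mean absolute deviation

print(data_range, std_dev, variance, abs_dev)
```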
Simple & quick
Using range is _____, but it is sensitive to outliers.
Normally distributed data
The SD is common & precise for ________, but sensitive to outliers.
Statistical calculations
The variance is primarily used in ______.
Less sensitive
The AD is ____ to outliers than the SD but is less commonly used.
MOD: Range
Use this to quickly get an idea of the overall height variation.
MOD: SD
Use this to analyze how much individual heights deviate from the average height.
MOD: AD
If you suspect there might be a few very tall or short students, this could provide a more robust measure of dispersion.
MOD: Variance
This is used for statistical calculations like hypothesis testing.
Hypothesis Testing
This is a statistical method that uses sample data to decide whether there is enough evidence to reject a claim (hypothesis) about a population.
H0
The null hypothesis is denoted by this symbol.
H1
The alternative hypothesis is denoted by this symbol.
Z-Test
For Hypothesis Testing
When to use: When the sample size is large (typically n ≥ 30) and the population standard deviation is known.
Function: Compares a sample mean to a known population mean.
Example: Testing whether the average height of students in a school is significantly different from the national average.
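A minimal sketch of a one-sample z-test for the height example, assuming the population standard deviation is known; all numbers are made up.

```python
import math
from scipy.stats import norm

sample_mean, pop_mean, pop_sd, n = 171.2, 170.0, 6.0, 100  # hypothetical heights (cm)

z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-tailed p-value

print(z, p_value)
```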
T-test
For Hypothesis Testing
When to use: When the population standard deviation is unknown, especially for small samples (typically n < 30).
Function: Compares a sample mean to a known population mean or compares the means of two independent samples.
Example: Testing whether there is a significant difference in the average test scores of two different classes.
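A minimal sketch of an independent two-sample t-test for the two-classes example, using SciPy; the scores are made up.

```python
from scipy import stats

class_a = [78, 85, 69, 91, 74, 82, 88]  # hypothetical test scores
class_b = [71, 65, 80, 68, 75, 70, 73]

t_stat, p_value = stats.ttest_ind(class_a, class_b)  # compares the two sample means
print(t_stat, p_value)
```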
F-Test
For Hypothesis Testing
When to use: To compare the variances of two or more samples
Function: Determines if there is a significant difference in the variability of two or more groups.
Example: Testing whether the variability in the weights of apples from two different orchards is significantly different.
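A minimal sketch of an F-test comparing the variances of two samples (the apple-weight example); the weights are made up.

```python
import numpy as np
from scipy.stats import f

orchard_1 = np.array([150, 162, 148, 155, 170, 159], dtype=float)  # hypothetical weights (g)
orchard_2 = np.array([140, 175, 132, 168, 181, 150], dtype=float)

f_stat = orchard_1.var(ddof=1) / orchard_2.var(ddof=1)
df1, df2 = len(orchard_1) - 1, len(orchard_2) - 1
p_value = 2 * min(f.cdf(f_stat, df1, df2), f.sf(f_stat, df1, df2))  # two-tailed

print(f_stat, p_value)
```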
ANOVA
For Hypothesis Testing
When to use: To compare the means of three or more independent groups
Function: Determines if there is a significant difference in the means of multiple groups.
Example: Testing whether there is a significant difference in the average sales of a product in three different regions.
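A minimal sketch of a one-way ANOVA for the three-regions example with SciPy; the sales figures are made up.

```python
from scipy import stats

region_a = [20, 22, 19, 24, 25]  # hypothetical monthly sales
region_b = [28, 30, 27, 26, 29]
region_c = [21, 23, 22, 20, 24]

f_stat, p_value = stats.f_oneway(region_a, region_b, region_c)
print(f_stat, p_value)
```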
Wilcoxon Signed Rank Test
For Hypothesis Testing
When to use: When the data is paired and the assumptions of the t-test are not met (e.g., non-normality).
Function: Compares the medians of two paired samples.
Example: Testing whether there is a significant difference in the pre- and posttest scores of a group of students
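A minimal sketch of the Wilcoxon signed-rank test on made-up paired pre/post scores, using SciPy.

```python
from scipy import stats

pre  = [60, 55, 70, 62, 58, 66, 64]   # hypothetical pre-test scores
post = [65, 59, 71, 70, 57, 72, 69]   # matching post-test scores

w_stat, p_value = stats.wilcoxon(pre, post)  # tests the paired differences
print(w_stat, p_value)
```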
Mann-Whitney U test
For Hypothesis Testing
When to use: When the data is independent and the assumptions of the t-test are not met (e.g., non-normality).
Function: Compares the medians of two independent samples.
Example: Testing whether there is a significant difference in the salaries of male and female employees in a company.
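A minimal sketch of the Mann-Whitney U test on two made-up independent salary samples, using SciPy.

```python
from scipy import stats

salaries_group_1 = [48, 52, 55, 60, 47, 53]  # hypothetical salaries (thousands)
salaries_group_2 = [45, 44, 50, 46, 49, 43]

u_stat, p_value = stats.mannwhitneyu(salaries_group_1, salaries_group_2,
                                     alternative="two-sided")
print(u_stat, p_value)
```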
HT: Z & T-Test
These are used to compare means.
HT: F-Test
This is used to compare variances
HT: ANOVA
This is used to compare multiple means
HT: Wilcoxon & Mann-Whitney U
These are non-parametric alternatives to the t-test.
One or more independent
Regression analysis is a statistical technique used to model the relationship between a dependent variable and ________ variables. It helps us understand how changes in the independent variables affect the dependent variable.
Linear Regression
For Regression Analysis
Function: Models a linear relationship between the dependent and independent variables.
When to use: When you believe there's a straight-line relationship between the variables.
Example: Predicting house prices based on square footage and number of bedrooms.
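A minimal sketch of the house-price example, simplified to a single predictor (square footage) and fitted with statsmodels OLS; the data are made up.

```python
import numpy as np
import statsmodels.api as sm

sqft  = np.array([800, 950, 1100, 1300, 1500, 1700], dtype=float)  # hypothetical sizes
price = np.array([150, 170, 200, 240, 270, 310], dtype=float)      # prices in thousands

X = sm.add_constant(sqft)        # adds the intercept column
model = sm.OLS(price, X).fit()

print(model.params)                    # [intercept, slope]
print(model.predict([[1.0, 1200.0]]))  # predicted price for a 1200 sq ft house
```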
Nominal Regression
For Regression Analysis
Function: Models a relationship between a categorical dependent variable and one or more independent variables.
When to use: When the dependent variable has mutually exclusive categories (e.g., male/female, yes/no).
Example: Predicting whether a customer will churn based on their demographics and usage patterns.
Logistic Regression
For Regression Analysis
Function: Models the probability of a binary outcome (0 or 1) based on one or more independent variables.
When to use: When the dependent variable is binary (e.g., success/failure, presence/absence).
Example: Predicting whether a loan applicant will default based on their credit score and income.
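A minimal sketch of the loan-default example with scikit-learn's LogisticRegression; the credit scores, incomes, and outcomes are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: credit score, income (thousands); target: 1 = default, 0 = no default
X = np.array([[580, 30], [620, 35], [700, 55], [750, 80], [640, 40], [710, 60]])
y = np.array([1, 1, 0, 0, 1, 0])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba([[660, 45]]))  # probabilities of [no default, default] for a new applicant
```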
Ordinal Regression
For Regression Analysis
Function: Models a relationship between an ordinal dependent variable (with ordered categories) and one or more independent variables.
When to use: When the dependent variable has ordered categories (e.g., low/medium/high, strongly disagree/disagree/neutral/agree/strongly agree).
Example: Predicting a customer's satisfaction rating based on product features and price.
Regression Model
Choose this based on the type of dependent variable you have:
RA: Linear Regression
Use this when you have: Continuous data
RA: Nominal Regression
Use this when you have: Categorical (nominal) data
RA: Logistic Regression
Use this when you have: Binary data (0/1)
RA: Ordinal Regression
Use this when you have: Ordinal data
Parametric Tests
These are statistical tests that make assumptions about the underlying population distribution, typically assuming it follows a normal distribution. These assumptions allow for more powerful and precise analyses.
T-tests
Parametric Test (PT)
Function: Compares the means of two groups
When to use: When you have two independent groups. When the data is normally distributed or the sample size is large (n > 30).
Example: Comparing the average test scores of students who received a new teaching method versus those who received the traditional method
ANOVA
Parametric Test (PT)
Function: Compares the means of more than two groups.
When to use: When you have more than two independent groups. When the data is normally distributed or the sample size is large.
Example: Comparing the average sales of a product in three different regions.
Regression Analysis
Parametric Test (PT)
Function: Examines the relationship between a dependent variable and one or more independent variables
When to use: When you want to predict a dependent variable based on independent variables. When the data is normally distributed.
Example: Predicting house prices based on factors like square footage, number of bedrooms, and location.
Pearson Correlation Coefficient
Parametric Test (PT)
Function: Measures the strength and direction of the linear relationship between two variables.
When to use: When you want to assess the association between two continuous variables. When the data is normally distributed.
Example: Examining the relationship between education level and income.
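A minimal sketch of the Pearson correlation for the education-income example, using SciPy; the values are made up.

```python
from scipy import stats

years_of_education = [10, 12, 12, 14, 16, 16, 18]
income = [25, 30, 32, 40, 48, 52, 60]   # hypothetical incomes (thousands)

r, p_value = stats.pearsonr(years_of_education, income)
print(r, p_value)  # r near +1 would indicate a strong positive linear relationship
```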
Chi-Square Test
Often grouped with parametric tests (PT), though it is technically a non-parametric test
Function: Tests for independence between categorical variables.
When to use: When you have categorical data. When you want to determine if there is a relationship between two or more categorical variables.
Example: Investigating if there is a relationship between gender and preference for a particular brand of car.
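A minimal sketch of the gender-by-brand example using SciPy's chi2_contingency; the observed counts are made up.

```python
from scipy.stats import chi2_contingency

# rows: gender; columns: preferred car brand (observed counts)
observed = [[30, 10, 20],
            [20, 25, 15]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value, dof)
```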
Normality Assumption
Parametric tests rely on the ______. If the data are not normally distributed, non-parametric tests might be more appropriate.
Homogeneity of Variance
Some parametric tests also assume that the variances of the groups being compared are equal.
Power
Parametric tests generally have more ____ (ability to detect differences) than non-parametric tests when the assumptions are met.
Non-parametric alternative tests
If the assumptions of a parametric test are violated, consider using ____.
Non-parametric tests (NPT)
_____ are statistical tests that do not assume a specific distribution for the data. This makes them versatile and applicable to a wider range of scenarios, especially when data is not normally distributed or when the underlying assumptions of parametric tests are violated.
Spearman Rank Correlation Coefficient
NPT under correlation tests
Measures the monotonic relationship between two variables.
Use when: When data is ordinal or when the relationship between variables is not linear.
Example: Assessing the relationship between shoe size and height.
Kendall’s Tau
NPT under correlation tests
Another measure of monotonic relationship, often used when there are ties in the data.
Use when: Similar to Spearman's rank correlation, but more robust to ties.
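A minimal sketch of both rank-based correlation coefficients for the shoe-size/height example, using SciPy; the data are made up.

```python
from scipy import stats

shoe_size = [6, 7, 7, 8, 9, 10, 11]
height = [160, 165, 166, 170, 175, 178, 183]  # hypothetical heights (cm)

rho, p_rho = stats.spearmanr(shoe_size, height)   # Spearman rank correlation
tau, p_tau = stats.kendalltau(shoe_size, height)  # Kendall's tau (robust to ties)

print(rho, p_rho)
print(tau, p_tau)
```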
Mann-Whitney U
NPT under hypothesis testing
Compares the medians of two independent samples
Use when: When data is not normally distributed and the assumption of equal variances is violated.
Example: Comparing the test scores of two different teaching methods.
Wilcoxon Signed-Rank Test
NPT under hypothesis testing
Compares the medians of two dependent samples.
Use when: When data is paired or matched, and the assumption of normality is violated
Example: Comparing pre- and post-treatment scores for a group of patients.
Kruskal-Wallis Test
NPT under hypothesis testing
Compares the medians of three or more independent samples
Use when: When data is not normally distributed and the assumption of equal variances is violated.
Example: Comparing the sales performance of three different marketing campaigns
Friedman Test
NPT under hypothesis testing
Compares the medians of three or more dependent samples
Use when: When data is paired or matched
Example: Assessing the effectiveness of four different training programs on employee performance.
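A minimal sketch of the Kruskal-Wallis test (independent groups) and the Friedman test (repeated measures) with SciPy; all figures are made up.

```python
from scipy import stats

# Kruskal-Wallis: sales under three independent marketing campaigns
campaign_a = [12, 15, 14, 10, 13]
campaign_b = [22, 25, 20, 27, 24]
campaign_c = [16, 14, 18, 15, 17]
h_stat, p_kw = stats.kruskal(campaign_a, campaign_b, campaign_c)

# Friedman: the same five employees rated under four training programs (paired)
program_1 = [70, 65, 80, 75, 72]
program_2 = [72, 68, 82, 77, 75]
program_3 = [68, 63, 78, 73, 70]
program_4 = [75, 70, 85, 80, 78]
chi2_stat, p_fr = stats.friedmanchisquare(program_1, program_2, program_3, program_4)

print(h_stat, p_kw)
print(chi2_stat, p_fr)
```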
Chi-square Test of Independence
NPT under tests of association
Tests for independence between two categorical variables.
Use when: When both variables are categorical.
Example: Determining if there is a relationship between gender and smoking habits.
Fisher’s Exact Test
NPT under tests of association
An alternative to the chi-square test, especially when sample sizes are small.
Use when: Similar to the chi-square test, but more accurate for small samples.
Kolmogorov-Smirnov Test
NPT (a goodness-of-fit test, grouped here with the tests of association)
Tests if a sample distribution fits a theoretical distribution.
Use when: To assess if data follows a specific distribution (e.g., normal, uniform).
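A minimal sketch of Fisher's exact test on a small 2x2 table and a Kolmogorov-Smirnov check against the normal distribution, using SciPy; the counts and sample are made up.

```python
import numpy as np
from scipy import stats

# Fisher's exact test: small 2x2 table of observed counts
table = [[8, 2],
         [1, 9]]
odds_ratio, p_fisher = stats.fisher_exact(table)

# Kolmogorov-Smirnov test: does this sample plausibly follow a standard normal distribution?
rng = np.random.default_rng(0)
sample = rng.normal(loc=0, scale=1, size=50)
ks_stat, p_ks = stats.kstest(sample, "norm")

print(odds_ratio, p_fisher)
print(ks_stat, p_ks)
```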