Lesson 3: Data Analysis [fmpsy]

70 Terms

1
New cards

Quantitative Data

Continuous & Discrete Data

2
New cards

Categorical Data

Ordinal, Nominal, Binary Data

3
New cards

Descriptive Statistics

  • Purpose: Summarize & describe data

  • Focus: Present data in a meaningful manner

  • Methods: Measures of central tendency, dispersion, frequency distributions

  • Examples: Calculating averages, creating charts

4
New cards

Inferential Statistics

  • Purpose: Make inferences about a population

  • Focus: Draw conclusions beyond the data

  • Methods: Hypothesis testing, confidence intervals, correlation & regression analysis

  • Examples: Determining significance, predicting future outcomes

5
New cards

Measures of Central Tendency

Mean, median, mode

6
New cards

Measures of Dispersion

Range, Standard Deviation (SD), Quartile Deviation (QD), Variance, Absolute Deviation (AD)

7
New cards

Summary

Measures of Central Tendency are statistical values that represent a typical or central value of a dataset. They provide a ______ of the data distribution.

8
New cards

Mean

  • Function: The arithmetic average of a dataset.

  • When to use: When the data is normally distributed (no significant skewness) and there are no outliers (extreme values).

  • Example: Calculating the average grade in a class.

9
New cards

Median

  • Function: The middle value in a dataset when the values are arranged in ascending order.

  • When to use: When the data is skewed (not normally distributed) or has outliers. The median is less affected by these factors.

  • Example: Determining the median income in a community.

10
New cards

Mode

  • Function: The most frequently occurring value in a dataset.

  • When to use: When identifying the most common category or value in a dataset.

  • Example: Finding the most popular color of car sold in a dealership.
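
The three measures above can be compared on a small made-up dataset. This is a minimal sketch using Python's standard `statistics` module; the grades and car colors are hypothetical values chosen only to show how one outlier moves the mean far more than the median.

```python
import statistics

# Hypothetical class grades; the second list adds one extreme value.
grades = [70, 75, 80, 85, 90]
grades_with_outlier = grades + [300]

mean_plain = statistics.mean(grades)                      # arithmetic average
median_plain = statistics.median(grades)                  # middle value
mean_outlier = statistics.mean(grades_with_outlier)       # pulled up by the outlier
median_outlier = statistics.median(grades_with_outlier)   # moves only slightly

# The mode also works for nominal data such as car colors.
car_colors = ["red", "blue", "red", "white"]
most_common_color = statistics.mode(car_colors)

print(mean_plain, median_plain)
print(round(mean_outlier, 2), median_outlier)
print(most_common_color)
```

Adding the single value 300 raises the mean from 80 to about 116.67, while the median only shifts from 80 to 82.5.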

11
New cards

Symmetrical

Mean is often the best choice when the distribution is ____.

12
New cards

Skewed

Median is often preferred (it is less affected by outliers) if the distribution is ____.

13
New cards

Nominal

Mode is the only appropriate measure if the data is _____.

14
New cards

Ordinal

Median or mode can be used (depending on the research question) when the data is _____.

15
New cards

Consistency

Measures of dispersion quantify how spread out or scattered data points are from a central value (like the mean or median). This helps us understand the variability and ______ within a dataset.

16
New cards

Range

  • Function: Measures the difference between the largest and smallest values in a dataset.

  • When to use: When you need a quick and simple measure of the overall spread of data.

  • Example: To determine the range of temperatures in a city over a week, you would subtract the lowest temperature from the highest.

17
New cards

Standard Deviation

  • Function: Measures the typical distance of data points from the mean (the square root of the average squared deviation).

  • When to use: When you want a more precise and commonly used measure of dispersion, especially when dealing with normally distributed data.

  • Example: To assess the variability in test scores of a class, you would calculate the standard deviation.

18
New cards

Variance

  • Function: The square of the standard deviation.

  • When to use: Primarily used in statistical calculations and formulas. It's often a precursor to calculating standard deviation.

  • Example: In regression analysis, the variance of the residuals is used to assess the model's fit.

19
New cards

Absolute Deviation

  • Function: Measures the average absolute difference between each data point and the mean.

  • When to use: When you want a measure of dispersion that is less sensitive to outliers (extreme values) than standard deviation.

  • Example: To analyze the variability in income levels of a population, absolute deviation might be preferred if there are a few very high-income individuals.
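
The four measures of dispersion described so far can be computed side by side. The sketch below assumes population formulas (dividing by n rather than n - 1) and uses a hypothetical list of values; the variable names are illustrative only.

```python
import statistics

# Hypothetical measurements.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)

value_range = max(values) - min(values)     # largest minus smallest
variance = statistics.pvariance(values)     # average squared deviation (population)
std_dev = statistics.pstdev(values)         # square root of the variance
# Mean absolute deviation: average of |x - mean|.
mean_abs_dev = sum(abs(v - mean) for v in values) / len(values)

print(value_range, variance, std_dev, mean_abs_dev)
```

For a sample rather than a whole population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.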

20
New cards

Simple & quick

Using range is _____, but it is sensitive to outliers.

21
New cards

Normally distributed data

The SD is common & precise for ________, but sensitive to outliers.

22
New cards

Statistical calculations

The variance is primarily used in ______.

23
New cards

Less sensitive

The AD is ____ to outliers than the SD but is less commonly used.

24
New cards

MOD: Range

Use this to quickly get an idea of the overall height variation.

25
New cards

MOD: SD

Use this to analyze how much individual heights deviate from the average height.

26
New cards

MOD: AD

If you suspect there might be a few very tall or short students, this could provide a more robust measure of dispersion.

27
New cards

MOD: Variance

This is used for statistical calculations like hypothesis testing.

28
New cards

Hypothesis Testing

This is a statistical method used to assess, based on sample data, whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis.

29
New cards

H0

The null hypothesis is expressed in this symbol

30
New cards

H1

The alternative hypothesis is expressed in this symbol

31
New cards

Z-Test

  • For Hypothesis Testing

  • When to use: When the sample size is large (typically n ≥ 30) and the population standard deviation is known.

  • Function: Compares a sample mean to a known population mean.

  • Example: Testing whether the average height of students in a school is significantly different from the national average.
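
A one-sample Z-test can be sketched with the standard library alone. The function name and the height figures below are hypothetical; the sketch assumes a two-sided test and a known population standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return the z statistic and two-sided p-value for a one-sample Z-test."""
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers: school mean 172 cm, national mean 170 cm, known SD 8 cm.
z, p_value = one_sample_z_test(sample_mean=172.0, pop_mean=170.0, pop_sd=8.0, n=64)
print(round(z, 2), round(p_value, 4))
```

With these made-up numbers z is 2.0 and the p-value is about 0.0455, so the difference would count as significant at the 0.05 level.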

32
New cards

T-test

  • For Hypothesis Testing

  • When to use: When the population standard deviation is unknown, especially when the sample size is small (typically n < 30).

  • Function: Compares a sample mean to a known population mean or compares the means of two independent samples.

  • Example: Testing whether there is a significant difference in the average test scores of two different classes.
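
As a rough illustration of the two-sample case, the sketch below computes the t statistic and degrees of freedom using Welch's version of the t-test, which does not assume equal variances. That choice, the function name, and the scores are assumptions made for this example; the p-value is left out because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical test scores for two classes.
class_a = [1, 2, 3, 4, 5]
class_b = [3, 4, 5, 6, 7]
t_stat, dof = welch_t(class_a, class_b)
print(t_stat, dof)
```

The resulting t and degrees of freedom would then be looked up in a t table or passed to a statistics package to obtain a p-value.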

33
New cards

F-Test

  • For Hypothesis Testing

  • When to use: To compare the variances of two or more samples

  • Function: Determines if there is a significant difference in the variability of two or more groups.

  • Example: Testing whether the variability in the weights of apples from two different orchards is significantly different.
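
For two samples, the F statistic is simply the ratio of the sample variances. The sketch below places the larger variance in the numerator, a common convention assumed here; the function name and apple weights are hypothetical.

```python
from statistics import variance


def f_ratio(sample_a, sample_b):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical apple weights (grams) from two orchards.
orchard_a = [101, 102, 103, 104, 105]
orchard_b = [102, 104, 106, 108, 110]
f_stat, df_num, df_den = f_ratio(orchard_a, orchard_b)
print(f_stat, df_num, df_den)
```

The F value is then compared against the F distribution with the two degrees of freedom to judge significance.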

34
New cards

ANOVA

  • For Hypothesis Testing

  • When to use: To compare the means of three or more independent groups

  • Function: Determines if there is a significant difference in the means of multiple groups.

  • Example: Testing whether there is a significant difference in the average sales of a product in three different regions.
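
One-way ANOVA compares the variation between group means with the variation within groups. The following is a minimal sketch of the F statistic only, with a hypothetical function name and made-up regional sales figures.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA on a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical sales in three regions.
regions = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
f_stat = one_way_anova_f(regions)
print(f_stat)
```

A large F, as in this made-up data, suggests that at least one group mean differs from the others.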

35
New cards

Wilcoxon Signed Rank Test

  • For Hypothesis Testing

  • When to use: When the data is paired and the assumptions of the t-test are not met (e.g., non-normality).

  • Function: Compares the medians of two paired samples.

  • Example: Testing whether there is a significant difference in the pre- and post-test scores of a group of students.
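
The signed-rank statistic can be computed by ranking the absolute paired differences and summing the ranks by sign. This sketch assumes the common convention of dropping zero differences and reporting the smaller of the two rank sums; the helper names and scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_statistic(before, after):
    """Return W, the smaller of the positive and negative rank sums."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)


# Hypothetical pre- and post-test scores for five students.
pre = [10, 12, 9, 14, 11]
post = [12, 15, 8, 18, 16]
w_stat = signed_rank_statistic(pre, post)
print(w_stat)
```

A small W relative to the number of pairs points toward a systematic shift between the paired measurements.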

36
New cards

Mann-Whitney U test

  • For Hypothesis Testing

  • When to use: When the data is independent and the assumptions of the t-test are not met (e.g., non-normality).

  • Function: Compares the medians of two independent samples.

  • Example: Testing whether there is a significant difference in the salaries of male and female employees in a company.
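
The U statistic can be obtained by counting, over every pair of observations from the two groups, how often one group's value exceeds the other's. The sketch below uses that direct counting definition (ties count as one half); the function name and data are hypothetical.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return U, the smaller of the two pairwise-win counts."""
    # For each (a, b) pair: 1 if a > b, 0.5 if tied, 0 otherwise.
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )
    u_b = len(sample_a) * len(sample_b) - u_a
    return min(u_a, u_b)


# Hypothetical values for two independent groups.
group_a = [3, 5, 8]
group_b = [4, 6, 7, 9]
u_stat = mann_whitney_u(group_a, group_b)
print(u_stat)
```

Because only the ordering of values matters, the result is unaffected by how far apart the values are.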

37
New cards

HT: Z & T-Test

These are used to compare means.

38
New cards

HT: F-Test

This is used to compare variances

39
New cards

HT: ANOVA

This is used to compare multiple means

40
New cards

HT: Wilcoxon & Mann-Whitney U

These are non-parametric alternatives to the t-test.

41
New cards

One or more independent

Regression analysis is a statistical technique used to model the relationship between a dependent variable and ________ variables. It helps us understand how changes in the independent variables affect the dependent variable.

42
New cards

Linear Regression

  • For Regression Analysis

  • Function: Models a linear relationship between the dependent and independent variables.

  • When to use: When you believe there's a straight-line relationship between the variables.

  • Example: Predicting house prices based on square footage and number of bedrooms.
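
For a single independent variable, the least-squares line can be fitted by hand. This is a minimal sketch with a hypothetical function name and made-up, perfectly linear data; real data would not fit this cleanly.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: size (hundreds of square feet) and price (arbitrary units).
sizes = [1, 2, 3, 4]
prices = [3, 5, 7, 9]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 5 + intercept  # prediction for a size of 5
print(slope, intercept, predicted_price)
```

With several predictors, such as square footage and number of bedrooms, the same idea extends to multiple regression, which is usually handled by a statistics library.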

43
New cards

Nominal Regression

  • For Regression Analysis

  • Function: Models a relationship between a categorical dependent variable and one or more independent variables.

  • When to use: When the dependent variable has mutually exclusive, unordered categories. With only two categories (e.g., male/female, yes/no), this reduces to binary logistic regression.

  • Example: Predicting whether a customer will churn based on their demographics and usage patterns.

44
New cards

Logistic Regression

  • For Regression Analysis

  • Function: Models the probability of a binary outcome (0 or 1) based on one or more independent variables.

  • When to use: When the dependent variable is binary (e.g., success/failure, presence/absence).

  • Example: Predicting whether a loan applicant will default based on their credit score and income.
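
Logistic regression turns a linear combination of predictors into a probability through the logistic (sigmoid) function. The sketch below shows only that prediction step; the intercept and coefficient are invented for illustration and are not fitted to any data.

```python
from math import exp

# Hypothetical, hand-picked coefficients (not estimated from real data).
INTERCEPT = 4.0
CREDIT_SCORE_COEF = -0.01


def default_probability(credit_score):
    """Return the modeled probability of default for a given credit score."""
    linear_part = INTERCEPT + CREDIT_SCORE_COEF * credit_score
    return 1 / (1 + exp(-linear_part))  # logistic function maps to (0, 1)


p_low_score = default_probability(400)
p_high_score = default_probability(700)
print(round(p_low_score, 3), round(p_high_score, 3))
```

In practice the coefficients are estimated from data, typically by maximum likelihood in a statistics package.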

45
New cards

Ordinal Regression

  • For Regression Analysis

  • Function: Models a relationship between an ordinal dependent variable (with ordered categories) and one or more independent variables.

  • When to use: When the dependent variable has ordered categories (e.g., low/medium/high, strongly disagree/disagree/neutral/agree/strongly agree).

  • Example: Predicting a customer's satisfaction rating based on product features and price.

46
New cards

Regression Model

Choose this according to the type of the dependent variable.

47
New cards

RA: Linear Regression

Use this when you have: Continuous data

48
New cards

RA: Nominal Regression

Use this when you have: Categorical (nominal) data

49
New cards

RA: Logistic Regression

Use this when you have: Binary data (0/1)

50
New cards

RA: Ordinal Regression

Use this when you have: Ordinal data

51
New cards

Parametric Tests

These are statistical tests that make assumptions about the underlying population distribution, typically assuming it follows a normal distribution. These assumptions allow for more powerful and precise analyses.

52
New cards

T-tests

  • Parametric Test (PT)

  • Function: Compares the means of two groups

  • When to use: When you have two independent groups. When the data is normally distributed or the sample size is large (n > 30).

  • Example: Comparing the average test scores of students who received a new teaching method versus those who received the traditional method

53
New cards

ANOVA

  • Parametric Test (PT)

  • Function: Compares the means of more than two groups.

  • When to use: When you have more than two independent groups. When the data is normally distributed or the sample size is large.

  • Example: Comparing the average sales of a product in three different regions.

54
New cards

Regression Analysis

  • Parametric Test (PT)

  • Function: Examines the relationship between a dependent variable and one or more independent variables

  • When to use: When you want to predict a dependent variable based on independent variables. When the data is normally distributed.

  • Example: Predicting house prices based on factors like square footage, number of bedrooms, and location.

55
New cards

Pearson Correlation Coefficient

  • Parametric Test (PT)

  • Function: Measures the strength and direction of the linear relationship between two variables.

  • When to use: When you want to assess the association between two continuous variables. When the data is normally distributed.

  • Example: Examining the relationship between years of education and income.
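
The Pearson coefficient is the covariance of the two variables divided by the product of their spreads. A minimal sketch, with a hypothetical function name and made-up data:

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical paired observations.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
r = pearson_r(x_values, y_values)
print(round(r, 4))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship.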

56
New cards

Chi-Square Test

  • Non-parametric test (NPT), although it is sometimes listed alongside parametric tests

  • Function: Tests for independence between categorical variables.

  • When to use: When you have categorical data. When you want to determine if there is a relationship between two or more categorical variables.

  • Example: Investigating if there is a relationship between gender and preference for a particular brand of car.

57
New cards

Normality Assumption

Parametric tests rely on ______. If the data does not meet it, non-parametric tests might be more appropriate.

58
New cards

Homogeneity of Variance

Some parametric tests also assume that the variances of the groups being compared are equal.

59
New cards

Power

Parametric tests generally have more ____ (ability to detect differences) than non-parametric tests when the assumptions are met.

60
New cards

Non-parametric alternative tests

If the assumptions of the parametric test are violated, consider using ____.

61
New cards

Non-parametric tests (NPT)

_____ are statistical tests that do not assume a specific distribution for the data. This makes them versatile and applicable to a wider range of scenarios, especially when data is not normally distributed or when the underlying assumptions of parametric tests are violated.

62
New cards

Spearman Rank Correlation Coefficient

  • NPT under correlation tests

  • Measures the monotonic relationship between two variables.

  • Use when: Data is ordinal or the relationship between variables is monotonic but not linear.

  • Example: Assessing the relationship between shoe size and height.
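
Spearman's coefficient is the Pearson correlation applied to ranks instead of raw values. The sketch below uses made-up data where y grows with x but not along a straight line; the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def spearman_rho(xs, ys):
    """Pearson correlation computed on the ranks of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical monotonic but non-linear data (y = x squared).
x_values = [1, 2, 3, 4, 5]
y_values = [1, 4, 9, 16, 25]
rho = spearman_rho(x_values, y_values)
r = pearson_r(x_values, y_values)
print(rho, round(r, 4))
```

Here the ranks agree perfectly, so rho is 1, while the Pearson coefficient on the raw values is slightly below 1 because the relationship is curved.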

63
New cards

Kendall’s Tau

  • NPT under correlation tests

  • Another measure of monotonic relationship, often used when there are ties in the data.

  • Use when: Similar to Spearman's rank correlation, but more robust to ties.
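
Kendall's tau is based on counting concordant and discordant pairs. The sketch below implements the simplest variant, often called tau-a, which makes no adjustment for ties; the tie-adjusted variant (tau-b) is what statistics packages commonly report. The function name and data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Return Kendall's tau-a: (concordant - discordant) / total pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
    n_pairs = len(xs) * (len(xs) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical rankings from two judges.
judge_1 = [1, 2, 3, 4]
judge_2 = [1, 3, 2, 4]
tau = kendall_tau_a(judge_1, judge_2)
print(round(tau, 4))
```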

64
New cards

Mann-Whitney U

  • NPT under hypothesis testing

  • Compares the medians of two independent samples

  • Use when: Data is not normally distributed and the assumption of equal variances is violated.

  • Example: Comparing the test scores of two different teaching methods.

65
New cards

Wilcoxon Signed-Rank Test

  • NPT under hypothesis testing

  • Compares the medians of two dependent samples.

  • Use when: Data is paired or matched, and the assumption of normality is violated.

  • Example: Comparing pre- and post-treatment scores for a group of patients.

66
New cards

Kruskal-Wallis Test

  • NPT under hypothesis testing

  • Compares the medians of three or more independent samples

  • Use when: Data is not normally distributed and the assumption of equal variances is violated.

  • Example: Comparing the sales performance of three different marketing campaigns
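
The Kruskal-Wallis H statistic is computed from the rank sums of each group after pooling all observations. This sketch omits the tie correction, so it assumes no tied values; the function names and campaign figures are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_sum, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_j for this group
        weighted_sum += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)


# Hypothetical sales results for three marketing campaigns.
campaigns = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(campaigns)
print(round(h_stat, 4))
```

H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom.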

67
New cards

Friedman Test

  • NPT under hypothesis testing

  • Compares the medians of three or more dependent samples

  • Use when: Data is paired or matched.

  • Example: Assessing the effectiveness of four different training programs on employee performance.

68
New cards

Chi-square Test of Independence

  • NPT under tests of association

  • Tests for independence between two categorical variables.

  • Use when: Both variables are categorical.

  • Example: Determining if there is a relationship between gender and smoking habits.
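
The chi-square statistic compares observed counts in a contingency table with the counts expected if the two variables were independent. A minimal sketch with a hypothetical function name and made-up counts:

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows are gender, columns are smoker / non-smoker.
observed_counts = [[10, 20], [20, 10]]
chi2, df = chi_square_statistic(observed_counts)
print(round(chi2, 4), df)
```

The statistic is then compared against the chi-square distribution with the given degrees of freedom.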

69
New cards

Fisher’s Exact Test

  • NPT under tests of association

  • An alternative to the chi-square test, especially when sample sizes are small.

  • Use when: Similar to the chi-square test, but more accurate for small samples.
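
For a 2x2 table, Fisher's exact test sums hypergeometric probabilities over all tables with the same margins. The sketch below assumes the usual two-sided rule of adding every table that is no more probable than the observed one; the function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Return the two-sided p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical small-sample counts.
p_value = fisher_exact_two_sided([[3, 1], [1, 3]])
print(round(p_value, 4))
```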

70
New cards

Kolmogorov-Smirnov Test

  • NPT under goodness-of-fit tests

  • Tests if a sample distribution fits a theoretical distribution.

  • Use when: To assess if data follows a specific distribution (e.g., normal, uniform).
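
The test statistic is the largest gap between the sample's empirical distribution function and the theoretical one. The sketch below computes only that statistic against a standard normal distribution; the function name and sample are hypothetical, and no p-value is calculated.

```python
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Return D, the maximum distance between the empirical CDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    # At each point, check the gap just after and just before the empirical step.
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )


# Hypothetical sample compared with the standard normal distribution.
sample = [-1.0, 0.0, 1.0]
d_stat = ks_statistic(sample, NormalDist().cdf)
print(round(d_stat, 4))
```

A small D means the sample follows the theoretical distribution closely; larger samples are needed for the test to be informative.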