Module 3: Bivariate Data


31 Terms

1

What is bivariate data?

  • 2 variables measured on the same experimental unit that come in data pairs (1 pair/experimental unit)

  • Each data pair is collected independently and without bias

2

How can bivariate data occur? (3 different ways)

  • 2 qualitative variables (ex: gender and major of college students)

  • 1 qualitative and 1 quantitative (ex: gender and height)

  • 2 quantitative variables (ex: height and shoe size)

3

How is data involving 2 qualitative variables displayed?

  • Uses a contingency table or side-by-side bar or circle (pie) graphs

    • Ex: survey on hair color and hair type
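
A minimal sketch of building a contingency table from paired qualitative data. The survey responses and the helper name `contingency_table` are hypothetical and used only for illustration.

```python
from collections import Counter

# Hypothetical survey responses: one (hair_color, hair_type) pair per person.
responses = [
    ("brown", "curly"), ("brown", "straight"), ("blonde", "straight"),
    ("black", "curly"), ("brown", "straight"), ("blonde", "curly"),
]

def contingency_table(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Nested dict: table[row][col] -> count (0 if the combination never occurs).
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = contingency_table(responses)
for row, cells in table.items():
    print(row, cells)
```

Each cell counts the experimental units (people) that fall into one combination of the two categories.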

4

How is data involving 1 quantitative and 1 qualitative variable displayed?

Ex: region vs electric bill

Uses side-by-side box-and-whisker plots to display the data

  • Shows measures of center and spread for each qualitative outcome

Multiple stem-and-leaf plots or multiple frequency histograms can also be used
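
A rough sketch of the numbers behind a side-by-side comparison: a center (median) and a spread (min to max) for each category. The bill amounts and the helper name `summarize_by_group` are hypothetical.

```python
from statistics import median

# Hypothetical monthly electric bills (dollars): one (region, bill) pair per household.
bills = [
    ("North", 120), ("North", 95), ("North", 140),
    ("South", 80), ("South", 110), ("South", 70),
]

def summarize_by_group(pairs):
    """Group the quantitative values by category, then summarize each group."""
    groups = {}
    for label, value in pairs:
        groups.setdefault(label, []).append(value)
    return {
        label: {"min": min(vals), "median": median(vals), "max": max(vals)}
        for label, vals in groups.items()
    }

summary = summarize_by_group(bills)
for region, stats in summary.items():
    print(region, stats)
```

A full box-and-whisker plot would also use the quartiles; they are left out here because quartile conventions vary between textbooks and software.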

5

How is data involving 2 quantitative variables displayed?

Ex: Manatee deaths and # of boats per year

  • Uses a scatter plot of the data points

    • Each data point represents one pair of values (the x and y variables)

6

How are 2 quantitative variables analyzed?

Using Correlation and Regression

7

Linear Correlation

  • Requires a linear relationship between 2 quantitative variables

    • ONLY used if there is a linear relationship to analyze

    • The variables are interchangeable: switching x and y does not change r
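
A small sketch showing that r stays the same when x and y are switched. The heights and shoe sizes are made-up values, and `pearson_r` is an illustrative helper, not a function from any course material.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient for paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements (one pair per person).
heights = [60, 62, 65, 68, 70]
shoe_sizes = [6, 7, 8, 10, 11]

r_xy = pearson_r(heights, shoe_sizes)
r_yx = pearson_r(shoe_sizes, heights)  # same pairs, roles switched
print(r_xy, r_yx)
```

The formula treats the two variables symmetrically, which is why neither one has to be labeled dependent or independent for correlation.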

8

Correlation Coefficient

  • The sample correlation coefficient is written r

    • Measures the direction and strength of a linear relationship between 2 variables

    • Values vary from -1 to +1
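
For reference, one standard way to write the sample correlation coefficient is shown below. The notation (pairs x_i, y_i and sample means x-bar, y-bar) is an assumption added here, not taken from the cards.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The sign of the numerator gives the direction, and dividing by the two spreads keeps r between -1 and +1.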

9

Linear Correlation Direction: No correlation

  • Points can form a random scatter, r = 0

  • Indicates that there is no linear relationship between x and y

10

Linear Correlation Direction: Positive Correlation

  • As the x variable increases, the y variable also tends to increase

  • r is positive: greater than 0 and at most +1

11

Linear Correlation Direction: Negative Correlation

  • As the x variable increases, the y variable tends to decrease

  • r is negative: less than 0 and at least -1

12

Linear Correlation Strength: r = +1

  • Perfect positive correlation

13

Linear Correlation Strength: r = -1

  • Perfect negative correlation

14

Linear Correlation Strength: r value between -1 and +1

  • Intermediate relationship

  • Typically, values between -0.3 and +0.3 represent a weak or nonexistent linear relationship between the variables

15

Example correlation problem: Researchers at Johns Hopkins recently discovered a correlation between dietary fiber and colon health. People on a high-fiber diet have a lower risk of cancer. Is this a negative or positive correlation?

  • Negative

    • As the x value (fiber intake) increases, the y value (cancer risk) decreases

16

How can r=0 occur? (3 ways)

A) No trend in the data (scattered data points)

B) The x variable changes while y stays the same (horizontal line), or the y variable changes while x stays the same (vertical line). Strictly, r is undefined in these cases because one variable has zero variance, but either way there is no linear association to measure

C) There is a relationship but it is not linear (ex: a peaked, pyramid-shaped pattern)

  • A nonlinear relationship can be missed by r, so it should not be analyzed using linear correlation
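
A quick illustration of case C, using made-up data where y is completely determined by x (y = x squared) and r still comes out as 0. The data are hypothetical, and a U shape is used instead of a peak only to keep the arithmetic simple.

```python
from math import sqrt

# Hypothetical data with a perfect but nonlinear (U-shaped) relationship.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

# The positive and negative cross-products cancel, so r is 0
# even though x predicts y exactly.
r = sxy / sqrt(sxx * syy)
print(r)
```

This is why the data should be graphed first: r = 0 here means "no linear relationship", not "no relationship".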

17

Correlation Concerns: Part One

A) Check for nonlinear relationships

  • If there is a nonlinear pattern, the data may be fit to a nonlinear model first and then analyzed using correlation

B) Check for Outliers

  • Can move r closer to +1 or -1, or closer to 0

  • Need a justification to remove valid data from a dataset

C) Correlation is not causation

  • Most correlations are done on survey data, which cannot determine cause and effect

18

Correlation Concerns: Part Two

D) Third Variable Problem

  • A factor that is not measured but affects both of the variables that are measured

E) Do not extrapolate beyond the data set

  • Do not draw conclusions about values outside the range of the recorded data

19

Correct Terminology for Correlation

  • Correlation explains how X and Y are associated

  • Terms used to describe a correlation:

    • Tends to

    • Linked

    • Connected

    • Tied to

    • Associated

20

Reporting Correlations

Graphs should include:

  • Title

  • Clearly labeled axes

  • No Regression Line

  • Statistical Results

21

Difference between Correlation and Regression

Correlation: asks if 2 variables vary together

Regression: asks if changes in one variable cause or predict changes in another variable

22

Linear Regression

  • Predicts a value for y (output/dependent variable) given an x value (input/independent variable)

    • Ex: mouse growth (g) is dependent on the amount of food given (g)

  • Minimizes the variability in the y direction to generate the best-fit regression line

23

Regression: Best Fit Line

  • Determines the best-fit line for the data

    • Minimizes the squared vertical deviations between the line and the actual data points (least squares)

    • Equation: ŷ = b0 + b1x, where ŷ is the predicted y

      • Estimate of the line's slope = b1

      • Estimate of the y-intercept = b0
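
A minimal sketch of how b0 and b1 can be computed with the usual least-squares formulas. The food and growth values are hypothetical, and `fit_line` is an illustrative helper name.

```python
# Hypothetical data: food given (g) as x, mouse growth (g) as y.
food = [1, 2, 3, 4, 5]
growth = [2, 4, 5, 4, 5]

def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept: the line passes through (mean_x, mean_y)
    return b0, b1

b0, b1 = fit_line(food, growth)
predicted = b0 + b1 * 3  # prediction at x = 3, inside the range of the data
print(b0, b1, predicted)
```

The prediction is made only for an x value inside the observed range, in line with the warning against extrapolating.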

24

Coefficient of Determination: R²

  • Measures the proportion of variability in the dependent variable (y) that is explained by the independent variable (x)

    • Indicates how tightly the points cluster around the regression line

  • Varies from 0 to 1

    • R² = 0: x explains none of the variability in y

    • R² = 1: perfect relationship (ex: all points fall on a straight line for linear regression)
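
One common way to write R² for a fitted line is sketched below. The symbols (fitted values ŷ_i = b0 + b1·x_i and the sample mean ȳ) are notation assumed here, not taken from the cards.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad \hat{y}_i = b_0 + b_1 x_i
```

For a straight-line regression with one x variable, this value equals the square of the correlation coefficient r.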

25

R² vs r

  • r is used for correlation: gives tightness and direction, values vary from -1 to +1, can be statistically tested for a relationship between the x and y variables

  • R² is used for regression: gives tightness only (how close the points are to the regression line), values vary from 0 to 1, not tested statistically

26

How to test for a relationship in regression?

  • Ask if the slope of the line (b1) is statistically significantly different from 0

  • Ask if the intercept (b0) is statistically significantly different from 0

    • A nonzero intercept means the line starts away from zero: the predicted y at x = 0 is b0, not 0
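
A hedged sketch of the usual t statistic for the slope (b1 divided by its standard error). The data are hypothetical and `slope_t_statistic` is an illustrative name. Turning t into a p-value requires a t distribution with n - 2 degrees of freedom, which is not computed here.

```python
from math import sqrt

# Hypothetical data: food given (g) as x, mouse growth (g) as y.
food = [1, 2, 3, 4, 5]
growth = [2, 4, 5, 4, 5]

def slope_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for testing whether the slope is 0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Sum of squared vertical deviations from the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))   # residual standard error
    se_b1 = s / sqrt(sxx)     # standard error of the slope
    return b1 / se_b1, n - 2

t_stat, df = slope_t_statistic(food, growth)
print(t_stat, df)
```

A t value far from 0 (relative to the t distribution with df degrees of freedom) is evidence that the slope differs from 0.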

27

Components for presenting Regression results

  • Title

  • Axes labeled with units

  • Line only in the range of data

  • Equation for line

  • R² value

28

Regression Concerns

  • Outliers can have a large impact

  • Never extrapolate beyond range of data

  • Relationship may be nonlinear, graph data first

  • Lurking variables if it is survey data

  • Open to different interpretations

29

Is it correlation or is it regression?

  • Correlation looks at the trend of 2 variables, while regression asks if the y variable is a function of the x variable

    • For regression, interpretations depend on whether a study is a survey or an experiment

30

Regression: Causality

  • Can only be shown with controlled experiments

    • Ex: experiment with 4 levels of fertilizer (0,1,2, and 3 mg/m). 6 plants assigned to each fertilizer treatment, look at how tall they grow in 2 weeks

31

Regression vs Correlation

  • Correlation can only compare 2 quantitative variables and only looks at their linear relationship, typically using survey data

  • Regression can include more variables, can deal with curvilinear data, and can handle both survey and experiment data (causation requires a controlled experiment)

    • Can be used for nonlinear relationships