Quantitative Data Analysis Notes

Quantitative Data Analysis

Introduction

After data collection, researchers organize and analyze data to clarify study results.
Data analysis considerations:
- Type of design
- Type of data collected
- Hypothesis and/or research question
Statistics are extensively used in nursing and health research.

Quantitative Data

Descriptive Statistics: Summary statistics that organize data to give meaning and facilitate insight.
- Example: describing the sample (age, education level, gender).
Inferential Statistics: Statistics that allow inference from a sample statistic to a population parameter.
- Researchers estimate how reliably they can make predictions and generalize findings.

Statistics: Levels of Measurement

Four levels of data in statistics
Levels of measurement: Assignment of numbers to variables based on statistical rules.
- Nominal
- Ordinal
- Interval
- Ratio

Levels of Measurement

Nominal
- Classified in mutually exclusive categories.
- No ranking or order among categories.
- Example:
  - Gender
  - Marital status
  - Religious affiliation
  - Ethnicity
Ordinal
- Categories are mutually exclusive and exhaustive, and they are ranked relative to one another.
- Example: Education level
  - High school graduate
  - College certificate
  - Bachelor's degree
  - Master’s degree

Levels of Measurement (Continued)

Interval
- Mutually exclusive, exhaustive categories with ranking order, and equal distances between intervals.
- No absolute zero point.
- Example: Temperature
  - 20.0–24.9°C
  - 25.0–29.9°C
Ratio
- Highest level of measurement.
- Mutually exclusive, exhaustive categories with ranking order, equal spacing between intervals, and a continuum of values.
- Examples: Weight, length, and volume.
- Absolute zero exists (absence of weight).

Types of Analysis

Frequency Distribution: Number of times each event occurs is counted; data grouped into categories.
- The frequency of each group reported.
- Sample: Scores of 9 students released by Dr. D: 14, 14, 15, 15, 16, 17, 17, 17, 20
- Frequency distribution of the scores:

  Score | Frequency
  ----- | ---------
  14    | 2
  15    | 2
  16    | 1
  17    | 3
  20    | 1
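The tally above can be reproduced with Python's standard-library `collections.Counter`. This is a minimal sketch using Dr. D's nine scores; it is one convenient way to count, not a prescribed method.

```python
from collections import Counter

# Dr. D's nine released scores (from the example above)
scores = [14, 14, 15, 15, 16, 17, 17, 17, 20]

# Counter maps each distinct score to the number of times it occurs
frequency = Counter(scores)

print("Score | Frequency")
for score in sorted(frequency):
    print(f"{score} | {frequency[score]}")
```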

Measures of Central Tendency

Mean: Average calculated by summing the values and dividing by the number of values.
- Example: $\bar{x} = \frac{14 + 14 + 15 + 15 + 16 + 17 + 17 + 17 + 20}{9} = \frac{145}{9} \approx 16.11$
Median: Midpoint in a set of values (50% of distribution falls below, 50% above).
- Example: 16
Mode: Most frequently occurring score in the distribution.
- Example: 17
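The three measures can be checked with Python's built-in `statistics` module. The sketch below reuses the same nine scores.

```python
import statistics

scores = [14, 14, 15, 15, 16, 17, 17, 17, 20]

mean = statistics.mean(scores)      # sum of scores / number of scores = 145 / 9
median = statistics.median(scores)  # middle (5th) value of the 9 sorted scores
mode = statistics.mode(scores)      # most frequently occurring score

print(f"Mean:   {mean:.2f}")   # 16.11
print(f"Median: {median}")     # 16
print(f"Mode:   {mode}")       # 17
```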

Normal Distribution

A theoretical concept where interval or ratio data group themselves about a midpoint, closely approximating the normal curve.
Mean, median, and mode are equal.

Distribution Curves

Positive Skew (Right-Sided Skew): Mean is usually greater than the median.
Negative Skew (Left-Sided Skew): Mean is usually less than the median.
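A small illustration of why the mean moves with the skew. It uses two invented data sets, which are hypothetical and for illustration only. A single extreme value pulls the mean toward the long tail, while the median barely moves.

```python
import statistics

# Hypothetical data sets, invented for illustration only
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]       # long tail toward high values
left_skewed = [1, 17, 18, 18, 18, 19, 19, 20]  # long tail toward low values

for label, data in [("positive skew", right_skewed), ("negative skew", left_skewed)]:
    m = statistics.mean(data)
    md = statistics.median(data)
    print(f"{label}: mean = {m}, median = {md}")
```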

Types of Data Analysis

Range: Difference between the highest and lowest scores.
- Example: $(20 - 14 = 6)$
- Reported with other measures of variability.
- Simplest but most unstable measure of variability.
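A sketch of the range calculation, plus a hypothetical change showing why the range is unstable. If the top score had been 35 instead of 20 (an invented value), that one score would more than triple the range.

```python
scores = [14, 14, 15, 15, 16, 17, 17, 17, 20]

def score_range(values):
    """Range = highest score minus lowest score."""
    return max(values) - min(values)

print(score_range(scores))  # 6

# Hypothetical: replace the top score (20) with an extreme score of 35
with_outlier = scores[:-1] + [35]
print(score_range(with_outlier))  # 21
```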

Types of Data Analysis (Continued)

Percentile: Percentage of cases a given score exceeds.
- Median is the 50th percentile.
- A score in the 90th percentile is exceeded by only 10% of scores.
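A percentile rank can be computed directly from the scores. This sketch assumes the common "midpoint" convention, which counts the cases below a score plus half of any ties. Under that convention the median comes out at exactly the 50th percentile. Some textbooks count only the cases strictly below, which gives slightly different numbers.

```python
def percentile_rank(values, score):
    """Percentage of cases below `score`, counting half of any tied cases.

    Assumption: midpoint convention, (below + 0.5 * equal) / n * 100.
    """
    below = sum(1 for v in values if v < score)
    equal = sum(1 for v in values if v == score)
    return 100 * (below + 0.5 * equal) / len(values)

scores = [14, 14, 15, 15, 16, 17, 17, 17, 20]
print(percentile_rank(scores, 16))            # 50.0 (the median)
print(round(percentile_rank(scores, 20), 1))  # 94.4
```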

Types of Data Analysis (Continued)

Standard Deviation: Average variability in a set of scores or the scores’ average deviation from the mean.
- Describes how widely the scores are spread around the mean.
- https://www.youtube.com/watch?v=MRqtXL2WX2M (3.5 min)
- In a normal distribution, about 68% of the data fall within one (1) standard deviation of the mean.
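The 68% figure can be checked with the standard library's `statistics.NormalDist`. The sketch computes the share of a standard normal curve lying within 1, 2, and 3 standard deviations of the mean. That share is roughly 68%, 95%, and 99.7%, respectively.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k standard deviations of the mean
for k in (1, 2, 3):
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within ±{k} SD: {share:.1%}")
```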

Standard Deviation

A standard deviation (σ) measures data dispersion relative to the mean, calculated via a statistical formula.
- It represents the average distance from the mean.
- Low standard deviation: data clustered around the mean.
- High standard deviation: data more spread out.
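A sketch of the calculation for Dr. D's scores. It assumes the usual sample formula, which divides the squared deviations by n − 1. Python's `statistics.stdev` uses the same formula, and `statistics.pstdev` gives the population value σ, which divides by n. The two extra data sets are hypothetical. They share a mean of 16 but differ in spread.

```python
import math
import statistics

scores = [14, 14, 15, 15, 16, 17, 17, 17, 20]
mean = statistics.mean(scores)

# Sample standard deviation: sqrt( sum((x - mean)^2) / (n - 1) )
squared_deviations = sum((x - mean) ** 2 for x in scores)
sample_sd = math.sqrt(squared_deviations / (len(scores) - 1))

print(f"{sample_sd:.2f}")                  # 1.90
print(f"{statistics.stdev(scores):.2f}")   # 1.90 (sample SD, divides by n - 1)
print(f"{statistics.pstdev(scores):.2f}")  # 1.79 (population sigma, divides by n)

# Hypothetical sets with the same mean (16) but different spread
clustered = [15, 16, 16, 16, 17]   # low SD: clustered around the mean
spread_out = [5, 10, 16, 22, 27]   # high SD: spread out
print(statistics.stdev(clustered) < statistics.stdev(spread_out))  # True
```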

Levels of Measurement and Analysis

Nominal Data
- Lowest level; each observation fits into only one category (e.g., marital status).
- Analysis:
  - Mode
  - Frequency distribution
Ordinal Data
- Places values into categories with an order (e.g., educational level).
- Analysis:
  - Mode and median
  - Rank-order correlation coefficients
  - Range
  - Percentile

Levels of Measurement and Analysis (Continued)

Interval Data
- Categories with equal distances (e.g., points on a scale).
- Analysis:
  - Mean, median, and mode
  - Range
  - Percentile
  - Standard deviation
Ratio Data
- Highest level; allows for a true zero (e.g., weight, height, volume).
- Analysis:
  - Mean, median, and mode
  - Range
  - Percentile
  - Standard deviation

Decision Tree for Statistical Analysis

Is the study quantitative?
- If yes, proceed with quantitative analysis.
- If no (qualitative), see chapters 7 and 8.
Flowchart for selecting appropriate descriptive statistics based on the level of measurement (Nominal, Ordinal, Interval, Ratio).
- Nominal: Frequency distribution, Mode
- Ordinal: Range, Percentile, Mode, Median
- Interval: Mean, Mode, Median, Range, Percentile, Standard deviation
- Ratio: Mean, Mode, Median, Range, Percentile, Standard deviation
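The flowchart is a simple lookup from level of measurement to the permitted descriptive statistics. The sketch below transcribes the four lines above into a Python dictionary. The function name `descriptive_statistics_for` is hypothetical.

```python
# Descriptive statistics permitted at each level, transcribed from the flowchart above
DESCRIPTIVE_BY_LEVEL = {
    "nominal": ["frequency distribution", "mode"],
    "ordinal": ["range", "percentile", "mode", "median"],
    "interval": ["mean", "mode", "median", "range", "percentile", "standard deviation"],
    "ratio": ["mean", "mode", "median", "range", "percentile", "standard deviation"],
}

def descriptive_statistics_for(level):
    """Return the descriptive statistics the flowchart lists for a level of measurement."""
    try:
        return DESCRIPTIVE_BY_LEVEL[level.lower()]
    except KeyError:
        raise ValueError(f"unknown level of measurement: {level!r}") from None

print(descriptive_statistics_for("Ordinal"))
```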

Inferential Statistics

Combines mathematical processes with logic to test hypotheses about populations using data from probability samples.
- Purpose:
  - Estimate the probability that sample statistics accurately reflect the population parameter.
  - Test a hypothesis about a population.

Inferential Statistics: Parameters and Statistics

Parameter: A characteristic of a population.
- The population is a well-defined set with specific properties.
Statistic: A characteristic of a sample used to estimate population parameters.
- Example: Survey of 100 heart failure patients shows an average knowledge score of 72%.
  - This represents the sample’s average knowledge level.
  - Researchers use this to identify knowledge deficits and improve teaching plans.
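A sketch of the parameter/statistic distinction. The population of 5,000 knowledge scores is invented for illustration, so no real survey values are used. The population mean is the parameter. In practice it is usually unknown. The mean of a random sample of 100 patients is the statistic used to estimate it.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical population: knowledge scores (40-100) for 5,000 heart failure patients
population = [rng.randint(40, 100) for _ in range(5000)]
parameter = statistics.mean(population)  # population mean (usually unknown)

# A random sample of 100 patients, as in the survey example
sample = rng.sample(population, 100)
statistic = statistics.mean(sample)      # sample mean, used to estimate the parameter

print(f"parameter (population mean): {parameter:.1f}")
print(f"statistic (sample mean):     {statistic:.1f}")
```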

Inferential Statistics: Parametric vs. Non-Parametric Tests

Parametric Tests: Statistical procedures used when three assumptions are present.
- Sample from the population has a normal distribution.
- Level of measurement is interval or ratio with a normal distribution.
- Sample obtained through random sampling.
Non-Parametric Tests: Statistical procedures used when one or more of the following apply:
- Sample from the population does not have a normal distribution.
- Level of measurement is nominal or ordinal.
- Sample obtained through non-random sampling.
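The choice between the two families can be expressed as a small rule. This is a sketch under one assumption about how to read the lists above: a parametric test is used only when all three assumptions hold, and a non-parametric test is used otherwise. The function names are hypothetical.

```python
def parametric_assumptions_met(normal_distribution, level, random_sample):
    """True only if all three parametric assumptions listed above hold.

    `level` is one of "nominal", "ordinal", "interval", "ratio".
    """
    return normal_distribution and level in ("interval", "ratio") and random_sample

def choose_test_family(normal_distribution, level, random_sample):
    if parametric_assumptions_met(normal_distribution, level, random_sample):
        return "parametric"
    return "non-parametric"

print(choose_test_family(True, "ratio", True))    # parametric
print(choose_test_family(True, "ordinal", True))  # non-parametric
```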

Inferential Statistics: Hypothesis Testing

Hypothesis (H1): Formal statement of the expected relationship between variables in a specified population.
Null Hypothesis (H0): States no relationship between variables; used for testing and interpreting statistical outcomes.
- Example: There will be no significant difference in IV catheter patency between catheters flushed with 2 mL of normal saline and those flushed with 2 mL of heparinized saline.

Hypothesis Testing: Scientific vs. Null Hypotheses

For a quantitative study, the researcher(s) will develop two hypotheses:
Scientific Hypothesis (H1): IV catheters flushed with 2 mL of heparinized saline will remain patent longer than those flushed with 2 mL of normal saline.
- Directional hypothesis
Null Hypothesis (H0): There will be no significant difference in the duration of IV patency between catheters flushed with 2 mL of normal saline and those flushed with 2 mL of heparinized saline.
- Indicates no differences will occur between the two variables or groups being studied.

Hypothesis Testing: Statistical Procedures

The null hypothesis is tested using statistical procedures.
If no statistically significant difference is found between the control and intervention groups (or variables), the null hypothesis is not rejected, and any observed difference is attributed to chance.
If there is a statistically significant difference between the groups, the null hypothesis is rejected.
A further analysis determines whether the difference is significant enough to support the scientific hypothesis.
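The notes do not name a specific test at this step. As an illustration, the sketch below swaps in an exact permutation test, because it needs only the Python standard library. Under H0, group labels are interchangeable, so every way of splitting the pooled data into groups of the original sizes is equally likely. The p-value is the share of splits whose mean difference is at least as extreme as the one observed. The patency durations (hours) are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_test(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        # small tolerance so floating-point rounding does not drop ties
        if abs(mean(a) - mean(b)) >= observed - 1e-9:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical hours of IV patency (invented for illustration)
heparin = [50, 52, 55, 58, 60]
saline = [40, 42, 44, 45, 47]

p_value = permutation_test(heparin, saline)
alpha = 0.05
print(f"p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

With only 252 possible splits of 10 values into two groups of 5, the exact test is cheap. For larger groups, a random subset of splits is usually sampled instead.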

Hypothesis Testing: Rejecting the Null Hypothesis

If the null hypothesis (H0) is rejected, the findings support a relationship between the variables.
- Example: IV catheters flushed with 2 mL of heparinized saline remained patent longer than those flushed with 2 mL of normal saline.
- The statistical procedure determines whether a relationship exists.
This testing is subject to two types of errors:
- Type I
- Type II

Type I and Type II Errors

Type I Error: Rejection of the null hypothesis when it is true.
- More serious; the researcher states relationships exist when they do not.
- Research consumers should consider the reliability and validity of the instruments used.
Type II Error: Accepting (failing to reject) the null hypothesis when it is false.
- Can occur if the sample is too small.

Significance Level (Alpha Level)

Before statistical analysis, the level of significance or alpha level is determined.
- The probability of making a Type I error.
- The minimum standard (largest acceptable alpha) in nursing research is 0.05.
- That is, if the null hypothesis were actually true and the study were repeated 100 times, the decision to reject it would be wrong about 5 of those 100 times.

Adjusting the Alpha Level

Researchers can set the alpha level at 0.01 for a smaller risk of incorrectly rejecting a true null hypothesis (if the null hypothesis were true, it would be wrongly rejected about 1 time out of 100 trials).
Researchers will select an alpha level depending on how important it is not to make an error.
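The "wrong 5 times out of 100" reading of alpha can be checked by simulation. The sketch below makes several assumptions for illustration. It simulates many hypothetical studies in which H0 is true (population mean 0, known SD 1). Each study runs a two-sided one-sample z test, which was chosen only because the standard library can compute it. The share of studies that wrongly reject H0 comes out close to alpha.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(7)  # fixed seed so the sketch is repeatable
standard_normal = NormalDist()

def one_study_rejects(alpha, n=30):
    """Simulate one study where H0 is TRUE and report whether H0 was rejected."""
    sample = [rng.gauss(0, 1) for _ in range(n)]
    # With a known SD of 1, the standard error of the mean is 1 / sqrt(n)
    z = fmean(sample) * math.sqrt(n)
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    return p_value < alpha

STUDIES = 10_000
for alpha in (0.05, 0.01):
    type_i_errors = sum(one_study_rejects(alpha) for _ in range(STUDIES))
    print(f"alpha = {alpha}: true H0 rejected in {type_i_errors / STUDIES:.1%} of studies")
```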

Practical vs. Statistical Significance

Practical and statistical significance are not the same.
A statistically significant finding means it is unlikely that the findings occurred by chance.
- If the level of significance is set at 0.05, there is at most a 5% chance of rejecting a null hypothesis that is actually true (a Type I error).
The magnitude of the finding, not only its statistical significance, is vital when interpreting the outcome of data analysis.
Practical significance – examines the practical value that the study contributes.
- If heparinized saline maintains IV catheter patency longer than normal saline, the finding has value to practice: IV access is maintained longer, which means fewer IV sticks and more reliable delivery of IV treatments.
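A sketch of how statistical and practical significance can diverge. All numbers are hypothetical, and the simplified two-sample z test assumes a known common SD of 10 hours and equal group sizes. A 0.5-hour difference in a very large study is statistically significant but probably of little practical value. A 5-hour difference in a small study may not reach significance.

```python
from statistics import NormalDist

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known common SD
    and equal group sizes (a simplified z test, for illustration only)."""
    standard_error = sd * (2 / n_per_group) ** 0.5
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: SD of patency duration = 10 hours in both groups
tiny_effect_big_study = two_sample_z_p_value(mean_diff=0.5, sd=10, n_per_group=20_000)
big_effect_small_study = two_sample_z_p_value(mean_diff=5, sd=10, n_per_group=20)

print(f"0.5 h difference, 20,000 per group: p = {tiny_effect_big_study:.7f}")
print(f"5 h difference, 20 per group:       p = {big_effect_small_study:.3f}")
```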

Types of Inferential Statistical Tests

Researchers use different parametric and non-parametric tests to determine:
- Differences between groups (e.g., between means):
  - Examples: t-test, ANOVA, and the Wilcoxon matched-pairs signed-rank test (non-parametric)
- Presence of a relationship:
  - Examples: Pearson r and multiple regression
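For reference, Pearson r can be computed from its definition. It is the sum of the products of paired deviations from each mean, divided by the square root of the product of the summed squared deviations. The sketch uses hypothetical paired data. On Python 3.10+, `statistics.correlation(x, y)` returns the same value.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists of equal length (n >= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariation = sum(a * b for a, b in zip(dx, dy))
    return covariation / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # 0.775
```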

Testing for Differences: Algorithm

Is the research question asking for a difference?
- If yes, proceed to determine the number of groups.
- If no, the research question is asking for a relationship (refer to the other algorithm).
One group or two groups?
- Two groups:
  - Interval measure? t test
  - Nominal or ordinal measure? Chi-square
- One group:
  - Interval measure? Correlated t test, ANOVA
  - Nominal or ordinal measure? Sign test, Kolmogorov-Smirnov, Signed rank, Mann-Whitney U

Testing for a Relationship: Algorithm

Is the research question asking for a relationship?
- If yes, determine the number of variables.
- If no, the research question is asking for a difference (refer to the other algorithm).
Two variables or more than two variables?
- Two variables:
  - Interval measure? Pearson product moment correlation, Point-biserial
  - Nominal or ordinal measure? Phi coefficient, Kendall's tau, Spearman's rho, Contingency coefficient
- More than two variables:
  - Interval measure? Multiple regression, Path analysis, Canonical correlation, Discriminant function analysis
  - Nominal or ordinal measure? Logistic regression
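The two algorithms above can be written as lookup tables. This sketch transcribes the test lists as they appear in these notes. It is not a complete test-selection guide. It also assumes that ratio data follow the interval branch, because the algorithms name only interval measures. The function name `candidate_tests` is hypothetical.

```python
# Transcribed from the two algorithms above. Keys: (groups or variables, measure)
DIFFERENCE_TESTS = {
    ("two groups", "interval"): ["t test"],
    ("two groups", "nominal/ordinal"): ["chi-square"],
    ("one group", "interval"): ["correlated t test", "ANOVA"],
    ("one group", "nominal/ordinal"): ["sign test", "Kolmogorov-Smirnov", "signed rank", "Mann-Whitney U"],
}
RELATIONSHIP_TESTS = {
    ("two variables", "interval"): ["Pearson product moment correlation", "point-biserial"],
    ("two variables", "nominal/ordinal"): ["phi coefficient", "Kendall's tau", "Spearman's rho",
                                           "contingency coefficient"],
    ("more than two variables", "interval"): ["multiple regression", "path analysis",
                                              "canonical correlation", "discriminant function analysis"],
    ("more than two variables", "nominal/ordinal"): ["logistic regression"],
}

def candidate_tests(question, count, level):
    """question: "difference" or "relationship"; count: e.g. "two groups";
    level: "nominal", "ordinal", "interval", or "ratio"."""
    # Assumption: ratio data use the interval branch
    measure = "interval" if level in ("interval", "ratio") else "nominal/ordinal"
    tables = {"difference": DIFFERENCE_TESTS, "relationship": RELATIONSHIP_TESTS}
    return tables[question][(count, measure)]

print(candidate_tests("relationship", "two variables", "ordinal"))
```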

Conclusion: Evaluating Data Analysis

When examining the data analysis, ask yourself:
- Is the data analysis (testing) appropriate for the:
  - Research question or hypothesis?
  - Design of the study?
  - Methods used in the study?
  - Type of data collected?
- Clues to the appropriate test must come from the research question or hypothesis.
- Look at the findings to determine if they are appropriate and applicable to the patient population and practice setting.

Review for Descriptive Statistics

Were appropriate descriptive statistics used?
What level of measurement is used for each major variable?
Is the sample size large enough to prevent one extreme score from affecting the summary statistics used?
What descriptive statistics are reported?
Were these descriptive statistics appropriate to the level of measurement for each variable?
Are appropriate summary statistics provided for each major variable?

Review for Inferential Statistics

Does the level of measurement enable the use of parametric statistics?
Is the sample size large enough to use parametric statistics?
Are the results for each of the hypotheses presented clearly and appropriately?
Are the results clear?
Is a distinction made between practical significance and statistical significance?

Summary

Review key points and critical thinking questions at the end of the chapter.
Questions/concerns
Email: binchj@algonquincollege.com