PUB561 Module 1, Part 2
Basic Terminology
Variable: Characteristic of interest that varies across individuals (e.g., gender, disease status, BMI, age).
Data: Values measured for a variable. A single value for one participant is called a datum.
Types of Variables
Independent Variable:
Also known as explanatory variable, exposure, or predictor.
Potentially influences the dependent variable (e.g., age may influence BMI).
Dependent Variable:
Also called outcome, endpoint, or response variable.
Potentially influenced by independent variables (e.g., BMI is dependent on factors such as age).
Relationships
A relationship signifies an association between two or more variables (e.g., evidence of a relationship between age and BMI).
Example: Older individuals tend to have higher BMI.
Forms of Data
Continuous Data:
Represent numerical amounts measured with high precision.
Examples: Height (cm), Weight (kg), Time (seconds), Blood Pressure.
Categorical Data:
Organized into categories, non-numeric.
Limited, finite number of categories (e.g., gender).
Scales of Measurement in Categorical Data
Dichotomous/Binary Data:
Only two possible values (e.g., male/female, alive/dead).
Nominal Data:
More than two groups, with no inherent order (e.g., eye color, blood type).
Ordinal Data:
More than two groups with a natural ordering (e.g., an agreement scale: strongly agree, agree, disagree, strongly disagree).
Interval Data:
Ordered groups separated by equal and meaningful intervals; rarely used in practice.
Example: Groups numbered zero to nine, where each group represents a number.
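The definitions above can be read as a small decision rule. The sketch below is only an illustration of that rule, not part of the course material: the function name categorical_scale and its parameters are hypothetical, and the five-level agreement scale in the usage lines is an assumed example.

```python
def categorical_scale(n_categories: int, ordered: bool = False,
                      equal_intervals: bool = False) -> str:
    """Name the scale of a categorical variable using the definitions in these notes.

    Hypothetical helper for illustration only.
    """
    if n_categories < 2:
        raise ValueError("a variable needs at least two categories")
    if n_categories == 2:
        return "binary"      # only two possible values
    if not ordered:
        return "nominal"     # more than two groups, no inherent order
    if equal_intervals:
        return "interval"    # ordered groups with equal, meaningful spacing
    return "ordinal"         # more than two groups with a natural ordering


print(categorical_scale(2))                                       # alive/dead
print(categorical_scale(4))                                       # blood type: A, B, AB, O
print(categorical_scale(5, ordered=True))                         # assumed five-level agreement scale
print(categorical_scale(10, ordered=True, equal_intervals=True))  # groups zero to nine
```

The four calls print binary, nominal, ordinal, and interval, in that order.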
Data Reduction
Collect data in its most precise form before categorizing for analysis.
Example: Collect age in years, then group into categories (20s, 30s, etc.).
Easier to collapse than to expand data later in analysis.
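As a minimal sketch of the age example above, the code below collapses exact ages into decade groups. The function name age_to_decade and the list of ages are hypothetical and made up purely for illustration.

```python
def age_to_decade(age_years: int) -> str:
    """Collapse an exact age in years into a decade label such as '20s'."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    decade = (age_years // 10) * 10  # e.g. 23 -> 20, 37 -> 30
    return f"{decade}s"


# Hypothetical ages collected in years (illustrative values, not real data).
ages = [23, 37, 31, 45, 29]
groups = [age_to_decade(a) for a in ages]
print(groups)  # ['20s', '30s', '30s', '40s', '20s']
```

The reverse step is not possible: once only "20s" is recorded, the exact age of 23 or 29 cannot be recovered, which is why the precise value is collected first.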
Key Takeaways
Variables can be independent (predictors) or dependent (responses).
Data can be continuous or categorical.
Categorical data can be further divided into binary, nominal, ordinal, and interval scales.
Aim to collect data at the highest level of precision before categorization.
Practice Exercise Examples
Example 1:
Statement: "The time from diagnosis to death was 24.6 months."
Variable: Time from diagnosis to death.
Value: 24.6 months.
Type: Continuous variable (time is a measured numerical amount, so it can take decimal values such as 24.6).
Example 2:
Statement: "The survival of the patient was noted 24 months after diagnosis."
Variable: Patient survival (alive/dead).
Type: Binary data.
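The two examples also illustrate data reduction: a continuous time can be collapsed into a binary status at a chosen checkpoint, but not the other way around. The sketch below assumes, for illustration only, that the 24.6 months from Example 1 is checked at a 24-month checkpoint; the function name alive_at is hypothetical.

```python
def alive_at(months_to_death: float, checkpoint_months: float) -> bool:
    """Return True if death occurred after the checkpoint (patient alive at the checkpoint)."""
    return months_to_death > checkpoint_months


time_to_death = 24.6  # continuous value from Example 1, in months
status = "alive" if alive_at(time_to_death, 24) else "dead"
print(status)  # alive
```

Knowing only that a patient was alive at 24 months does not reveal the exact time from diagnosis to death.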