What is the main purpose of data screening?
To ensure the data is accurate, complete, and suitable for statistical testing.
What are the main things you check during data screening?
Accuracy, missing data, outliers, normality, linearity, homoscedasticity, and multicollinearity.
What is the first step in data screening?
Check the data for accuracy (data-entry errors or impossible values).
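Here is a minimal sketch of this check outside SPSS, where the minimum and maximum values from Frequencies or Descriptives do the same job. The Python below flags any value that falls outside an allowed range. The variable names and ranges in `VALID_RANGES` are hypothetical, so take the real ones from your own codebook.

```python
# Hypothetical valid ranges for two example variables; replace with your codebook.
VALID_RANGES = {
    "age": (18, 100),   # assumed: adult sample
    "item1": (1, 5),    # assumed: 1-5 Likert item
}

def find_impossible_values(rows, valid_ranges):
    """Return (row_index, variable, value) for every value outside its allowed range."""
    problems = []
    for i, row in enumerate(rows):
        for var, (low, high) in valid_ranges.items():
            value = row.get(var)
            if value is None:
                continue  # missing values are screened separately, not here
            if not (low <= value <= high):
                problems.append((i, var, value))
    return problems

data = [
    {"age": 25, "item1": 4},
    {"age": 250, "item1": 3},   # probably a typo for 25
    {"age": 31, "item1": 6},    # 6 is impossible on a 1-5 scale
    {"age": 40, "item1": None},
]

for problem in find_impossible_values(data, VALID_RANGES):
    print(problem)
```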
What is the difference between grouped and ungrouped data?
Grouped data is split into categories (e.g., male/female), so screening is done separately within each group. Ungrouped data treats everyone as one sample and is screened as a whole (e.g., for regression).
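This rough illustration shows why the distinction matters. With grouped data, screening is typically done within each group, so a score can be an outlier in its own group while looking ordinary in the pooled sample. The sketch uses made-up scores for two hypothetical groups and flags |z| > 3.29, a commonly used cut-off (p < .001, two-tailed).

```python
from statistics import mean, stdev

def z_scores(values):
    m, s = mean(values), stdev(values)  # stdev = sample SD (n - 1 denominator)
    return [(v - m) / s for v in values]

def flag_outliers(values, cutoff=3.29):
    """Return the values whose |z| exceeds the cut-off."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > cutoff]

# Hypothetical scores for two groups
groups = {
    "A": [9, 10, 11] * 7 + [30],
    "B": [49, 50, 51] * 7,
}

# Ungrouped screening: everyone pooled together
pooled = [v for values in groups.values() for v in values]
print("pooled:", flag_outliers(pooled))

# Grouped screening: each group screened separately
for name, values in groups.items():
    print(name, flag_outliers(values))
```

In the pooled data, 30 sits halfway between the two groups and is not flagged. Within group A, it is far from everyone else and is flagged.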
What are the three main types of missing data?
MCAR (Missing Completely at Random), MAR (Missing At Random), MNAR (Missing Not At Random).
What does MCAR mean?
Data is missing for completely random reasons, unrelated to any variable.
What does MAR mean?
Data is missing because of another observed variable, not because of the value of the missing variable itself.
What does MNAR mean?
Data is missing because of the variable itself (e.g., low-income people skip income question).
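The short simulation below shows why the mechanism matters. It uses made-up incomes and a fixed random seed. First it deletes 20% of responses completely at random. Separately, it makes people under a hypothetical 40,000 cut-off much more likely to skip. The MCAR mean stays close to the true mean, while the MNAR mean is pulled upwards, which is the bias the income example describes.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical "true" incomes for 10,000 people
incomes = [random.gauss(50_000, 15_000) for _ in range(10_000)]

# MCAR: every person has the same 20% chance of skipping the question
mcar_observed = [x for x in incomes if random.random() >= 0.20]

# MNAR: people earning under 40,000 skip the question 80% of the time
mnar_observed = [x for x in incomes if not (x < 40_000 and random.random() < 0.80)]

print(f"true mean: {mean(incomes):,.0f}")
print(f"MCAR mean: {mean(mcar_observed):,.0f}")  # close to the true mean
print(f"MNAR mean: {mean(mnar_observed):,.0f}")  # biased upwards
```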
What should you do if data is MNAR?
Investigate the reason before imputing, as imputing may add bias.
When is Expectation Maximisation (EM) appropriate?
When missing data is small (less than 5%) and random (MCAR or MAR).
When is regression imputation or Multiple Imputation (MI) appropriate?
When missing data is around 5–10% and missing completely at random or at random (MCAR or MAR).
What should you do if more than 40% of your data is missing?
Stop and recollect the data, as it is unreliable.
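The rules of thumb above can be collected into one small helper. This is only a sketch. `None` stands in for a missing value, and the function names are hypothetical. The 10–40% range is deliberately left open because these notes give no rule for it. The MNAR branch follows the advice to investigate before imputing.

```python
def percent_missing(values):
    """Percentage of entries that are None (our stand-in for a missing value)."""
    return 100 * sum(v is None for v in values) / len(values)

def suggested_action(pct, mechanism):
    """Map % missing and mechanism ('MCAR', 'MAR' or 'MNAR') to the rules in these notes."""
    if pct > 40:
        return "stop: recollect the data"
    if mechanism == "MNAR":
        return "investigate why data is missing before imputing"
    if pct < 5:
        return "Expectation Maximisation (EM)"
    if pct <= 10:
        return "regression imputation or Multiple Imputation (MI)"
    return "not covered by these rules"

# Hypothetical item with 2 of 20 responses missing
item = [3, 4, None, 5, 2, 4, 3, None, 5, 4, 3, 2, 4, 5, 3, 4, 2, 3, 4, 5]
pct = percent_missing(item)
print(pct, suggested_action(pct, "MAR"))
```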
What is the difference between univariate and multivariate outliers?
Univariate = an extreme score on one variable; multivariate = an unusual combination of scores across two or more variables, even when no single score is extreme on its own.
How do you detect a multivariate outlier in SPSS?
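Request Mahalanobis distance in a linear regression (Save > Distances > Mahalanobis), then compare each case against the chi-square critical value, with df equal to the number of predictors (commonly at p < .001).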
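SPSS computes this value for you. The sketch below only shows what the number means. It is limited to two variables, so the 2x2 covariance matrix can be inverted by hand, and it uses made-up correlated data plus one added case. That case is only about 2 SD out on each variable, but its combination of scores runs against the correlation. With df = 2, the chi-square critical value at p < .001 has the closed form -2 ln(0.001), about 13.82.

```python
import math
import random
from statistics import mean

def mahalanobis_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the centroid,
    using the sample covariance matrix (n - 1 denominator)."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Entries of the inverse of the covariance matrix [[sxx, sxy], [sxy, syy]]
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    return [
        ixx * (x - mx) ** 2 + 2 * ixy * (x - mx) * (y - my) + iyy * (y - my) ** 2
        for x, y in points
    ]

# Chi-square critical value for df = 2 at p < .001
CRITICAL = -2 * math.log(0.001)

random.seed(2)
points = []
for _ in range(200):
    x = random.gauss(0, 1)
    points.append((x, 0.8 * x + 0.6 * random.gauss(0, 1)))  # correlated at roughly r = .8
points.append((2.0, -2.0))  # unusual combination: high on x, low on y

d2 = mahalanobis_2d(points)
flagged = [i for i, d in enumerate(d2) if d > CRITICAL]
print("critical value:", round(CRITICAL, 2))
print("flagged cases:", flagged)
```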
What is normality in data screening?
Data that follows a roughly bell-shaped curve (normal distribution).
What is homoscedasticity?
Equal spread of residuals across all levels of the independent variable; looks like a cloud with no funnel shape.
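As a rough numeric companion to the scatterplot check, the sketch below fits a one-predictor least-squares line and splits the residuals into the lower and upper halves of the predictor. It then divides the variance of the upper half by the variance of the lower half. This is a simplified split-half idea in the spirit of the Goldfeld–Quandt test, not the exact test and not an SPSS procedure. The data are made up: one set has even spread and the other has a funnel.

```python
from statistics import mean, variance

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def residual_spread_ratio(xs, ys):
    """Residual variance in the upper half of x divided by that in the lower half.
    Near 1 suggests even spread; far above 1 suggests a funnel that widens."""
    slope, intercept = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))
    residuals = [y - (intercept + slope * x) for x, y in pairs]
    half = len(residuals) // 2
    return variance(residuals[-half:]) / variance(residuals[:half])

xs = list(range(1, 21))
even_ys = [2 * x + (1 if x % 2 == 0 else -1) for x in xs]                # constant spread
funnel_ys = [2 * x + (0.5 * x if x % 2 == 0 else -0.5 * x) for x in xs]  # spread grows with x

ratio_even = residual_spread_ratio(xs, even_ys)
ratio_funnel = residual_spread_ratio(xs, funnel_ys)
print(f"even spread: {ratio_even:.2f}, funnel: {ratio_funnel:.2f}")
```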
What is multicollinearity?
When two or more predictors are highly correlated with each other (a common rule of thumb is a correlation above 0.7).
What can you do if two predictors show high multicollinearity?
Remove one variable, or combine them into a new one (e.g., anxiety + stress = mental health).
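Here is a minimal sketch of the check and the "combine them" remedy, using made-up anxiety and stress totals. With only two predictors, each predictor's variance inflation factor (VIF) equals 1 / (1 - r^2), so it can be computed directly from r. Averaging z-scores is one assumed way to build the composite, and summing raw scores is another.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

def zscores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical questionnaire totals for eight participants
anxiety = [12, 15, 9, 20, 17, 8, 14, 18]
stress = [13, 16, 10, 19, 18, 9, 13, 17]

r = pearson_r(anxiety, stress)
vif = 1 / (1 - r ** 2)  # two predictors only: VIF = 1 / (1 - r^2)
print(f"r = {r:.2f}, VIF = {vif:.1f}")

if abs(r) > 0.7:
    # Combine into one composite "mental health" score: mean of the two z-scores
    mental_health = [(a + s) / 2 for a, s in zip(zscores(anxiety), zscores(stress))]
    print([round(v, 2) for v in mental_health])
```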
Normality, linearity, homoscedasticity, independence, no multicollinearity, and random sampling.
What does homoscedasticity look like in a scatterplot?
Evenly spread dots (a cloud), not a funnel shape.
What’s an example of a univariate outlier?
Someone aged 80 in a group where everyone else is between 20–30.
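The same example can be screened with z-scores. This is a minimal sketch that assumes the commonly used cut-off of |z| > 3.29 (p < .001, two-tailed). The ages are made up to match the example.

```python
from statistics import mean, stdev

ages = list(range(20, 30)) * 2 + [80]  # twenty people aged 20-29, plus one aged 80
m, s = mean(ages), stdev(ages)
z = [(a - m) / s for a in ages]

# Flag anyone more than 3.29 standard deviations from the mean
outliers = [a for a, score in zip(ages, z) if abs(score) > 3.29]
print(outliers)  # [80]
```

The 80-year-old also inflates the SD, which is why the other ages end up with small z-scores.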
How can you check for normality in SPSS?
Use histograms, Q–Q plots, or skewness and kurtosis values.
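For the skewness and kurtosis route, the sketch below computes the sample skewness (G1), the excess kurtosis (G2) and their standard errors. These are assumed to be the adjusted formulas SPSS reports under Descriptives, so check the numbers against your own output before relying on them. Dividing each statistic by its standard error gives a z-value showing how far the data depart from normal. The scores are made up.

```python
import math
from statistics import mean

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1); 0 for symmetric data."""
    n, m = len(values), mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)

def excess_kurtosis(values):
    """Sample excess kurtosis (G2); 0 for a normal distribution."""
    n, m = len(values), mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    g2 = m4 / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

def standard_errors(n):
    """Standard errors of skewness and kurtosis for a sample of size n (n > 3)."""
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return se_skew, se_kurt

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]  # hypothetical scores
se_s, se_k = standard_errors(len(scores))
sk, ku = skewness(scores), excess_kurtosis(scores)
print(f"skewness {sk:.2f} (z = {sk / se_s:.2f})")
print(f"kurtosis {ku:.2f} (z = {ku / se_k:.2f})")
```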
What’s the safe percentage of missing data before results become biased?
Under 10% is a common rule of thumb.
What are some ways to fix non-normal data?
Apply a log or square root transformation, or remove extreme outliers.
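A quick way to see what a transformation does is to compare skewness before and after. The sketch below uses made-up right-skewed scores. A log needs values above 0, so add a constant first if there are zeros. A square root needs values of 0 or more. With these data, the log transformation reduces skewness more than the square root does.

```python
import math
from statistics import mean

def skewness(values):
    """Simple skewness g1 = m3 / m2^1.5 (enough to compare before and after)."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical right-skewed scores (e.g., number of doctor visits)
raw = [1, 1, 2, 2, 3, 3, 4, 5, 8, 15, 30]

log_t = [math.log10(x) for x in raw]  # log needs values > 0
sqrt_t = [math.sqrt(x) for x in raw]  # square root needs values >= 0

for name, vals in [("raw", raw), ("sqrt", sqrt_t), ("log", log_t)]:
    print(f"{name:>4}: skewness = {skewness(vals):.2f}")
```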
Why is data screening important?
To avoid misleading conclusions and ensure valid, reliable results.