Data Screening

26 Terms

1

What is the main purpose of data screening?

To ensure the data is accurate, complete, and suitable for statistical testing.

2

What are the main things you check during data screening?

Accuracy, missing data, outliers, normality, linearity, homoscedasticity, and multicollinearity.

3

What is the first step in data screening?

Check the data for accuracy: look for entry errors and impossible values.
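
An accuracy check can be sketched as a simple range test. This is a minimal illustration, not a standard procedure: the variable names and allowed ranges in VALID_RANGES are hypothetical and would come from your own codebook.

```python
# Hypothetical plausible ranges; replace with the ranges from your own codebook.
VALID_RANGES = {"age": (18, 100), "likert_item": (1, 5)}


def find_impossible_values(rows, valid_ranges):
    """Return (row_index, variable, value) for every value outside its allowed range."""
    problems = []
    for i, row in enumerate(rows):
        for var, (low, high) in valid_ranges.items():
            value = row.get(var)
            # None is treated as missing data, not as an entry error, so it is skipped.
            if value is not None and not (low <= value <= high):
                problems.append((i, var, value))
    return problems


rows = [
    {"age": 25, "likert_item": 4},
    {"age": 250, "likert_item": 3},  # possibly a typing error
    {"age": 31, "likert_item": 7},   # outside the 1-5 response scale
]
print(find_impossible_values(rows, VALID_RANGES))
```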

4

What is the difference between grouped and ungrouped data?

Grouped data is split by a categorical variable (e.g., male/female) so that groups can be compared, while ungrouped data treats everyone as a single sample for overall analysis (e.g., regression).

5
6

What are the three main types of missing data?

MCAR (Missing Completely at Random), MAR (Missing At Random), MNAR (Missing Not At Random).

7

What does MCAR mean?

Data is missing for completely random reasons, unrelated to any variable.

8

What does MAR mean?

Missing At Random: whether a value is missing depends on another variable in the dataset, not on the missing value itself.

9

What does MNAR mean?

Data is missing because of the variable itself (e.g., low-income people skip income question).

10

What should you do if data is MNAR?

Investigate the reason before imputing, as imputing may add bias.

11

When is Expectation Maximisation (EM) appropriate?

When missing data is small (less than 5%) and random (MCAR or MAR).

12

When is regression imputation or Multiple Imputation (MI) appropriate?

When missing data is around 5–10% and missing completely at random or at random (MCAR or MAR).

13

What should you do if more than 40% of your data is missing?

Stop and recollect the data, as it is unreliable.
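
The percentage rules of thumb on these cards can be sketched as a small lookup. This is only an illustration of the cut-offs stated in this deck. The function names are hypothetical, None is assumed to mark a missing value, and the 10–40% range returns a placeholder because the cards give no rule for it.

```python
def percent_missing(values):
    """Percentage of entries that are missing (None)."""
    return 100 * sum(v is None for v in values) / len(values)


def suggested_action(pct, mechanism):
    """Map this deck's rules of thumb to a suggestion.

    mechanism is one of "MCAR", "MAR" or "MNAR".
    """
    if pct > 40:
        return "recollect data"
    if mechanism == "MNAR":
        return "investigate reason before imputing"
    if pct < 5:
        return "expectation maximisation"
    if pct <= 10:
        return "regression imputation or multiple imputation"
    return "not covered by these cards"


income = [None] + [30_000] * 24  # 1 missing value out of 25 cases
pct = percent_missing(income)
print(pct, suggested_action(pct, "MCAR"))
```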

14

What is the difference between univariate and multivariate outliers?

Univariate = an extreme value on a single variable. Multivariate = an unusual combination of values across multiple variables.

15

How do you detect a multivariate outlier in SPSS?

Save the Mahalanobis distance for each case from the linear regression procedure.
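
To see what Mahalanobis distance measures, here is a minimal sketch for two variables in plain Python. It is not SPSS output and the data are made up. The last case has unremarkable values on each variable separately, but an unusual combination of them.

```python
from statistics import mean


def mahalanobis_squared_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) case from the sample centroid.

    Uses the sample covariance matrix (n - 1 denominator) of the two variables.
    """
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(covariance) * [dx dy]^T, written out for the 2x2 case
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Made-up scores: the two variables rise together, except for the last case (2, 7).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 2]
ys = [1, 2, 3, 4, 5, 6, 7, 8, 7]
d2 = mahalanobis_squared_2d(xs, ys)
most_unusual = d2.index(max(d2))
print(most_unusual, [round(d, 2) for d in d2])
```

Neither 2 nor 7 is extreme on its own variable, yet the case gets the largest distance, which is why a multivariate outlier can be missed by checking one variable at a time.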

16

What is normality in data screening?

Data that follows a roughly bell-shaped curve (normal distribution).

17

What is homoscedasticity?

Equal spread of residuals across all levels of the independent variable; looks like a cloud with no funnel shape.

18

What is multicollinearity?

When two or more predictors are highly similar, i.e., they correlate with each other above 0.7.
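
The 0.7 rule of thumb can be checked with pairwise Pearson correlations. The sketch below uses made-up scores and hypothetical variable names. Applying the threshold to the absolute value of r is an assumption, so that strong negative correlations are flagged too.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


def flag_collinear_pairs(predictors, threshold=0.7):
    """Return (name_a, name_b, r) for each predictor pair with |r| above the threshold."""
    names = list(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(predictors[a], predictors[b])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 2)))
    return flagged


# Made-up scores for three predictors.
predictors = {
    "anxiety": [2, 4, 5, 7, 9, 10],
    "stress": [1, 5, 4, 8, 9, 11],
    "age": [30, 22, 41, 25, 38, 29],
}
print(flag_collinear_pairs(predictors))
```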

19

What can you do if two predictors show high multicollinearity?

Remove one variable, or combine them into a new one (e.g., anxiety + stress = mental health).

20

Normality, linearity, homoscedasticity, independence, no multicollinearity, and random sampling.

21

What does homoscedasticity look like in a scatterplot?

Evenly spread dots (a cloud), not a funnel shape.

22

What’s an example of a univariate outlier?

Someone aged 80 in a group where everyone else is between 20–30.
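
One common way to spot a case like this is to convert each value to a z-score. The sketch below uses made-up ages. The cut-off of 2.5 is an assumption chosen for this small illustrative sample, not a rule from these cards.

```python
from statistics import mean, stdev


def flag_univariate_outliers(values, cutoff=2.5):
    """Return the values whose absolute z-score exceeds the cutoff.

    The default cutoff is an illustrative assumption.
    """
    m, s = mean(values), stdev(values)
    return [v for v in values if abs((v - m) / s) > cutoff]


# Ten people aged 20-30 and one person aged 80.
ages = [22, 25, 27, 21, 29, 24, 26, 23, 28, 30, 80]
print(flag_univariate_outliers(ages))
```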

23

How can you check for normality in SPSS?

Use histograms, Q–Q plots, or skewness and kurtosis values.
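
Skewness and kurtosis can also be computed by hand to see what the values mean. This sketch uses the simple moment-based formulas, so its numbers may differ slightly from a statistics package that applies small-sample adjustments. Both values are 0 for a perfectly normal distribution.

```python
from statistics import mean


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample adjustment)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment (variance)
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - m) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 20]  # long tail to the right
print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_skewed))
```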

24

What’s the safe percentage of missing data before results become biased?

Under 10%.

25

What are some ways to fix non-normal data?

Apply a log or square root transformation, or remove extreme outliers.
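
The effect of a log transformation can be illustrated by comparing skewness before and after. The reaction times below are made up, and the skewness function is the simple moment-based version. A log transformation only works for values greater than zero.

```python
from math import log10
from statistics import mean


def skewness(values):
    """Moment-based skewness (0 for a symmetric distribution)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up reaction times in milliseconds with a long right tail.
reaction_times = [210, 230, 250, 260, 280, 300, 320, 400, 650, 1200]
logged = [log10(v) for v in reaction_times]  # requires every value to be > 0

print(round(skewness(reaction_times), 2), round(skewness(logged), 2))
```

The transformed scores are still positively skewed here, only less so, so it is worth re-checking normality after transforming rather than assuming the problem is fixed.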

26

Why is data screening important?

To avoid misleading conclusions and ensure valid, reliable results.