Workshop Week 4
Research study: Factors that may influence the use of environmentally friendly products
A study examining socio-cultural factors expected to influence attitudes towards using environmentally friendly products invited adolescents (aged 13 – 16) to complete several questionnaires. The questionnaires measured the following dependent variables:
perceived seriousness of environmental problems
perceived value for money
perceived social pressure to buy environmentally friendly products
Higher scores on these questionnaires represent greater perceived seriousness of environmental problems, greater perceived value for money, and greater perceived social pressure to buy environmentally friendly products. The possible scores range from 10 – 100 for seriousness of environmental problems, from 1 – 5 for value for money, and from 1 – 7 for social pressure.
Fictional data reflecting the design of this study are presented in the sociocultural factors file, which you will find in this week’s folder.
Demographic data
Open the sociocultural factors data file and generate some demographic data to show the gender distribution in this study. See below for a reminder of how to find out frequencies:
Click Analyses from the top
Click Exploration
Then click Descriptives
Move the variable you are interested in into the Variables box
Select the ‘Frequency tables’ box
1. Write the gender distribution below:
55 men
70 women
3 nonbinary people
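For anyone who wants to check the frequency table outside jamovi, the same counts and percentages can be sketched in Python. This is only an illustration: the list below is rebuilt from the three counts written above rather than read from the data file, and the labels `man`, `woman` and `nonbinary` are assumed names, not necessarily the codes used in the file.

```python
from collections import Counter

# Rebuild the gender column from the counts reported above (55, 70, 3).
# In practice this list would come from the Gender column of the data file.
gender = ["man"] * 55 + ["woman"] * 70 + ["nonbinary"] * 3

counts = Counter(gender)
total = sum(counts.values())

# Percentage of the sample in each category.
percentages = {label: 100 * n / total for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: {n} ({percentages[label]:.1f}%)")
print(f"total: {total}")
```

The total of 128 is worth noting, because it should match the number of rows in the Data spreadsheet.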
2. Using the steps below, generate the following for your sample:
mean age (and the standard deviation)
mean 14.7
standard deviation 1.05
the minimum and maximum age:
minimum 13
maximum 16
Check you are still in the Exploration menu
Move the variable you are interested in into the Variables box
Checking your dependent variables for erroneous scores
3. What are the three dependent variables for the sociocultural factors dataset?
social pressure
seriousness of problem
value for money
4. What are the possible ranges of scores for each of these variables? (This information can be found at the beginning of this worksheet.)
1-7 for social pressure
1-5 for value for money
10-100 for seriousness of problem
Use the steps you have just used to generate the minimum and maximum values for the dependent variables, combined with information in the study summary at the start, to examine whether your dependent variables have scores within the expected ranges.
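The range check described above can also be sketched in Python. The expected ranges come from the study summary, but the scores in `demo_scores` are hypothetical values made up for illustration (they are not the workshop data), and `out_of_range` is a helper name invented for this sketch.

```python
# Possible score ranges taken from the study summary.
EXPECTED_RANGES = {
    "seriousness of problem": (10, 100),
    "value for money": (1, 5),
    "social pressure": (1, 7),
}


def out_of_range(scores, lowest, highest):
    """Return (case number, score) pairs falling outside lowest..highest."""
    return [
        (case, score)
        for case, score in enumerate(scores, start=1)
        if not lowest <= score <= highest
    ]


# Hypothetical scores for illustration only, not the workshop data.
demo_scores = {
    "seriousness of problem": [35, 42, 100, 28],
    "value for money": [3, 5, 55, 2],  # 55 could be a typing error for 5
    "social pressure": [4, 7, 1, 6],
}

problems = {
    name: out_of_range(demo_scores[name], *EXPECTED_RANGES[name])
    for name in EXPECTED_RANGES
}

for name, found in problems.items():
    print(name, "->", found if found else "all scores within range")
```

A score outside the possible range is an erroneous score (for example a data entry mistake), which is different from an outlier: a score of 100 on the seriousness measure is unusual but still possible.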
Generating histograms (one way to check whether data is normally distributed)
Follow the steps below to generate a histogram for each dependent variable. This is one way to examine whether the data in this study is normally distributed.
Go to: Analyses – Exploration – Descriptives (you are probably already here after the previous steps)
Remove Gender and Age from the Variables box (these will still be here from your earlier analyses)
Transfer the 3 dependent variables into the Variables box (if not already there from earlier)
Click on Statistics.
Under Distribution, select Skewness and Kurtosis.
Under Normality, select Shapiro-Wilk
Click on the arrow next to Plots to open up the menu
Under Histograms, select Histogram
Under Box Plots, select Box plot
Examine the histograms for each of your dependent variables.
6. Does the data for each variable appear to be normally distributed? Explain your answer.
social pressure - no
seriousness of problem - no, positive skew
value for money - no chance outlier
7. Why might it be a problem to rely only on histograms when deciding whether your data is normally distributed?
Judging a histogram by eye is subjective, and there is no numerical value to confirm the decision.
Calculating skewness and kurtosis (another way to check whether data is normally distributed)
The values of skewness and kurtosis for each variable are added to the Descriptives table (the same table generated earlier, and so at the top of your Results pane). These provide a second way to decide whether the data is normally distributed. Positive values of skewness show that scores cluster at the left end of the distribution, with a long tail to the right, while negative values of skewness indicate that scores cluster at the right end of the distribution, with a long tail to the left. As well as judging whether these values are positive or negative, we need to convert them into z-scores to show whether there is a significant amount of skewness or kurtosis in the data.
8. For each of the 3 variables, convert the values of skewness and kurtosis (from the Descriptives table) into z-scores using the formula below and enter the values into the table provided (Note. SE refers to the Standard Error which is also included in the table labelled as “std. error”).
Z_Skewness = Skewness for the variable ÷ SE_Skewness
Z_Kurtosis = Kurtosis for the variable ÷ SE_Kurtosis
Dependent Variable | Skewness | SE | Z skewness | Kurtosis | SE | Z kurtosis
Seriousness of problem | 1.26 | 0.214 | 5.89 | 5.46 | 0.425 | 12.8
Social pressure | -0.0736 | 0.214 | -0.344 | -0.141 | 0.425 | -0.332
Value for money | 0.0527 | 0.214 | 0.246 | -0.340 | 0.425 | -0.8
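As a cross-check on the table, the z-score calculation can be sketched in Python. The skewness, kurtosis and standard error values are copied from the table. The two `se_` functions use the conventional formulas for the standard errors of sample skewness and kurtosis. It is an assumption here that jamovi uses the same formulas, although with n = 128 (also an assumption, taken from the gender counts) they give the same 0.214 and 0.425.

```python
import math


def se_skewness(n):
    """Conventional standard error of sample skewness for sample size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Conventional standard error of sample kurtosis for sample size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n = 128  # assumed: 55 + 70 + 3 participants, before any case is removed
print(f"SE skewness = {se_skewness(n):.3f}, SE kurtosis = {se_kurtosis(n):.3f}")

# "std. error" values and (skewness, kurtosis) pairs copied from the table.
SE_SKEW, SE_KURT = 0.214, 0.425
table = {
    "seriousness of problem": (1.26, 5.46),
    "social pressure": (-0.0736, -0.141),
    "value for money": (0.0527, -0.340),
}

CRITICAL_Z = 1.96  # two-tailed cut-off for p < .05

results = {}
flagged = []
for name, (skew, kurt) in table.items():
    z_skew = skew / SE_SKEW
    z_kurt = kurt / SE_KURT
    results[name] = (z_skew, z_kurt)
    # Compare the absolute value, so large negative z-scores are caught too.
    if abs(z_skew) > CRITICAL_Z or abs(z_kurt) > CRITICAL_Z:
        flagged.append(name)
    print(f"{name}: z skewness = {z_skew:.3f}, z kurtosis = {z_kurt:.3f}")

print("significant skew or kurtosis:", flagged)
```

Only the seriousness of problem variable should be flagged, in line with the z-scores in the table.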
9. If any of the z-score values resulting from the above calculations are greater than 1.96 or less than -1.96, then it’s likely that the data has a significant (p < .05) amount of skew or kurtosis. Is this the case for any of the variables?
seriousness of problem
Statistical tests to check the normal distribution (a third and final way)
A third way to examine normality is a statistical test called the Shapiro-Wilk test, which compares the distribution of your data to a normal distribution to see whether the two differ.
You have a lot of rows of data in your Descriptives table now. Look at the bottom two rows of this table for your Shapiro-Wilk test of normality (Shapiro-Wilk W) and the significance value (Shapiro-Wilk p) to see the results of this test for each of your dependent variables.
If the test is not significant (i.e. if the significance, or p value, is > .05), then the distribution is not significantly different from a normal distribution, and so the data can be treated as normally distributed.
If the test is significant (i.e. if the significance, or p value, is < .05), then the distribution is significantly different from a normal distribution, and therefore suggests a non-normal distribution.
10. Are any of the variables not normally distributed according to these tests?
seriousness of problem
Ok, so what do we do now? We need to investigate why this data isn’t normal. What's causing the problem?
We can look at the box plot to identify any outliers. Find the box plot for the variable that isn’t normal. Outliers, or extreme scores, can be identified on this type of graph as they are shown with a dot and tell you the case number, or participant, that has the outlying score.
11. Which case number has the most extreme outlying score?
128
We can find this case in the Data spreadsheet and find out their score for this variable.
12. What was the score?
100
If this score is more than 3 standard deviations bigger than the mean score for this variable, then it definitely is an extreme score, and we can justify removing this participant/case from the data set.
So let’s check:
13. In the Descriptives table on the Results pane you should be able to find the mean and standard deviation for this variable. Write these below:
mean = 39.4
standard deviation =11.7
14. Now calculate 3 x the standard deviation and add that to the mean. You can do this below:
3 standard deviations = 35.1
39.4+35.1=74.5
15. Finally, is the outlying score greater than the mean plus 3 x standard deviations?
yes
If so, we can justify removing this case as it is an outlier. We do this by clicking on the case number to highlight the line, right click, and cut. We would always report that we had done this in the Participants section of a report, e.g. the data for one participant was removed due to an extreme score in the Perceived Seriousness of Environmental Problems variable.
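The check in questions 13 to 15 is simple arithmetic, and can be sketched in Python using the mean, standard deviation and score written above. The variable names are chosen for this illustration only.

```python
mean = 39.4   # mean for seriousness of problem, from the Descriptives table
sd = 11.7     # standard deviation for the same variable
score = 100   # score of case 128, the most extreme outlier on the box plot

# Cut-off for an extreme score: the mean plus 3 standard deviations.
cutoff = mean + 3 * sd
is_extreme = score > cutoff

# The same comparison expressed as a z-score for the individual case.
z = (score - mean) / sd

print(f"cut-off = {cutoff:.1f}")
print(f"score {score} is {z:.2f} standard deviations above the mean")
print("extreme score:", is_extreme)
```

The score is more than 5 standard deviations above the mean, well beyond the cut-off of 3.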
Having removed a case, you should now re-check the normality of this variable (the Descriptives table will update automatically after the participant is removed). Although removing the outlier has improved the normality of the data, the Test of Normality for this variable is still significant. This means that any further analyses conducted using this variable will need to apply a non-parametric test (we will cover this further next week).
In research reports these tests of normality are usually reported in the following way:
e.g. The self-esteem scores were found to be normally distributed (W = .96, p > .05)
In the above sentence W is the value of the Shapiro-Wilk test statistic and p is the significance.
16. Report the normality for each of the variables in the standard format below:
The social pressure scores were found to be normally distributed (W = .96, p > .05)
The seriousness of problem scores were found not to be normally distributed (W = .978, p < .05)
The value for money scores were found to be normally distributed (W = .992, p > .05)
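The decision rule and the reporting format can be combined in a short Python sketch. `report_normality` is a hypothetical helper written for this illustration. The W values are taken from the sentences above, but the exact p values passed in are placeholders, because the worksheet records only whether each p is above or below .05.

```python
def strip_leading_zero(value):
    """Format a number below 1 without its leading zero, e.g. 0.96 -> '.96'."""
    return f"{value:.3g}".lstrip("0")


def report_normality(variable, w, p, alpha=0.05):
    """Write up a Shapiro-Wilk result in the worksheet's reporting format."""
    if p < alpha:
        # Significant test: the distribution differs from normal.
        verdict, sign = "found not to be normally distributed", "<"
    else:
        # Non-significant test: no evidence of a departure from normality.
        verdict, sign = "found to be normally distributed", ">"
    return (
        f"The {variable} scores were {verdict} "
        f"(W = {strip_leading_zero(w)}, p {sign} {strip_leading_zero(alpha)})"
    )


# The p values below are placeholders, not results from the data file.
print(report_normality("value for money", 0.992, 0.70))
print(report_normality("seriousness of problem", 0.978, 0.03))
```

Note that a significant result is reported with p < .05, and a non-significant result with p > .05.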
17. Quick recap! What are the 3 methods we can use to examine the normality of data and which one do you think is the most reliable?
eyeballing histograms
converting skewness and kurtosis into z-scores
the Shapiro-Wilk test