Topic 9

Principal Components and Factor Analysis

What is depression and how do we measure it? Could we simply ask someone: are you depressed (yes/no)?

The answer is no – depression is a very multifaceted concept and not everyone has the same ideas about it.

Some of these are more related to one another than others
Some are more important to the explanation of depression that others

Principle components analysis (PCA) and factor analysis (FA) are exploratory analyses that aim to map different items onto a few underlying factors that make up a construct – like depression – to best explain what this construct consists of.

PCA and FA both group items through their connection to a new compositive variable, called a component/factor

e., both are exploratory data techniques

PCA and FA both mathematically form linear composites that represent the underlying structure of the correlation matrix in terms of groups of items of correlated items whose variance is (ideally) explained by the given composite

The composite is what becomes the construct/factor
Cannot test the significance of the factors – PCA/FA are exploratory and descriptive analyses only

PCA:

Used when producing a new scale as it can maximise the variance extracted in the outcome; because all variance is included (unique, shared, and error)
Based on correlation / covariance

FA:

Can also be used when producing new scales, but is also often used to test already existing theories or scales
Also based on correlation/ covariance but excluding any unique variance.

PCA is based on the pattern of correlations among variables (the correlations matrix). For example:

V1-3 = latent factor achievement V4-6 = latent factor competency Can see achievement items are highly correlated with one another and weakly correlated with competency items, and vice versa.

This matrix alone cannot tell us what the underlying constructs are. We need conceptually informed hypotheses to direct our interpretation of this table.

Every item needs to have an inter-item correlation of at least .30 with at least 1 other item, else it should be removed

The provided items need to load onto specific latent factors (e.g., achievement and competence)

These are not directly measured: their presence is indicated by item scores

All items have a relationship with each factor, however, these correlations will likely be much weaker for the factor/s the item doesn’t load onto. For example:

Factors are formed based on the shared variance in the items included in the scale.

In PCA, all of the variance in each item is included in the analysis (unique, shared, error)

This is because PCA aims to maximise the % of variance accounted for by each of the factors extracted

FA uses only the shared and error variance

Communality: the percentage of variance in the individual items that is accounted for in the solution

PCA: will be 1 prior to analysis because 100% of the variance is used
FA: will be less than 1 prior to analysis because unique variance is excluded

Dimension Reduction Techniques

Data reduction techniques try to make sense of the world through grouping like characteristics and separating characteristics that are different into different groupings – called summarising

They ultimately aim to find underlying dimensions or relationships among variables that these characteristics can be grouped by

Used in the development of scales
Can summarise related groups of variables for use in other analyses

Research Questions

Research questions relating to data reduction techniques may include:

Is there an underlying structure linking subsets of items from a scale?
What is the nature of the latent variables underlying this construct?
- Latent variables: variables not being directly measured (like achievement or competency in this example – the factors/constructs)

Conceptual Basis – the Extraction Process

The extraction process is the process of teasing apart the underlying factors that a scale taps into. This is based on eigenvectors, which are formed to extract to shared variance in the solution – this uses a stepwise process.

The initial factor extracted will extract all shared variance between items that it can – maximises variance explained
Anything leftover is the residual
This same process occurs for any other factors but their eigenvalues will be smaller

Eigenvectors: mathematically compute the principal components from the correlations matrix

Eigenvalues: the proportion of variance extracted by each eigenvector

The total initial eigenvalue will equal the total number of items (e.g., six items = eigenvalue of six), but we will want to summarise this into a smaller number

For example:

Can see the initial eigenvalues add up to 6 for the 6 items included in the scale, and this is summarised by SPSS automatically

How to Determine the Number of Factors to Expect in PCA/FA

This is a controversial topic!

1st strategy is using statistics only

All items with an eigenvalue > 1 should be extracted as a factor
Use a scree plot (or parallel analysis – a bit like bootstrapping)

Scree Plot:

Look for the shoulder of the curve: the point in which the distance between the size of the eigenvalues becomes reduced.

Anything before this point can be extracted as a factor

Parallel Analysis:

Cannot be conducted directly in SPSS

Parallel analysis puts the number of items, the sample size, and the technique (PCA or FA). From this, 1000 samples are generated to produce eigenvalues for all components. This provides:

Average eigenvalues for each component
95% percentile eigenvalues for each component – this is preferred over the average eigenvalues

Can compare SPSS’s eigenvalue with the parallel analysis eigenvalue; if ours is larger, the factor is viable and can be extracted

Could extract components 1 and 2 but wouldn’t extract component 3 Looking back to the provided example:

Can see the 2-factor solution produced by SPSS is consistent with the scree plot and parallel analysis

This is neat – it doesn’t always work out like this

2nd strategy is using a self-determined method

Based on theory and concepts in the research area
Researcher determines number of components extracted based on this
The preferred option

The aim of any solution is to maximise the variance accounted for by each component.

Components altogether should account for 25% of variance at least
- Wont be able to account for 100% because we are trying to summarise – by the nature of this, some variance is lost
Each component should be conceptually meaningful with at least 4 factors with factor loadings at least .30
- Better solutions may use a cutoff of .40

For example: the total variance explained table explains initially how variance is accounted for, where each item is considered a component, and reduces this down (in this instance to 3 components)

Initially they added up to account for 100%
The process of extraction/ summarisation reduces this – but this example is still quite good
- Anything over 65% is considered great

This also exhibits how with any rotation, the overall variance explained doesn’t change, but the way it is partitioned between components changes.

Interpreting Factors Using Factor Loading

Factor Loadings: the correlations between each item and the component

Range between +/- 1
Interpreted as a correlation
The larger the factor loading, the more likely that item loads onto that factor

Each item has a factor loading for each component. We want each item to have a high loading on one factor at a time only.

Because this produces a solution with independent solutions
Important because with PCA and FA we are trying to explain the different components of a multidimensional construct

We need at least 100 participants in a PCA or FA. where n =/> 100, factor loadings:

.30-.39 minimum (accounting for 10% variance)
.40-.49: stronger/ more robust (16% variance)
.50-.59: practically significant (> 25% variance)
.70: very strong effect, this item is very important
Its unusual to have loadings over .80

Linear Combinations

Each factor is made up of the sum of all items on the scale. These produce a linear composite that is weighted by the item factor loading to reflect how strong the relationship is between each item and the factor

Ability factor = .8math + .7reading + .7spelling + .1scholastic + .01behavioural + .2social
Competence factor = .01math + .1reading + .02spelling + .7scholastic + .6behavioural + .65social

Can see that ability items have higher loadings onto the ability factor, and competency items have higher loadings on the competency factor

This is like an ideal scenario – similar items have high associations with one factor and low associations with the other factor/s

Factor Component Matrix

Items Components

What we want are:

High factor loadings for factor 1 on the ability items
High factor loadings for factor 2 on the competency items

Can see quite a few of these items have high cross-loadings. Why does this happen?

Mathematically, the initial components matrix will try to maximise the variance accounted for by factor 1 – trying to generate a general component
- So factor 1 has a very high eigenvalue
- It will have relatively high loadings on as many items as possible

Therefore, the initial component matrix is not interpretable in the way we want, because there is no good way to separate the independent components

We can apply a factor rotation to deal with this: mathematically reorganises the way the variance is assigned to the different components by moving the factors themselves.

Component Factor Rotations

There are two types of rotations:

Independent/ orthogonal rotations: used when components are independent of one another
Correlated/ non-independent/ oblique rotation: used when components can be correlated with one another

Factors can be represented in 2D space.

Components initially represented perpendicular to one another (separated at the origin by a 90 degree angle)
Means they are independent

Here, 1 and 2 make up a good solution because factor 1 items load highly onto factor 1, and very weakly onto factor 2, and vice versa for factor 2 items.

Where items don’t load this nicely, we can apply a rotation to separate items more on components 1 and 2.

Unrotated Example:

Orthogonal / Independent Rotations

For this example, we can apply an orthogonal / varimax rotation.

Maintains the independence of components
Items stay in place, it’s the components that move

Example of Orthogonal Rotation:

Overall, the aim is to maximise the factor loadings on the relevant component and the separation between the different groups of items

Comparing unrotated and rotated component matrices:

Can see rotations tend to solve a lot of the major problems we encounter with the unrotated solution.

Can see here the percentage of variance accounted for by each component evens out in the rotated solution.

The eigenvalue decreases slightly for factor 1 (the general factor) and some of this is given to the subsequent factor/s

Here, there are several items that have cross-loadings with 2-3 items. Even with the rotation, this solution isn’t perfect. One way we could deal with this is by removing some items (e.g., the ones that have cross-loadings on 3 items)

How this Influences Variance Extracted

What we want to know for each item is how much variance in them is explained by the solution, i.e., how much they contribute to the solution.

Can see when a rotation is applied, the total variance accounted for doesn’t change, but the way it is divided between factors does.

This example tells us how well the 3-factor solution explains variance in responses for each item.

How well can we predict peoples responses when we use the 3-factor solution

Most of these are pretty good

Below .4: consider whether it is worth keeping
Below .1: remove

Naming and Describing Components/Factors

The naming of factors has a number of rules:

If replicating someone else’s solution, use the same names for factors being replicated
- If changing them: it needs to be clear what they correspond to in past studies
- Changing makes it harder to interpret the outcome
Don’t use the name of a single item: because components should consist of a combination of items
- Find something representative of that group

Interpreting the Outcome

Example:

This is a robust solution because of the overall variance accounted for

Example:

This is not a robust soliton because of the high cross-loadings.

Could remove some items
Could also try an oblique solution to see if this changes things