Often, direct measurement of concepts like intelligence, satisfaction, preferences, political attitude, socio-economic status, happiness, or quality of life is not possible.
Indirect Measurement: Concepts and characteristics that provide information on the concepts of interest are measured instead.
Multidimensional Analysis: Concepts may have several dimensions that cannot be directly measured; each dimension is a latent variable.
Several manifest (indicator) variables are measured to gather information on the latent variables.
Intelligence Example:
Manifest variables include scores in a battery of tests (verbal, mathematical, spatial).
Socio-Economic Status Example:
Manifest variables include measures of income at different levels, occupation, and level of education.
Proxy Variable: Using a single observed variable as a substitute for the latent construct.
Index or Composite Measure: Combining multiple observed variables into a single score.
Latent Variable Model as Factor Analysis: A statistical method to uncover underlying latent variables from a set of observed variables.
Measuring Participation in the Arts:
Participation in the arts is a latent construct.
SPPA (Survey of Public Participation in the Arts) uses nine numerical variables.
Multiple linear regression analysis is applied as a proxy approach.
Examples of variables:
PTC1Q1B: Number of live jazz performances in the last 12 months.
PTC1Q2B: Number of live Latin, Spanish, or salsa music performances in the last 12 months.
PTC1Q3B: Number of live classical music performances in the last 12 months.
PTC1Q4B: Number of live opera performances in the last 12 months.
PTC1Q5B: Number of live musical stage plays in the last 12 months.
PTC1Q6B: Number of live non-musical stage plays in the last 12 months.
PTC1Q7B: Number of live ballet performances in the last 12 months.
PTC1Q8B: Number of live dance (non-ballet) performances in the last 12 months.
PEC1Q10A: Number of visits to art museums or galleries in the last 12 months.
A composite measure of participation can be the sum of the nine numerical variables.
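The composite measure can be sketched in a few lines. The respondent counts below are made up for illustration, and the column order (PTC1Q1B to PTC1Q8B, then PEC1Q10A) is an assumption.

```python
import numpy as np

# Hypothetical counts for 3 respondents on the nine SPPA items
# (columns assumed ordered PTC1Q1B..PTC1Q8B, PEC1Q10A); values are invented.
counts = np.array([
    [1, 0, 2, 0, 1, 0, 0, 0, 3],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [4, 1, 0, 1, 2, 2, 0, 1, 5],
])

# Composite measure: row-wise sum of the nine variables.
participation = counts.sum(axis=1)
print(participation)
```

A plain sum gives every activity the same weight, which is one reason to look at factor analysis as an alternative.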
Interdependency Technique: Multivariate technique involving the joint analysis of several manifest variables to:
Assess associations among them.
Identify patterns of association.
How Factor Analysis Works:
Identification of groups of variables.
High association within the group.
Low association between groups.
Each group of variables represents a pattern of association, corresponding to a latent dimension.
For each group, a new variable is generated to measure the latent dimension.
Factor analysis is typically run on numerical or ordinal variables (treated as numerical).
Association is measured as correlation.
Aims of Factor Analysis:
Data summarization: Stop at the identification of patterns.
Data reduction: Use fewer new variables in further analysis.
Manifest Variables: the nine SPPA variables listed above (PTC1Q1B to PTC1Q8B, and PEC1Q10A).
Aims of measuring participation in art using Factor Analysis:
Identification of patterns of association and respondent profiles.
Data reduction by generating a smaller number of variables.
Stages of Factor Analysis:
I. Data screening
II. Extraction of factors
III. Interpretation and rotation
IV. Generation of new variables
Work on standardized variables, where each variable has a mean equal to 0 and a variance equal to 1.
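A minimal sketch of standardization, using made-up count data (the Poisson draw is only a stand-in for real survey counts):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(lam=2.0, size=(200, 3)).astype(float)  # invented count data

# Standardize each column: subtract its mean, divide by its standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(Z.mean(axis=0).round(6))         # column means, numerically zero
print(Z.var(axis=0, ddof=1).round(6))  # column variances, equal to one
```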
Aims of data screening are to:
Assess association among the variables.
Measure the strength of the association using correlation.
Determine if the variables are highly inter-correlated, using partial correlations.
Measuring Pairwise Correlation
Pearson's correlations.
Partial correlations.
Measuring Inter-correlation
MSA (Measure of Sampling Adequacy): for the i-th variable, it is calculated as
$MSA_i = \frac{\sum_{j \neq i} r_{ij}^2}{\sum_{j \neq i} r_{ij}^2 + \sum_{j \neq i} r_{ij \cdot partial}^2}$
where $r_{ij}$ is the Pearson correlation between variable i and variable j, and $r_{ij \cdot partial}$ is the partial correlation between them.
MSA for all variables: the Kaiser-Meyer-Olkin (KMO) MSA applies the same computation to all pairs of variables at once. It indicates whether the data are adequate for factor analysis.
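The MSA and KMO computations can be sketched as follows. This assumes the partial correlations are taken given all remaining variables and obtained from the inverse of the correlation matrix, which is the usual convention but is not stated in these notes. The function name `msa` and the synthetic data are illustrative.

```python
import numpy as np

def msa(R):
    """Return (per-variable MSA, overall KMO) for a correlation matrix R."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)           # partial correlations given all other variables
    off = ~np.eye(R.shape[0], dtype=bool)   # mask selecting pairs with j != i
    r2 = np.where(off, R**2, 0.0)           # squared Pearson correlations
    q2 = np.where(off, partial**2, 0.0)     # squared partial correlations
    msa_i = r2.sum(axis=1) / (r2.sum(axis=1) + q2.sum(axis=1))
    kmo = r2.sum() / (r2.sum() + q2.sum())
    return msa_i, kmo

# Synthetic data: 4 indicators driven by one common factor plus noise.
rng = np.random.default_rng(1)
f = rng.normal(size=(500, 1))
X = f + 0.8 * rng.normal(size=(500, 4))
R = np.corrcoef(X, rowvar=False)

msa_i, kmo = msa(R)
print(msa_i.round(3), round(kmo, 3))
```

Values close to 1 mean the partial correlations are small relative to the Pearson correlations, that is, the association is shared among the variables rather than confined to pairs.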
Bartlett’s Test:
Null hypothesis: all Pearson’s correlations equal to 0.
Alternative hypothesis: at least one Pearson’s correlation is different from zero.
Test statistic: measure of distance between observed correlation matrix and identity matrix (e.g., 9147.205).
Significance (p-value).
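A sketch of the test statistic, assuming the usual chi-square approximation $-(n - 1 - (2p + 5)/6) \ln |R|$ with $p(p-1)/2$ degrees of freedom; the notes do not give the formula, and the data below are synthetic.

```python
import numpy as np

def bartlett_sphericity(X):
    """Bartlett's test statistic and degrees of freedom for data matrix X (n x p)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)           # log-determinant of R (0 for the identity)
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return stat, df

rng = np.random.default_rng(5)
f = rng.normal(size=(500, 1))
X_corr = f + 0.8 * rng.normal(size=(500, 4))   # indicators sharing one factor
X_indep = rng.normal(size=(500, 4))            # unrelated indicators

stat_corr, df = bartlett_sphericity(X_corr)
stat_indep, _ = bartlett_sphericity(X_indep)
print(round(stat_corr, 1), round(stat_indep, 1), df)
```

The p-value is the upper tail of a chi-square distribution with `df` degrees of freedom (for instance via `scipy.stats.chi2.sf(stat, df)`); a large statistic, as for the correlated data, leads to rejecting the null hypothesis.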
Selection of extraction method
Selection of the number of factors
Production of initial solution
PCA – Principal Components Analysis
PAF – Principal Axis Factoring
The two methods tend to give similar results.
Principal Components Analysis (PCA) is a descriptive technique.
Used for data reduction: p correlated numerical variables are replaced by a smaller number of uncorrelated variables.
Indicator variables are standardized.
Total variance is a proxy of the total amount of information.
Total variance is the sum of the variables' variances and equals the total number of variables.
Given p manifest variables, PCA generates p uncorrelated variables (the components).
The total variance of the components = total variance of manifest variables = p.
Each component is a weighted average (linear combination) of the original variables.
Components are ranked with respect to their variance.
The first component has the largest variance and explains the largest proportion of the total variance.
The second component has the second largest variance.
p components contain the same amount of information as the original variables.
The first components, which account for the largest amount of information, are kept.
The remaining components are discarded without significant loss of information.
The first components allow understanding the underlying structure in the data and identifying patterns in the data.
Computation: linear algebra (eigendecomposition) on the correlation matrix.
Eigenvalue: variance of the component.
Eigenvector: vector of weights of the linear combination.
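The eigenvalue and eigenvector roles can be checked numerically. The sketch below uses synthetic data with two underlying factors (an assumption made only for the example) and NumPy's `eigh` for symmetric matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.normal(size=(300, 2))
X = np.column_stack([
    f[:, 0] + 0.5 * rng.normal(size=300),
    f[:, 0] + 0.5 * rng.normal(size=300),
    f[:, 1] + 0.5 * rng.normal(size=300),
    f[:, 1] + 0.5 * rng.normal(size=300),
])
R = np.corrcoef(X, rowvar=False)

# eigh returns eigenvalues in ascending order; reorder from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Components are linear combinations of the standardized variables.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
components = Z @ eigenvectors

print(eigenvalues.round(3))                        # component variances
print(components.var(axis=0, ddof=1).round(3))     # same values
print(round(eigenvalues.sum(), 6))                 # total variance = p
```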
Select a few components that retain as much as possible of the total amount of information. Criteria:
Kaiser’s criterion: retain components with variance (eigenvalue) greater than 1.
Fix the percentage of information to retain.
Inspection of the scree plot.
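The criteria can be applied to a vector of eigenvalues as sketched below. The eigenvalues and the 75% target are hypothetical and are not taken from the SPPA analysis.

```python
import numpy as np

# Hypothetical eigenvalues for p = 9 standardized variables (they sum to 9).
eigenvalues = np.array([2.9, 1.4, 1.1, 0.9, 0.8, 0.7, 0.5, 0.4, 0.3])

# Kaiser's criterion: keep components with variance greater than 1.
n_kaiser = int((eigenvalues > 1).sum())

# Fixed percentage: smallest number of components reaching the target share.
cum_share = np.cumsum(eigenvalues) / eigenvalues.sum()
target = 0.75
n_target = int(np.argmax(cum_share >= target) + 1)

# Scree inspection: successive drops in variance; look for where they level off.
drops = -np.diff(eigenvalues)

print(n_kaiser, n_target)
print(cum_share.round(3))
print(drops.round(2))
```

The criteria need not agree, so the final choice usually also depends on how interpretable the retained components are.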
First component = $\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$
Component loadings (computed for each component):
* A loading is a rescaled weight of the linear combination: the weight multiplied by the component standard deviation.
* Rescaling gives larger coefficients to the most important components.
* Each loading is the correlation between the indicator and the component.
* Interpretation is based on the loadings: each loading measures the indicator’s contribution to the component.
* Ideally, each component has a few indicators with high loadings (absolute value) and the remaining indicators with very small loadings (absolute value).
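A short numerical check that the rescaled weights equal the indicator-component correlations. The data are synthetic: three indicators sharing one factor and one unrelated indicator.

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.normal(size=(400, 1))
X = np.hstack([f + rng.normal(size=(400, 3)), rng.normal(size=(400, 1))])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)

eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Loading = weight (eigenvector entry) times the component standard deviation.
loadings = eigenvectors * np.sqrt(eigenvalues)

# Direct check: correlation between each indicator and each component.
components = Z @ eigenvectors
corr = np.corrcoef(np.hstack([Z, components]), rowvar=False)[:4, 4:]

print(loadings.round(3))
print(np.allclose(loadings, corr))
```

Rows correspond to indicators and columns to components, so reading down a column shows which indicators drive that component.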
Stage III. Interpretation and rotation of the factors
Rotation is a second analytical step that clarifies the underlying structure.
It is a geometric transformation of the components that produces a more readable component matrix.
Types of Rotation:
Orthogonal Rotation: components remain uncorrelated
Oblique Rotation: components are allowed to be correlated
Varimax Rotation: an orthogonal rotation that concentrates the high loadings on a smaller number of components and leaves low loadings for the rest.
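A minimal sketch of a varimax rotation, using a common SVD-based iteration. It is one standard way to compute the rotation and not necessarily the one used by a given statistical package; the function names and the small loading matrix are illustrative.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonally rotate a loading matrix L (p x k) toward the varimax criterion."""
    p, k = L.shape
    T = np.eye(k)                 # rotation matrix, starting from "no rotation"
    d_old = 0.0
    for _ in range(max_iter):
        B = L @ T
        G = L.T @ (B**3 - B @ np.diag((B**2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                # closest orthogonal matrix to G
        d_new = s.sum()
        if d_new < d_old * (1 + tol):
            break
        d_old = d_new
    return L @ T, T

def varimax_criterion(L):
    """Sum over components of the variance of the squared loadings."""
    return (L**2).var(axis=0).sum()

# Illustrative loadings: a simple structure blurred by an arbitrary rotation.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]])
angle = 0.6
mix = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])
unrotated = simple @ mix

rotated, T = varimax(unrotated)
print(unrotated.round(2))
print(rotated.round(2))
```

Because the rotation is orthogonal, each indicator's sum of squared loadings is unchanged; only the way that sum is split across components changes.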
New variables become weighted averages of the indicators. The weights are such that the indicators with the highest loadings receive the highest weights. The new variables are standardized.
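One way to generate the new variables is the regression method, sketched below with weights $R^{-1} L$ applied to the standardized indicators. The choice of method, the synthetic two-factor data, and the use of unrotated principal-component loadings are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
f = rng.normal(size=(500, 2))
X = np.hstack([f[:, [0]] + 0.6 * rng.normal(size=(500, 3)),
               f[:, [1]] + 0.6 * rng.normal(size=(500, 3))])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)

# Loadings of the two components with the largest variance.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1][:2]
loadings = eigenvectors[:, order] * np.sqrt(eigenvalues[order])

# Regression-method weights: solve R W = L, then apply W to the indicators.
weights = np.linalg.solve(R, loadings)
scores = Z @ weights

print(weights.round(3))
print(scores.mean(axis=0).round(6), scores.var(axis=0, ddof=1).round(6))
```

With principal-component extraction, these scores have mean 0 and variance 1 and are uncorrelated; within each component, the indicators with the highest loadings get the highest weights.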
Factor extraction
Principal Components Method
Selection and interpretation of the components
Component loading: each loading is the correlation between the indicator and the component
* Intuition of the computation
* Factor model with 4 factors: 32% of total variability explained
Factor interpretation