Quantitative Methods in Cultural Industries: Latent Variables and Factor Analysis

Latent Variable Analysis

  • Often, direct measurement of concepts like intelligence, satisfaction, preferences, political attitude, socio-economic status, happiness, or quality of life is not possible.

  • Indirect Measurement: Concepts and characteristics that provide information on the concepts of interest are measured instead.

  • Multidimensional Analysis: Concepts may have several dimensions that cannot be directly measured; each dimension is a latent variable.

  • Several manifest (indicator) variables are measured to gather information on the latent variables.

  • Intelligence Example:

    • Manifest variables include scores in a battery of tests (verbal, mathematical, spatial).

  • Socio-Economic Status Example:

    • Manifest variables include measures of income (at different levels), occupation, and level of education.

Approaches to Latent Variable Analysis

  • Proxy Variable: Using a single observed variable as a substitute for the latent construct.

  • Index or Composite Measure: Combining multiple observed variables into a single score.

  • Latent Variable Model as Factor Analysis: A statistical method to uncover underlying latent variables from a set of observed variables.

  • Measuring Participation in the Arts:

    • Participation in the arts is a latent construct.

    • SPPA (Survey of Public Participation in the Arts) uses nine numerical variables.

    • Multiple linear regression analysis is applied as a proxy approach.

    • Examples of variables:

      • PTC1Q1B: Number of live jazz performances in the last 12 months.

      • PTC1Q2B: Number of live Latin, Spanish, or salsa music performances in the last 12 months.

      • PTC1Q3B: Number of live classical music performances in the last 12 months.

      • PTC1Q4B: Number of live opera performances in the last 12 months.

      • PTC1Q5B: Number of live musical stage plays in the last 12 months.

      • PTC1Q6B: Number of live non-musical stage plays in the last 12 months.

      • PTC1Q7B: Number of live ballet performances in the last 12 months.

      • PTC1Q8B: Number of live dance (non-ballet) performances in the last 12 months.

      • PEC1Q10A: Number of visits to art museums or galleries in the last 12 months.

    • A composite measure of participation can be the sum of the nine numerical variables.
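
As a minimal sketch of the composite approach, the sum score can be computed from one respondent's record. The dictionary layout, the function name `composite_participation`, and the counts are hypothetical; only the nine variable names come from the SPPA list above.

```python
# Names of the nine SPPA participation variables (from the list above).
SPPA_ITEMS = ["PTC1Q1B", "PTC1Q2B", "PTC1Q3B", "PTC1Q4B", "PTC1Q5B",
              "PTC1Q6B", "PTC1Q7B", "PTC1Q8B", "PEC1Q10A"]


def composite_participation(record: dict) -> float:
    """Unweighted composite: the sum of the nine participation counts."""
    return sum(record[item] for item in SPPA_ITEMS)


# Hypothetical respondent (illustrative counts, not survey data).
respondent = {"PTC1Q1B": 2, "PTC1Q2B": 0, "PTC1Q3B": 1, "PTC1Q4B": 0,
              "PTC1Q5B": 3, "PTC1Q6B": 1, "PTC1Q7B": 0, "PTC1Q8B": 0,
              "PEC1Q10A": 4}
print(composite_participation(respondent))  # 11
```

The unweighted sum treats a jazz concert and a museum visit as equivalent units of participation.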

Factor Analysis

  • Interdependency Technique: Multivariate technique involving the joint analysis of several manifest variables to:

    • Assess associations among them.

    • Identify patterns of association.

  • How Factor Analysis Works:

    • Identification of groups of variables.

      • High association within the group.

      • Low association between groups.

    • Each group of variables represents a pattern of association, corresponding to a latent dimension.

    • For each group, a new variable is generated to measure the latent dimension.

  • Factor analysis is typically run on numerical or ordinal variables (treated as numerical).

  • Association is measured as correlation.

  • Aims of Factor Analysis:

    • Data summarization: the analysis stops at the identification of patterns.

    • Data reduction: Use fewer new variables in further analysis.

  • Manifest Variables: the nine SPPA participation variables listed above (PTC1Q1B to PTC1Q8B and PEC1Q10A).

  • Aims of measuring participation in the arts using factor analysis:

    • Identification of patterns of association and respondent profiles.

    • Data reduction by generating a smaller number of variables.

Stages in Factor Analysis

I. Data screening
II. Extraction of factors
III. Interpretation and rotation
IV. Generation of new variables

I. Data Screening

  • Work on standardized variables, where each variable has a mean equal to 0 and a variance equal to 1.

  • Aims of data screening are to:

    • Assess association among the variables.

    • Measure the strength of the association using correlation.

    • Determine whether the variables are highly inter-correlated (partial correlations).

  • Measuring Pairwise Correlation

    • Pearson's correlations.

    • Partial correlations.

  • Measuring Inter-correlation

    • MSA (Measure of Sampling Adequacy): for the i-th variable, it is calculated as:

      MSA_i = \frac{\sum_{j \neq i} r_{ij}^2}{\sum_{j \neq i} r_{ij}^2 + \sum_{j \neq i} r_{ij \cdot \text{partial}}^2}

      • where r_{ij} is the Pearson correlation between variable i and variable j, and r_{ij \cdot \text{partial}} is their partial correlation, controlling for all the other variables.

    • Overall MSA: the Kaiser-Meyer-Olkin (KMO) measure applies the same computation to all variables at once, summing over every pair i ≠ j. It indicates whether the data are adequate for factor analysis; values closer to 1 indicate greater adequacy.

  • Bartlett’s Test:

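    • Null hypothesis: all Pearson correlations are equal to 0 (the correlation matrix is an identity matrix).

    • Alternative hypothesis: at least one Pearson correlation is different from 0.

    • Test statistic: a measure of the distance between the observed correlation matrix and the identity matrix (e.g., 9147.205).

    • Significance (p-value): a small p-value rejects the null hypothesis, meaning the variables are correlated and there are associations for factor analysis to summarize.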
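
The screening quantities above can be sketched in plain Python to show what a statistics package computes behind the scenes. This is an illustration under assumptions: the data are hypothetical, the helper names are our own, partial correlations are obtained from the inverse of the correlation matrix (r_{ij \cdot \text{partial}} = -P_{ij} / \sqrt{P_{ii} P_{jj}} with P = R^{-1}), and Bartlett's statistic uses the usual chi-square approximation -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom. The p-value would be the upper tail of that chi-square distribution (for example `scipy.stats.chi2.sf(chi2, df)` if SciPy is available); it is not computed here.

```python
import math


def standardize(column):
    """Rescale a variable to mean 0 and variance 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / sd for x in column]


def correlation_matrix(columns):
    """Pearson correlations as mean cross-products of standardized variables."""
    z = [standardize(c) for c in columns]
    n, p = len(z[0]), len(z)
    return [[sum(z[i][k] * z[j][k] for k in range(n)) / n for j in range(p)]
            for i in range(p)]


def inverse_and_logdet(a):
    """Gauss-Jordan inverse with partial pivoting, plus ln|det(a)|."""
    p = len(a)
    m = [list(row) + [float(i == j) for j in range(p)] for i, row in enumerate(a)]
    logdet = 0.0
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        logdet += math.log(abs(piv))  # |det| is the product of the pivots
        m[col] = [x / piv for x in m[col]]
        for row in range(p):
            if row != col:
                f = m[row][col]
                m[row] = [xr - f * xc for xr, xc in zip(m[row], m[col])]
    return [row[p:] for row in m], logdet


def screening(columns):
    """Partial correlations, MSA per variable, overall KMO, Bartlett's statistic."""
    n, p = len(columns[0]), len(columns)
    r = correlation_matrix(columns)
    inv, logdet = inverse_and_logdet(r)
    # Partial correlation of i and j, controlling for all other variables.
    partial = [[-inv[i][j] / math.sqrt(inv[i][i] * inv[j][j]) for j in range(p)]
               for i in range(p)]
    pairs = [(i, j) for i in range(p) for j in range(p) if i != j]
    msa = []
    for i in range(p):
        r2 = sum(r[i][j] ** 2 for j in range(p) if j != i)
        q2 = sum(partial[i][j] ** 2 for j in range(p) if j != i)
        msa.append(r2 / (r2 + q2))
    r2_all = sum(r[i][j] ** 2 for i, j in pairs)
    q2_all = sum(partial[i][j] ** 2 for i, j in pairs)
    kmo = r2_all / (r2_all + q2_all)
    # Bartlett: ln|R| is 0 for the identity matrix and negative otherwise.
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return r, partial, msa, kmo, chi2, df


# Hypothetical data: three indicators observed on eight respondents.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [1, 3, 2, 5, 4, 4, 6, 8]
r, partial, msa, kmo, chi2, df = screening([x1, x2, x3])
print([round(v, 3) for v in msa], round(kmo, 3), round(chi2, 2), df)
```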

II. Extraction of Factors

Analytical Stage:
  • Selection of extraction method

  • Selection of the number of factors

  • Production of initial solution

Approaches: the extraction methods
  • PCA – Principal Components Analysis

  • PAF – Principal Axis Factoring

  • The two methods usually give similar results.

PCA (Principal Component Analysis):
  • Descriptive technique.

  • Used for data reduction: it replaces p correlated numerical variables with a smaller number of uncorrelated variables.

Total variance:
  • Indicator variables are standardized.

  • Total variance is a proxy of the total amount of information.

  • Total variance is the sum of the variables' variances and equals the total number of variables.

How PCA works:
  • Given p manifest variables, PCA generates p uncorrelated variables (the components).

  • The total variance of the components = total variance of manifest variables = p.

  • Each component is a weighted average (linear combination) of the original variables.

  • Components are ranked with respect to their variance.

  • The first component has the largest variance and explains the largest proportion of the total variance.

  • The second component has the second largest variance.

Data reduction:
  • p components contain the same amount of information as the original variables.

  • The first components, which account for the largest amount of information, are kept.

  • The remaining components are discarded without significant loss of information.

Data summarization:
  • The first components reveal the underlying structure of the data and help identify patterns of association.

Intuition of Computation
  • Linear algebra computation on correlation matrix.

  • Eigenvalue: variance of the component.

  • Eigenvector: vector of weights of the linear combinations.
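
A minimal sketch of this computation, assuming a small hypothetical correlation matrix and using the cyclic Jacobi eigenvalue method (our choice for a dependency-free example; statistical packages use their own linear-algebra routines). The eigenvalues are the component variances and sum to p; the eigenvectors hold the weights of the linear combinations.

```python
import math


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns eigenvalues (component variances) in decreasing order and the
    matching eigenvectors (component weights), each as a list.
    """
    p = len(a)
    a = [list(row) for row in a]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-18:
                    continue
                # Angle that zeroes a[i][j] in the similarity transform G^T A G.
                theta = 0.5 * math.atan2(2 * a[i][j], a[j][j] - a[i][i])
                c, s = math.cos(theta), math.sin(theta)
                g = [[float(r == q) for q in range(p)] for r in range(p)]
                g[i][i], g[j][j], g[i][j], g[j][i] = c, c, s, -s
                gt = [list(col) for col in zip(*g)]
                a = matmul(gt, matmul(a, g))
                v = matmul(v, g)  # accumulated rotations: columns become eigenvectors
    order = sorted(range(p), key=lambda k: a[k][k], reverse=True)
    values = [a[k][k] for k in order]
    vectors = []
    for k in order:
        vec = [v[r][k] for r in range(p)]
        # The sign of an eigenvector is arbitrary; make the weights sum to >= 0.
        if sum(vec) < 0:
            vec = [-x for x in vec]
        vectors.append(vec)
    return values, vectors


# Hypothetical correlation matrix of three standardized indicators.
R = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.2],
     [0.3, 0.2, 1.0]]
values, vectors = jacobi_eigen(R)
print([round(x, 3) for x in values])                 # component variances, largest first
print(round(sum(values), 6))                          # 3.0 = p, the total variance
print([round(x / sum(values), 3) for x in values])    # share of total variance
```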

Selection of the Number of Components
  • Select a few components that retain as much of the total amount of information as possible.

Methods
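  • Kaiser’s criterion: retain the components with variance (eigenvalue) greater than 1, i.e., components that account for more variance than a single standardized variable.

  • Fix the percentage of total variance (information) to retain, and keep the smallest number of components that reaches it.

  • Inspection of the scree plot: plot the eigenvalues in decreasing order and retain the components before the point where the curve levels off (the “elbow”).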
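
The first two rules are simple to apply once the eigenvalues are known, and a scree plot can be approximated in text. The eigenvalues below are hypothetical (for nine standardized variables, so they sum to 9), the function names are our own, and the 70% target is an illustrative choice, not a recommendation from the source.

```python
def kaiser(eigenvalues):
    """Kaiser's criterion: number of components with variance greater than 1."""
    return sum(1 for ev in eigenvalues if ev > 1)


def by_percentage(eigenvalues, target=0.70):
    """Smallest k whose first k components reach `target` of the total variance."""
    total, cum = sum(eigenvalues), 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        cum += ev
        if cum / total >= target:
            return k
    return len(eigenvalues)


def scree(eigenvalues):
    """Text scree plot: one bar per component, in decreasing order."""
    for k, ev in enumerate(eigenvalues, start=1):
        print(f"{k:>2} {ev:5.2f} " + "#" * round(ev * 10))


# Hypothetical eigenvalues for nine standardized variables (sum = 9).
eigenvalues = [3.1, 1.4, 1.1, 0.8, 0.7, 0.6, 0.5, 0.45, 0.35]
print(kaiser(eigenvalues))               # 3
print(by_percentage(eigenvalues, 0.70))  # 4
scree(eigenvalues)
```

The rules need not agree: here Kaiser's criterion keeps three components, while the 70% target needs four.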

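Component as Linear Combination
  • First component: Y_1 = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p, where X_1, …, X_p are the standardized indicators and \beta_1, …, \beta_p are the weights (the elements of the eigenvector).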

Component Loadings
  • For each component, the component loadings are the rescaled weights of the linear combination.

  • Rescaling gives larger coefficients to the most important components (those with the largest variance).

Component Loading
  • Weight multiplied by the component standard deviation (the square root of the component's eigenvalue).

  • Each loading is the correlation between the indicator and the component.
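
The relationship loading = weight × component standard deviation can be checked numerically in the two-indicator case, where the first component is known in closed form: with correlation r > 0, its eigenvalue is 1 + r and both weights equal 1/\sqrt{2}. The data and helper names below are hypothetical.

```python
import math


def standardize(col):
    """Rescale a variable to mean 0 and variance 1."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
    return [(x - mean) / sd for x in col]


def corr(a, b):
    """Pearson correlation as the mean cross-product of standardized values."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


# Hypothetical data for two indicators (illustrative values only).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]

r = corr(x1, x2)
eigenvalue = 1 + r                          # variance of the first component (r > 0)
weight = 1 / math.sqrt(2)                   # eigenvector element for each indicator
loading = weight * math.sqrt(eigenvalue)    # weight x component standard deviation

z1, z2 = standardize(x1), standardize(x2)
component = [weight * a + weight * b for a, b in zip(z1, z2)]
print(round(loading, 6), round(corr(x1, component), 6))  # the two values agree
```

The squared loadings of a component also sum to its eigenvalue (here 2 × loading² = 1 + r).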

Component Interpretation
  • Interpretation is based on the loadings.

  • Each loading measures the indicator’s contribution to the component.

Readable Solution
  • For each component, a few indicators have high loadings (in absolute value) and the remaining indicators have very small loadings (in absolute value).

III. Rotation of Components

  • Rotation is the second analytical stage; it clarifies the underlying structure.

  • Aim: a readable component matrix, obtained through a geometric transformation (rotation) of the components.

  • Types of Rotation:

    • Orthogonal Rotation: components remain uncorrelated

    • Oblique Rotation: components are allowed to be correlated

  • Varimax Rotation: an orthogonal rotation.

    • Produces high loadings on a small number of components.

    • Produces low loadings on the remaining components.
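
As a sketch of how varimax works, the code below applies raw varimax through successive planar rotations of pairs of components, each by the angle that maximizes the sum over components of the variance of squared loadings (the classic pairwise formulation). Assumptions: the unrotated loadings are hypothetical, the function names are our own, and the Kaiser row normalization that many packages apply by default is omitted. Because the rotation is orthogonal, each indicator's communality (its sum of squared loadings) is unchanged.

```python
import math


def varimax_criterion(loadings):
    """Sum over components of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for col in zip(*loadings):
        sq = [x * x for x in col]
        mean = sum(sq) / p
        total += sum((s - mean) ** 2 for s in sq) / p
    return total


def varimax(loadings, tol=1e-10, max_iter=100):
    """Raw varimax by successive pairwise rotations (no Kaiser normalization)."""
    lam = [list(row) for row in loadings]
    p, k = len(lam), len(lam[0])
    for _ in range(max_iter):
        max_angle = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                u = [lam[i][a] ** 2 - lam[i][b] ** 2 for i in range(p)]
                v = [2 * lam[i][a] * lam[i][b] for i in range(p)]
                A, B = sum(u), sum(v)
                C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                D = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle maximizing the criterion in the (a, b) plane.
                phi = 0.25 * math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p)
                c, s = math.cos(phi), math.sin(phi)
                for i in range(p):
                    x, y = lam[i][a], lam[i][b]
                    lam[i][a] = x * c + y * s
                    lam[i][b] = -x * s + y * c
                max_angle = max(max_angle, abs(phi))
        if max_angle < tol:  # no pair needs rotating any more
            break
    return lam


# Hypothetical unrotated loadings: four indicators on two components.
unrotated = [[0.7, 0.4], [0.7, 0.3], [0.6, -0.5], [0.5, -0.6]]
rotated = varimax(unrotated)
for row in rotated:
    print([round(x, 2) for x in row])
print(round(varimax_criterion(unrotated), 4), round(varimax_criterion(rotated), 4))
```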

IV. Generation of New Variables

  • New variables are computed as weighted averages of the indicators.

  • The weights are chosen so that the indicators with the highest loadings receive the highest weights.

  • The new variables are standardized (mean 0, variance 1).
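
For unrotated principal components, one way to build standardized scores is to weight each standardized indicator by its loading divided by the component's eigenvalue, which gives the highest weights to the indicators with the highest loadings. The sketch below assumes the two-indicator case (eigenvalue 1 + r, equal loadings), hypothetical data, and a hypothetical helper `component_scores`; for rotated or PAF solutions, packages typically estimate scores differently, e.g. with the regression method.

```python
import math


def standardize(col):
    """Rescale a variable to mean 0 and variance 1."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
    return [(x - mean) / sd for x in col]


def component_scores(columns, loadings, eigenvalue):
    """Standardized scores for one unrotated PCA component.

    Weight of indicator j = loading_j / eigenvalue, so higher loadings
    mean higher weights.
    """
    z = [standardize(c) for c in columns]
    weights = [ld / eigenvalue for ld in loadings]
    return [sum(w * zj[i] for w, zj in zip(weights, z)) for i in range(len(z[0]))]


# Hypothetical data for two indicators (illustrative values only).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
z1, z2 = standardize(x1), standardize(x2)
r = sum(a * b for a, b in zip(z1, z2)) / len(z1)

# Two-indicator case with r > 0: first eigenvalue 1 + r, equal loadings.
eigenvalue = 1 + r
loadings = [math.sqrt(eigenvalue / 2)] * 2
scores = component_scores([x1, x2], loadings, eigenvalue)
mean = sum(scores) / len(scores)
var = sum((s - mean) ** 2 for s in scores) / len(scores)
print(round(mean, 6), round(var, 6))  # mean 0 and variance 1, up to rounding
```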

PCA: Component Interpretation and Scores

  • Factor extraction: Principal Components method.

  • Selection and interpretation of the components.

  • Component loading: each loading is the correlation between the indicator and the component.

Principal Axis Factoring

  • Intuition of the computation.

  • Factor model with 4 factors: 32% of the total variability explained.

  • Factor interpretation.
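
The intuition behind PAF is that only the variance the indicators share is factored: the 1s on the diagonal of the correlation matrix are replaced by communality estimates, the reduced matrix is eigen-decomposed, and the communalities are re-estimated from the loadings until they stabilize. The sketch below illustrates that iteration under stated assumptions: a hypothetical one-factor correlation matrix (so the true communalities are known), the largest absolute correlation as a simple starting value for each communality, hypothetical function names, and a Jacobi eigen-solver used only to keep the example free of dependencies. The returned share of explained variability is the kind of figure reported above (32% for the 4-factor model).

```python
import math


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Symmetric eigen-decomposition (cyclic Jacobi); eigenvalues in decreasing order."""
    p = len(a)
    a = [list(row) for row in a]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-18:
                    continue
                theta = 0.5 * math.atan2(2 * a[i][j], a[j][j] - a[i][i])
                c, s = math.cos(theta), math.sin(theta)
                g = [[float(r == q) for q in range(p)] for r in range(p)]
                g[i][i], g[j][j], g[i][j], g[j][i] = c, c, s, -s
                a = matmul([list(col) for col in zip(*g)], matmul(a, g))
                v = matmul(v, g)
    order = sorted(range(p), key=lambda k: a[k][k], reverse=True)
    return [a[k][k] for k in order], [[v[r][k] for r in range(p)] for k in order]


def principal_axis_factoring(r, n_factors, max_iter=500, tol=1e-10):
    """Iterated PAF: factor the correlation matrix with communalities on the diagonal."""
    p = len(r)
    # Starting communalities: largest absolute correlation of each variable
    # (a simple choice; squared multiple correlations are a common alternative).
    h2 = [max(abs(r[i][j]) for j in range(p) if j != i) for i in range(p)]
    for _ in range(max_iter):
        # Reduced correlation matrix: diagonal 1s replaced by communalities.
        reduced = [[h2[i] if i == j else r[i][j] for j in range(p)] for i in range(p)]
        values, vectors = jacobi_eigen(reduced)
        # Loadings = eigenvector * sqrt(eigenvalue) for the retained factors.
        loadings = [[vectors[k][i] * math.sqrt(max(values[k], 0.0))
                     for k in range(n_factors)] for i in range(p)]
        new_h2 = [sum(x * x for x in row) for row in loadings]
        change = max(abs(a - b) for a, b in zip(new_h2, h2))
        h2 = new_h2
        if change < tol:
            break
    explained = sum(h2) / p  # share of the total variability p
    return loadings, h2, explained


# Hypothetical one-factor correlation matrix built from loadings 0.8, 0.7, 0.6, 0.5.
lam = [0.8, 0.7, 0.6, 0.5]
R = [[1.0 if i == j else lam[i] * lam[j] for j in range(4)] for i in range(4)]
loadings, h2, explained = principal_axis_factoring(R, n_factors=1)
print([round(h, 3) for h in h2], round(explained, 3))  # should approach lam**2
```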