A Review of the Content, Criterion-Related, and Construct-Related Validity of Assessment Center Exercises

Overview of the Study

Authors: Brian J. Hoffman, Colby L. Kennedy, Alexander C. LoPilato, Elizabeth L. Monahan, Charles E. Lance
Affiliations: University of Georgia (Hoffman, Kennedy, LoPilato, Monahan); Organizational Research & Development, Lawrenceville, Georgia (Lance)
Purpose: Evaluate content, criterion-related, construct, and incremental validity of five types of Assessment Center (AC) exercises using meta-analysis and qualitative review.
- Types of AC Exercises:
- In-basket exercises
- Leaderless group discussions (LGD)
- Role-play exercises
- Case analysis exercises
- Oral presentation exercises

Meta-Analysis Results

Relationships Analyzed:
- Among the five AC exercise types
- Between exercise types and the Five-Factor Model of Personality (FFM)
- Between exercise types and General Mental Ability (GMA)
- Between exercise types and relevant criterion variables
Findings:
- All five types of exercises were significantly related to criterion variables, with correlation coefficients ranging from $r = 0.16$ to $r = 0.19$.
- Exercises were modestly associated with:
- GMA
- Extraversion
- Openness to Experience (lesser extent)
- Exercises were largely unrelated to:
- Agreeableness
- Conscientiousness
- Emotional Stability

Existing Research Context

Previous Meta-analyses:
- Several past meta-analyses focused on the validity of overall assessment ratings (OARs) and dimension ratings but lacked extensive analysis of AC exercise validity.
- Key studies supporting the criterion-related validity of ACs include:
- Collins et al., 2003
- Gaugler et al., 1987
- Hermelin et al., 2007
- Arthur et al., 2003
- Meriac et al., 2008
Issues Identified in Existing Research:
- Limited focus on validity metrics of individual AC exercises despite their importance in predicting AC outcomes.
Calls for clearer psychometric attention to simulation exercises in ACs have been voiced (Lance, 2008; Lievens, 2008).

Framework and Questions Addressed

Framework Introduced: A taxonomy specifying five characteristics of AC exercises, used to code exercise descriptions.
Three Key Questions Addressed:
1. What is the criterion-related validity of AC exercises?
2. How do exercises relate to other constructs in a meaningful way?
3. What situational characteristics do AC exercises simulate?

Exercise Characteristics

Managerial Simulation Exercises: Overview

ACs are distinctive in observing assessee behavior in moderate- to high-fidelity behavioral simulations, primarily targeting managerial roles.
Six Broad Types of Common Exercises (Thornton, 1992):
- In-basket (IB)
- Cooperative leaderless group discussion (LGD)
- Competitive LGD
- Role-play (RP)
- Oral presentation (OP)
- Case analysis (CA)
Although AC exercises vary widely, past research has found this typology a useful framework for categorizing them.

Characteristics Explained

In-Basket Exercise (IB)

Definition: Simulates paperwork that managers typically face; requires prioritization and resolution of issues.
Context: Designed to mimic complexity inherent in managerial work (Hales, 1986).

Leaderless Group Discussion (LGD)

Definition: Participants resolve problems in unstructured groups, requiring teamwork and decision-making.
Use Cases: Applicable in various settings beyond AC, such as classrooms and team-building workshops.
Note: LGDs can either be cooperative or competitive, but past research lacked sufficient data to distinguish these forms (Bowler & Woehr, 2006).

Role-Play Exercise (RP)

Definition: Participants engage in one-on-one conversations with role players, who often adopt the role of a subordinate or customer.
Purpose: Simulates the challenging interpersonal issues that are central to managerial work.

Case Analysis (CA) and Oral Presentation (OP)

Description of Similarities and Differences: Both involve analyzing organizational problems and presenting solutions, but they differ in response format (written for CA, oral for OP).

Validity Analysis of Assessment Center Exercises

Studies have explored the exercise effect, yet the psychometric properties of exercises remain under-examined (Howard, 2008).
Criterion-related validity of exercise types remains a focus of evaluation; this study highlights a need for comprehensive validity analysis comparing exercises to traditional dimension scores.

Nomological Network and Hypotheses

Constructs considered include GMA and personality traits (e.g., Extraversion, Openness), along with their expected relationships with performance across exercises.
Hypotheses:
- Hypothesis 1: GMA will correlate with all exercise types.
- Hypothesis 2: GMA will relate more strongly to written exercises (IB and CA) than to interpersonally-directed exercises.
- Hypothesis 3: Extraversion and Openness will be more strongly associated with RP, OP, and LGD.
- Hypothesis 4: LGD will demonstrate a stronger correlation with Extraversion than other exercise types.
- Hypothesis 5: Traits indicative of getting along (e.g., Agreeableness) will relate positively to performance in cooperative exercises (RP and LGD).

Taxonomy of Exercise Characteristics

Five Key Characteristics Identified:
- Complexity
- Structure
- Interpersonal Interaction
- Interdependence
- Fidelity
Methodology: The five characteristics were coded across a set of 174 exercises, yielding insights into how frequently each characteristic was present.
Observations in Coding:
- Many studies did not report task characteristics, which limited how effectively the exercises could be coded.

Meta-Analysis Methodology

Literature Search: Searches were conducted across multiple databases to identify studies pertinent to AC exercises and their validity.
Inclusion Criteria: Studies were included if they reported correlations between performance metrics and exercise types.
- 49 published and 9 unpublished studies met inclusion criteria, totaling $N = 13,290$ individuals.

Result Analysis Framework

Coders independently examined studies and agreed on exercise types, overall characteristics, and relevant performance metrics.
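The summary does not say which agreement statistic, if any, was reported for the independent coding. As one common option, the sketch below computes Cohen's kappa for two coders who each assign a single exercise type per study. The function name and the codes are hypothetical and are not data from the review.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning one category per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must rate the same non-empty set of items.")
    n = len(coder_a)
    # Observed proportion of exact agreement.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal category counts.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical exercise-type codes for six studies (illustrative only).
coder_a = ["IB", "LGD", "RP", "CA", "OP", "LGD"]
coder_b = ["IB", "LGD", "RP", "OP", "OP", "LGD"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

Kappa discounts the raw agreement rate (here 5 of 6) by the agreement expected from the coders' category frequencies alone, which matters when a few exercise types dominate the sample.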
Correlation Analysis: Correlations were corrected for measurement error, which would otherwise attenuate the validity estimates.
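The summary does not state which reliabilities were used in the correction. A minimal sketch of the classical correction for attenuation, with hypothetical input values and a hypothetical function name, looks like this:

```python
import math


def correct_for_attenuation(r_observed, rel_predictor=1.0, rel_criterion=1.0):
    """Disattenuate an observed correlation for unreliability in either measure.

    A reliability of 1.0 leaves that side uncorrected.
    """
    for rel in (rel_predictor, rel_criterion):
        if not 0.0 < rel <= 1.0:
            raise ValueError("Reliabilities must lie in the interval (0, 1].")
    return r_observed / math.sqrt(rel_predictor * rel_criterion)


# Hypothetical values: observed r = .20, criterion reliability = .64,
# predictor left uncorrected. These are not figures from the review.
corrected = correct_for_attenuation(0.20, rel_criterion=0.64)
print(round(corrected, 3))
```

Correcting only for criterion unreliability, as in this example, estimates operational validity; correcting both sides estimates the relationship between the underlying constructs.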

Results Interpretation

Criterion-Related Validity

Each exercise type related positively to performance outcomes, with correlations ranging from $r = .16$ to $r = .19$, indicating modest but significant validity.
Findings on Moderators:
- No significant overall differences in criterion-related validity between ACs rated on dimensions and those rated on exercises.

Summary and Future Directions

Despite relatively lower criterion-related validity observed compared to past studies on dimensions, exercises still display substantive validity as a performance measure in ACs.
Conclusions: Both exercises and dimensions play a valid role in AC scoring and interpretation, and exercises are receiving growing attention; future research should continue to refine and evaluate the nuances of this field.

A Review of the Content, Criterion-Related, and Construct-Related Validity of Assessment Center Exercises

Overview of the Study

Authors: Brian J. Hoffman, Colby L. Kennedy, Alexander C. LoPilato, Elizabeth L. Monahan, Charles E. Lance
Affiliations: University of Georgia (Hoffman, Kennedy, LoPilato, Monahan); Organizational Research & Development, Lawrenceville, Georgia (Lance)
Purpose: To conduct a comprehensive evaluation of the content, criterion-related, construct, and incremental validity of five distinct types of Assessment Center (AC) exercises. This meta-analysis and qualitative review explicitly aims to address existing gaps in understanding the psychometric properties of individual AC exercises, which are crucial components in personnel selection and development processes.
- Types of AC Exercises Explored:
- In-basket exercises: Specifically designed to simulate administrative tasks, requiring participants to prioritize and resolve various issues typically faced by managers.
- Leaderless group discussions (LGD): Exercises that foster the observation of interpersonal and problem-solving skills as participants interact within a group setting to achieve a common objective.
- Role-play exercises: Focused on assessing interpersonal and communication abilities through simulated one-on-one interactions with other individuals.
- Case analysis exercises: Designed to evaluate analytical and complex problem-solving abilities, along with written communication skills, by requiring participants to analyze a scenario and propose solutions.
- Oral presentation exercises: Designed to measure communication clarity, persuasion, and overall presentation skills through formal verbal delivery.

Meta-Analysis Results

Relationships Analyzed: The study examined four sets of relationships:
- Among the five AC exercise types themselves, exploring their intercorrelations.
- Between the exercise types and constructs from the Five-Factor Model of Personality (FFM), including Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and Openness to Experience.
- Between the exercise types and General Mental Ability (GMA).
- Between the exercise types and relevant job-performance criterion variables, assessing their predictive power.
Findings: The meta-analysis yielded the following significant results:
- All five types of exercises demonstrated significant positive relationships with job-related criterion variables, with mean correlations ranging from $r = 0.16$ to $r = 0.19$. This indicates that AC exercises possess modest but substantive criterion-related validity.
- Exercises were modestly associated with:
- GMA: Suggesting that cognitive ability plays a discernible role in performance across AC exercises, particularly in tasks requiring analytical and problem-solving capabilities.
- Extraversion: Indicating that outgoing, assertive, and socially dominant individuals may perform somewhat better in interpersonally-oriented exercises due to their comfort and proficiency in social interaction.
- Openness to Experience: To a lesser extent, implying that individuals who are intellectually curious, imaginative, and open to new ideas might adapt more effectively to novel and complex situations presented in the exercises.
- Exercises were largely unrelated to:
- Agreeableness: Suggesting that being inherently cooperative, compassionate, or trusting does not consistently predict higher performance across all AC exercise types, contrary to some expectations.
- Conscientiousness: An unexpected finding, as conscientiousness (e.g., being organized, responsible, diligent) is frequently considered a strong predictor of job performance; its weak relationship with AC exercises warrants further investigation into how ACs measure or fail to measure this trait.
- Emotional Stability: Implying that AC exercises may not effectively capture or differentiate aspects related to stress management, emotional resilience, or calmness under pressure.

Existing Research Context

Previous Meta-analyses: While a number of past meta-analyses have focused extensively on the validity of overall assessment ratings (OARs) and specific dimension ratings within Assessment Centers, they have notably lacked an extensive and detailed analysis of the validity metrics of individual AC exercises. This gap is significant because individual exercises are the fundamental building blocks of ACs, and understanding their distinct psychometric properties is crucial for effective design and interpretation.
- Key studies that have generally supported the criterion-related validity of ACs (often focusing on OARs or dimensions) include:
- Collins et al., 2003
- Gaugler et al., 1987
- Hermelin et al., 2007
- Arthur et al., 2003
- Meriac et al., 2008
Issues Identified in Existing Research: A critical limitation identified in prior literature was the restricted focus on the validity metrics of individual AC exercises. This limitation persists despite their recognized importance in shaping overall AC outcomes and providing more granular, diagnostic information about a candidate's specific strengths and weaknesses. This oversight has hindered a nuanced understanding of which specific exercises best predict various aspects of job performance.
Calls for clearer psychometric attention to simulation exercises in ACs have been forcefully voiced by prominent researchers, notably Lance (2008) and Lievens (2008). These calls emphasize the urgent need for more rigorous measurement, validation, and transparency of AC components to enhance their overall utility, fairness, and defensibility in selection and development contexts.

Framework and Questions Addressed

Framework Introduced: This study developed and introduced a novel taxonomy which specifies five intrinsic characteristics of AC exercises. This structured framework was systematically utilized to code detailed descriptions of exercises across various studies, providing a consistent and robust approach to understand their design, underlying demands, and the specific behaviors they are intended to elicit.
Three Key Questions Addressed: This current study was structured around answering three fundamental questions, aiming to clarify the contributions and psychometric properties of AC exercises:
1. What is the criterion-related validity of various AC exercises? This question specifically seeks to quantify how well individual exercises or types of exercises predict actual job performance criteria.
2. How do AC exercises relate to other established psychological constructs (e.g., personality traits from the FFM, General Mental Ability) in a meaningful way? This explores their construct validity and their place within a broader nomological network of psychological attributes.
3. What situational characteristics and specific job demands do different AC exercises effectively simulate? This question delves into the content validity and ecological realism of the exercises, examining how well they mimic real-world job contexts.

Exercise Characteristics

Managerial Simulation Exercises: Overview

Assessment Centers (ACs) are uniquely characterized by their emphasis on observing assessee behavior within moderate- to high-fidelity behavioral simulations. These simulations are primarily designed to mimic the complex demands, challenges, and interpersonal dynamics inherent in managerial roles, allowing for direct observation of job-relevant skills and competencies rather than relying solely on indirect measures like self-report questionnaires.
Six Broad Types of Common Exercises (as identified by Thornton, 1992): The field of ACs has traditionally recognized several core exercise types, which form the basis of many assessment programs due to their versatility and utility:
- In-basket (IB): A simulation presenting an accumulation of incoming mail, emails, messages, and reports, requiring immediate managerial action, prioritization, and decision-making.
- Cooperative leaderless group discussion (LGD): Participants work collaboratively to solve a shared problem or reach a consensus without an assigned leader, highlighting teamwork, influence, and negotiation skills.
- Competitive LGD: Participants advocate for individual solutions or distinct roles within a group, often in a context of limited resources, emphasizing assertive communication and persuasive abilities.
- Role-play (RP): Participants interact in a structured, often one-on-one, simulated scenario, typically with an actor portraying a subordinate, customer, or client, to assess interpersonal effectiveness.
- Oral presentation (OP): Requires participants to prepare and deliver a persuasive or informative presentation to an audience (e.g., assessors or a simulated board), followed by a Q&A session.
- Case analysis (CA): Involves participants thoroughly analyzing a complex business problem, often presented through detailed documents, and then developing and recommending optimal solutions, typically in a written format.
Although AC exercises vary in their specific content and structure, past research has found this typology a useful framework for categorizing them systematically. It allows their design and psychometric properties to be studied and compared consistently.

Characteristics Explained

In-Basket Exercise (IB)

Definition: The in-basket exercise is a high-fidelity simulation designed to present participants with a series of documents (e.g., emails, memos, reports, calls, letters, voice messages) that an incumbent manager would typically encounter in their daily work. The participant is required to review, prioritize, analyze the information, and formulate decisions and actions (e.g., drafting replies, delegating tasks, scheduling meetings) within a specified and often limited time frame.
Context: This exercise is meticulously designed to mimic the complexity, ambiguity, information overload, and inherent time pressures characteristic of actual managerial work (Hales, 1986). It assesses a range of critical managerial skills such as planning, organizing, delegation, problem-solving, decision-making under pressure, analytical reasoning, and written communication. Effective performance often involves identifying critical issues rapidly, developing clear and executable action plans, and communicating decisions or instructions effectively.

Leaderless Group Discussion (LGD)

Definition: In a Leaderless Group Discussion (LGD), a small group of participants (typically 4-8) is presented with a common problem or task that they must collectively resolve or discuss within a given time limit. Crucially, no leader is pre-assigned to the group, compelling participants to naturally adopt leadership or functional roles, thereby demonstrating their group dynamics, influence tactics, persuasive abilities, and overall interpersonal skills.
Use Cases: LGDs are highly versatile and applicable in numerous settings beyond traditional Assessment Centers. These include academic classrooms for fostering collaborative learning, team-building workshops in organizations, and various organizational training programs aimed at improving group processes, negotiation skills, and conflict resolution. They are effective in assessing skills such as oral communication, persuasion, conflict resolution, active listening, teamwork, initiative, and analytical reasoning within a social context.
Note: LGDs can typically be categorized as either cooperative (where the group strives for consensus on a shared goal) or competitive (where participants represent differing viewpoints or interests and advocate for their own positions). Previous meta-analyses, however, often lacked sufficient data to distinguish the psychometric properties and validity of these two forms (Bowler & Woehr, 2006). The study therefore aggregates data across both forms where the reviewed literature did not consistently report the distinction.

Role-Play Exercise (RP)

Definition: A role-play exercise involves participants engaging in a direct, often one-on-one, simulated conversation with another individual, typically a trained professional role player. The role player usually adopts a specific persona, such as a subordinate, customer, client, or peer, who presents a challenging or ambiguous interpersonal situation that the participant must address.
Purpose: Role-plays are specifically designed to simulate and assess how participants handle complex and challenging interpersonal issues that are central to a wide array of managerial tasks and professional interactions. They are effective measures of skills such as active listening, empathy, negotiation, conflict management, coaching, providing constructive feedback, building rapport, and interpersonal influence.

Case Analysis (CA) and Oral Presentation (OP)

Description of Similarities and Differences: Both Case Analysis (CA) and Oral Presentation (OP) exercises are highly effective tools for evaluating higher-order cognitive abilities, analytical prowess, and communication skills. They commonly involve participants being presented with a complex organizational problem, often detailed as an extensive business case, which they must then thoroughly analyze to develop and propose practical and well-justified solutions.
- Similarities: Both exercise types fundamentally require strong analytical reasoning skills, effective problem identification, robust solution generation, and strategic thinking to address the presented challenges.
- Differences: The primary distinction between CA and OP exercises lies in their required response formats:
- Case Analysis (CA): Typically requires participants to produce a detailed written report, a structured memo, or a comprehensive proposal outlining their in-depth analysis of the case, recommended solutions, and a clear justification for those recommendations. This format places a strong emphasis on written communication clarity, logical structure, persuasive argumentation, and the ability to synthesize complex information effectively in writing.
- Oral Presentation (OP): Requires participants to verbally present their analysis, findings, and recommendations to an audience (e.g., assessors acting as a simulated board of directors or management team). This presentation is often followed by a question-and-answer session, where participants must defend their proposals. This format specifically highlights skills in public speaking, verbal persuasion, composure under spotlight, visual aid integration, and the ability to think critically and respond adeptly on one's feet.

Validity Analysis of Assessment Center Exercises

Studies have consistently explored the 'exercise effect' (the finding that AC ratings tend to cluster by exercise rather than by dimension), yet the psychometric properties and construct validity of individual AC exercises remain comparatively under-examined in the literature (Howard, 2008). This gap limits a full understanding of AC efficacy.
The criterion-related validity of distinctive exercise types continues to be a central focus of evaluation in AC research. This particular study highlights a persistent need for more comprehensive validity analysis, explicitly comparing the predictive utility of individual exercises to that of more traditional dimension scores or overall assessment ratings derived from ACs.

Nomological Network and Hypotheses

Constructs Considered: The study actively considered a detailed nomological network, integrating several key psychological constructs. These included General Mental Ability (GMA), and specific personality traits from the Five-Factor Model (FFM) such as Extraversion, Agreeableness, Conscientiousness, and Openness to Experience. The research aimed to establish their expected theoretical relationships with performance across the various AC exercises.
Hypotheses: The study formulated five specific hypotheses regarding the associations between these constructs and AC exercise performance:
- Hypothesis 1: General Mental Ability (GMA) will correlate positively and significantly with performance across all types of AC exercises, reflecting the cognitive demands inherent in most assessment tasks.
- Hypothesis 2: GMA will demonstrate a significantly stronger relationship (correlation) with performance in written exercises, specifically In-basket (IB) and Case Analysis (CA), compared to interpersonally-directed exercises such as Role-Play (RP), Oral Presentation (OP), and Leaderless Group Discussions (LGD). This is due to the higher analytical and problem-solving demands of written tasks.
- Hypothesis 3: Personality traits of Extraversion and Openness to Experience will be more strongly associated with performance in interpersonally-oriented and novel exercises, particularly Role-Play (RP), Oral Presentation (OP), and Leaderless Group Discussions (LGD). Extraversion facilitates social interaction, and Openness assists with adapting to new situations.
- Hypothesis 4: Leaderless Group Discussion (LGD) exercises will demonstrate a stronger correlation with Extraversion than all other exercise types, given the social dominance and verbal participation required in group settings.
- Hypothesis 5: Personality traits indicative of 'getting along' with others (specifically Agreeableness, and potentially Conscientiousness related to cooperation) will align positively with task performance in cooperative exercises, such as Role-Play (RP) and cooperative forms of Leaderless Group Discussions (LGD). This suggests that interpersonal harmony could facilitate success in these contexts.

Taxonomy of Exercise Characteristics

Five Key Characteristics Identified: A cornerstone of this research was the development and application of a taxonomy identifying five crucial characteristics that define and differentiate AC exercises. These characteristics provide a structured way to understand the underlying demands and design principles of each exercise:
- Complexity: The degree of intricacy, number of variables, and ambiguity present in the task.
- Structure: The extent to which the task environment and rules are clearly defined and predictable.
- Interpersonal Interaction: The requirement for participants to engage in direct communication and social exchange with others.
- Interdependence: The degree to which a participant's performance relies on or influences the actions of others.
- Fidelity: The extent to which the exercise realistically simulates the actual job tasks and environment.
Methodology: These five characteristics were systematically coded across a set of 174 AC exercises described in the reviewed literature. This coding process yielded insights into the frequency and nature of each characteristic (e.g., structure and interpersonal interaction) within various exercise designs.
Observations in Coding: A notable challenge during the coding process was the lack of consistent and detailed reporting on these task characteristics across many of the included studies. This limited the ability to code effectively and comprehensively, highlighting a gap in research reporting standards that hinders cumulative science.
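One way to picture the coding scheme and the missing-information problem is a record with one field per characteristic, where an unreported characteristic is left empty. This is only a sketch: the 1-to-3 rating scale, the class and function names, and the example records are assumptions for illustration, not the coding protocol used in the study.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExerciseCoding:
    """One coded exercise; None marks a characteristic the study did not report."""
    exercise_type: str
    complexity: Optional[int] = None
    structure: Optional[int] = None
    interpersonal_interaction: Optional[int] = None
    interdependence: Optional[int] = None
    fidelity: Optional[int] = None


# The five taxonomy characteristics, taken from the dataclass fields.
CHARACTERISTICS = [f.name for f in fields(ExerciseCoding) if f.name != "exercise_type"]


def reporting_rates(codings):
    """Share of exercises for which each characteristic could be coded."""
    if not codings:
        raise ValueError("At least one coded exercise is required.")
    return {
        name: sum(getattr(c, name) is not None for c in codings) / len(codings)
        for name in CHARACTERISTICS
    }


# Hypothetical records on an assumed 1-3 scale (not data from the review).
codings = [
    ExerciseCoding("IB", complexity=3, structure=2, fidelity=3),
    ExerciseCoding("LGD", interpersonal_interaction=3, interdependence=3),
    ExerciseCoding("RP", interpersonal_interaction=3, fidelity=2),
    ExerciseCoding("CA"),
]
rates = reporting_rates(codings)
print(rates)
```

Keeping "not reported" distinct from a low rating makes the reporting gap itself measurable, which is the problem the coding observations describe.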

Meta-Analysis Methodology

Literature Search: A rigorous and extensive literature search was conducted across a variety of academic databases and professional archives. This comprehensive survey aimed to identify all studies pertinent to Assessment Center exercises and their various validity metrics.
Inclusion Criteria: Studies were included in the meta-analysis if they met specific criteria, primarily the reporting of relevant correlation coefficients between performance metrics (e.g., exercise scores) and defined exercise types. This ensured that only quantitative data suitable for meta-analytic aggregation were used.
- A total of 49 published studies and 9 unpublished studies met the inclusion criteria, for a combined sample of $N = 13,290$ individuals.

Result Analysis Framework

Coders independently examined the selected studies, meticulously extracting and agreeing upon the specific exercise types, their overall characteristics, and the relevant performance metrics utilized. This independent coding ensured reliability and accuracy of data extraction.
Correlation Analysis: All correlation coefficients extracted from the studies underwent critical adjustments. These adjustments were primarily aimed at correcting for statistical artifacts such as measurement error, which is vitally important for ensuring the accuracy and robustness of the validity assessments. Meta-analytic techniques were then applied to pool these corrected correlations.
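The summary does not specify the pooling procedure. Assuming sample-size weighting, as in bare-bones psychometric meta-analysis, the pooling step can be sketched as follows. The function name and the study values are hypothetical.

```python
def pooled_correlation(studies):
    """Sample-size-weighted mean and variance of correlations.

    studies: iterable of (n, r) pairs, one per independent sample.
    """
    studies = list(studies)
    total_n = sum(n for n, _ in studies)
    if total_n <= 0:
        raise ValueError("Total sample size must be positive.")
    # Larger samples contribute proportionally more to the mean.
    mean_r = sum(n * r for n, r in studies) / total_n
    # Weighted variance of the study correlations around that mean.
    var_r = sum(n * (r - mean_r) ** 2 for n, r in studies) / total_n
    return mean_r, var_r


# Hypothetical (n, r) pairs; these are not studies from the review.
studies = [(200, 0.10), (300, 0.30), (500, 0.20)]
mean_r, var_r = pooled_correlation(studies)
print(round(mean_r, 4), round(var_r, 4))
```

The weighted variance is the starting point for judging whether study results differ by more than sampling error would predict, which is what motivates moderator analyses.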

Results Interpretation

Criterion-Related Validity

Each exercise type related positively and significantly to relevant performance outcomes, with corrected mean correlations ranging from $r = 0.16$ to $r = 0.19$. Individual AC exercises therefore show modest but substantive criterion-related validity as predictors of job performance.
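As a rough gauge of magnitude (a standard interpretation of a correlation, not a figure reported in this summary), squaring the endpoints of the reported range gives the proportion of criterion variance associated with exercise performance:

$$
r^2 = 0.16^2 \approx 0.026 \quad \text{to} \quad r^2 = 0.19^2 \approx 0.036
$$

That is, roughly 2.6 to 3.6 percent of the variance in the criterion.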
Findings on Moderators: The analysis also investigated potential moderating effects. Notably, there were no significant overall differences observed in the strength of performance prediction between studies where participants were rated on specific dimensions (e.g., leadership, communication) versus those where ratings were based directly on overall exercise performance. This suggests that the format of rating (dimension-based vs. exercise-based) may not substantially alter the overall predictive validity of the AC exercises themselves.

Summary and Future Directions

Despite the relatively lower criterion-related validity observed for individual AC exercises compared to some past meta-analyses focusing on broader AC dimensions or overall assessment ratings, the exercises still consistently display substantive and significant validity as distinct performance measures within Assessment Centers.
Conclusions: The findings indicate that both exercises and dimensions play valid roles in AC scoring and interpretation, and that exercises are receiving growing attention. The study suggests that future research should continue to refine and evaluate the nuances within this field, including how exercise design characteristics, assessor training, and different rating methodologies affect the psychometric properties and predictive power of AC components.