
A Framework for Test Measurement Selection in Athlete Physical Preparation

Introduction to Athlete Physical Preparation and Test Selection

  • Core Challenge: Preparing athletes for competition necessitates the accurate diagnosis and consistent monitoring of essential physical qualities such as strength, power, speed, and endurance characteristics.

  • Complication in Test Selection: The process of selecting appropriate tests to measure these physical attributes is fundamental to effective training but is significantly complicated by the vast array of tests and measurement tools currently available.

  • Article's Purpose: This article introduces an evidence-based method to streamline test measurement selection for athlete physical preparation.

    • It outlines a strategy to integrate various layers of validity, thereby linking specific test measurements directly to competition outcomes.

    • It then presents a comprehensive framework designed to assess the suitability of test measurements, building upon contemporary validity theory and factoring in technical, decision-making, and organizational considerations.

    • Practical examples are provided to illustrate the framework's utility across diverse athletic settings.

  • Goal of the Framework: The proposed system aims to distill the extensive range of available measurements, identifying those most likely to meaningfully enhance competition performance.

  • Need for a Systematic Framework: Due to the high volume of tests and metrics, practitioners (including strength and conditioning coaches, sport scientists, performance analysts, and applied researchers) require a systematic decision-making framework to select context-specific physical testing measures that effectively guide athlete preparation.

  • Foundation in Validity Theory: The conceptualization of validity as a network of inferences (3) and insights from measurement, organizational, and decision-making perspectives (4) offer crucial guidelines for navigating measurement selection. However, a dedicated solution within athlete assessment models has been lacking, which this article addresses by developing a framework based on a contemporary understanding of validity.

Validity in Sport Performance

  • Central Role of Validity: Validity is paramount in test measurement selection, as it determines the extent to which a test or measure successfully achieves its intended goals.

  • Evolution of Validity Theory:

    • Traditional View: Historically, validity was categorized into three primary forms: content, criterion, and construct validity (5).

    • Modern Unified Theory: Over the last 50 years, definitions and types of validity have proliferated (6). The modern approach has converged on a unified theory, which deemphasizes distinguishing numerous validity types in favor of focusing on the original forms and accumulating evidence for or against a validity proposition within a specific situational context (4).

  • Context-Specific Application in Sports: In sports performance, the choice of validity forms used to ascertain a test measurement's relevance depends heavily on the specific sport and performance context.

    • Measured Sports (e.g., timed, weight, distance): For sports where deterministic models can be generated (7), criterion validity is often directly and easily applicable to establish a link to performance.

      • Example: Measuring barbell velocity during a power clean at a fixed load can accurately predict weightlifting performance in competition.

    • Non-deterministic or Complex Sports (e.g., team, field, court, combat sports): Most sports fall into this category, where competition outcomes are the result of many interacting factors (8). In these cases, directly assessing the criterion validity of a single physical performance metric is not straightforward.

      • Content Validity: Can relate to how well a battery of measurements quantifies the physiological characteristics critical for sport performance (9). A crucial step is determining how physical qualities contribute to outcomes and identifying the most relevant qualities in context.

      • Construct Validity: Becomes the key analytical lens. Its objective is to establish the relevance of a physical quality to sport performance.

      • Representation for Complexity: For complex scenarios, construct validity of physical performance metrics can be conceived as a sequence of inferences that connect a test score to the desired performance outcome.

Layers of Validity for Complex Relationships

  • Addressing Complexity: The concept of a network of inferences (3), or "layers of validity," provides a mechanism to navigate the intricate process of establishing measurement validity within physical preparation (Figure 1).

  • Indirect Relationships: This approach acknowledges that it may be impractical to establish a direct or causal link between a physical fitness measurement and successful competition performance outcomes (constructs), primarily due to the inherent complexity or abstract nature of sport competition.

  • Intermediate Steps/Proxies: Instead, the relationship between physical performance measurements and competition outcomes can be evaluated through intermediate steps or proxies that eventually connect to the ultimate competition performance outcome.

    • Rugby League Example: It might be difficult to determine if a higher 1-RM back squat score directly causes more wins in professional rugby league. However, it can be associated with within-game measures of tackling ability (10), which are, in turn, distinguishing characteristics of higher- versus lower-tier match-play (11).

    • Professional vs. Semi-professional Distinction: Furthermore, absolute 1-RM back squat performance has been shown to differentiate professional from semi-professional rugby league players more effectively than alternative metrics (12) (Figure 2).

  • Bridging the Gap: In such instances, multiple layers of evidence are employed to bridge the gap between specific physical performance measurements and the broader construct of successful competition performance outcomes.

  • Process of Establishing Layers: This involves breaking down the relationship between measurements and competition outcomes into a series of smaller, more manageable relationships or steps, each requiring its own validation evidence.

    • By validating these intermediate steps, the overall validity of the measure concerning the intended construct can be indirectly supported.

    • Increased Complexity: As the number of intermediate steps between the metric and the competition outcome increases, so does the complexity of the validity argument (13).

    • Evidence Accumulation: This multi-layered process facilitates the accumulation of evidence supporting the relationships between a measure and the sport performance outcome (14).

  • Guiding Analytical Approach: While a multi-layered validity model may vary by context, its core objective is to offer a schematic representation of measurement-performance interaction that guides the analytical strategy for constructing a validity argument. Therefore, a complementary framework is essential to assess the strength of connection between these layers.
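The schematic nature of such a model can be made concrete with a small sketch. The following is an illustrative encoding (not from the article) of the rugby league example as a chain of inference links, where each link carries its own validation evidence:

```python
from dataclasses import dataclass

@dataclass
class ValidityLink:
    """One inference step in a multi-layered validity model."""
    source: str
    target: str
    evidence: str  # e.g., a citation or a Low/Moderate/Strong appraisal

# The rugby league example from the text, encoded as a chain of inferences
# connecting a test score to the competition-performance construct.
chain = [
    ValidityLink("1-RM back squat", "within-game tackling ability",
                 evidence="association (ref 10)"),
    ValidityLink("within-game tackling ability", "higher- vs lower-tier match-play",
                 evidence="discriminates tiers (ref 11)"),
]

# The overall argument is only as strong as its weakest link, and each
# additional layer increases the complexity of the validity argument.
for link in chain:
    print(f"{link.source} -> {link.target} [{link.evidence}]")
```

Each link would require its own validation evidence, consistent with the idea that validating the intermediate steps indirectly supports the overall validity of the measure.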

Physical Assessment Framework

  • Framework Objective: The primary goal of this framework is to evaluate the capacity of a specific test measurement to effectively link the various steps within the multi-layered validity model.

  • Informed by Contemporary Validity Theory: The elements incorporated into the framework are derived from contemporary validity theory (4).

  • Three Primary Criteria: The framework considers three main criteria:

    • (i) Measurement Criteria (Table 1): Assesses how a metric is collected, the quality and type of evidence supporting its links to performance outcomes, and the uniqueness of the information it provides (5).

    • (ii) Decision-making Criteria (Table 2): Examines the degree to which actionable decisions can be derived from the measurement (15).

    • (iii) Organizational Criteria (Table 3): Accounts for the practical feasibility of implementing the test and measurement within a given sport organization (16).

  • Secondary Criteria: In addition to the primary objectives, secondary criteria are included to anticipate future positive or negative side-effects or consequences of using the measure that extend beyond its immediate primary objectives (4).

  • Influencing Models: The framework integrates insights from previous models developed for guiding the implementation of sports technology (17, 18) and jump assessments (19) across different contexts.

  • Systematic Decision Support: Collectively, the proposed framework offers a systematic decision support process for sports practitioners, integrating both the technical requirements and organizational considerations necessary for effective measurement selection in athlete physical preparation.

Measurement Criteria

Measurement Collection

  • Validity and Reliability: A metric must be evaluated within the context of its collection method; thus, the validity and reliability of this method are crucial (20, 21).

    • Example: Linear position transducers are recognized as a more valid and reliable tool for measuring barbell velocity during resistance training compared to accelerometer-based devices (22).

  • Utility for Decision Making: Metrics lacking evidence of reliability have limited value for decision-making within a validity model. This is because observed variation might originate from external sources, such as measurement error, inconsistencies in the person conducting the test, or fluctuations in the testing environment (23).

  • Contextualizing Changes: Knowledge of a test's measurement error is vital for properly contextualizing the magnitude of any observed changes and to prevent over-interpreting individual changes that are small relative to this error.

  • Informed Interventions: Without strong validity, it becomes challenging to understand the precise mechanisms linking different layers in the validity model, which impedes the design of targeted and effective interventions.

    • Example: An invalid test of change of direction ability might still correlate with performance outcomes if its variance primarily reflects another physical property like speed (24). However, this could lead to misdirected training interventions targeting the wrong physical qualities.

  • Technological Advancements: This aspect has become particularly critical given rapid technological advancements and the saturation of the market with various measurement devices in sport (17).

  • Scale for Evaluation: Low/Moderate/Strong.
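The role of measurement error in contextualizing observed changes lends itself to a short worked calculation. The sketch below uses hypothetical countermovement jump data and two common sport-science conventions, typical error computed as SD(test-retest differences)/sqrt(2) and the smallest worthwhile change computed as 0.2 x between-athlete SD; neither the data nor these conventions come from the article itself:

```python
from statistics import stdev
from math import sqrt

# Hypothetical test-retest countermovement jump heights (cm) for eight athletes.
trial_1 = [38.2, 41.5, 35.9, 44.0, 39.7, 42.3, 37.1, 40.8]
trial_2 = [37.6, 42.1, 36.4, 43.2, 40.3, 41.9, 37.8, 40.1]

# Typical error: SD of the trial-to-trial differences divided by sqrt(2).
diffs = [b - a for a, b in zip(trial_1, trial_2)]
typical_error = stdev(diffs) / sqrt(2)

# Smallest worthwhile change: 0.2 x between-athlete SD (Cohen's d convention).
swc = 0.2 * stdev(trial_1)

# An observed change smaller than the typical error cannot be confidently
# distinguished from measurement noise, testers, or environment effects.
print(f"Typical error: {typical_error:.2f} cm")
print(f"Smallest worthwhile change: {swc:.2f} cm")
```

Comparing an individual athlete's change against both values helps prevent over-interpreting changes that are small relative to the test's error.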

Evidence of Association

  • Multi-faceted Assessment: Evaluating the strength of evidence supporting a link between a test metric and a performance outcome is complex and often challenging (25).

  • Ideal Support: Ideally, a connection should be underpinned by both a plausible mechanistic explanation and supporting experimental or generalizable observational data (26).

    • Mechanistic Explanations: Explanations based on domain knowledge of physiology and sport performance are essential for establishing causal links and for interpreting published data effectively (26, 27). Considerations of causality are fundamental for designing and implementing training interventions.

    • Example: A study on recreational and untrained cyclists reported a strong correlation (r = 0.965) between maximum power in an incremental test (cycling to exhaustion, starting at 100 W and increasing by 20 W·min⁻¹) and functional threshold power (the maximum average power maintainable over a 1 h period) (28). Both values significantly increased after a training intervention, assuring practitioners that incremental test improvements would translate to performance gains in sub-elite cyclists.

    • Generalizability Limitations: It is important to note that the generalizability of such findings to elite cohorts is not guaranteed, as the trainability of underlying physical qualities may differ. This underscores the importance of appraising evidence in relation to the specific cohort under consideration.

  • Empirical Evidence Hierarchy: The body of empirical evidence should be appraised within the traditional sources of evidence hierarchy (29), emphasizing the magnitude of effects interpreted within the context of the desired outcome (30).

  • Study Population Characteristics: Preference should be given to metrics for which associations have been demonstrated within the same sport specialization and level of proficiency as the target athlete group.

  • Scale for Evaluation: Low/Moderate/Strong.

Nature of Association

  • Informing Model Construction: A deeper understanding of how a metric relates to a performance outcome, beyond just the existence of an association, is crucial for constructing an effective multi-layered validity model.

  • Complex Relationships: Evidence should be examined for nonlinear relationships, as well as interacting (31), mediating (32, 33), or moderating (34) effects.

  • Analytical Techniques: While techniques like machine learning algorithms can uncover non-linearity and complexity in metrics, their use does not diminish the need for plausible mechanistic and causal explanations (35).

  • Examples of Complex Phenomena: Understanding these complex phenomena can greatly assist strength and conditioning coaches:

    • Tipping Points: e.g., the threshold between functional and non-functional overreaching.

    • Feedback Loops: e.g., how an increase in maximal strength can enhance adaptations to power training.

    • Emergence: e.g., a live attacker vs. defender test discriminating between higher- and lower-level players, while a preplanned change-of-direction test cannot (36).

  • Guidance for Practitioners: Awareness of these phenomena (further reading in (36)) coupled with appropriate analytical tools is beneficial for selecting valid physical performance metrics within team sport environments.

  • Scale for Evaluation: Low/Moderate/Strong.

Independence of Information

  • Dimensionality Reduction: In scenarios where multiple metrics demonstrate comparable measurement validity, reliability, and association with performance, dimensionality reduction methods can aid in selection (37, 38).

  • Prioritizing Independent Metrics: Metrics that exhibit lower co-linearity (i.e., less redundancy) with other proposed metrics in the multi-layered validity model should be prioritized.

  • Desirable Quality: This independence is highly desirable because it allows for clearer resolution in assessing an athlete's physical abilities, as each metric then represents a distinct quality (1).

  • Scale for Evaluation: No/Somewhat/Yes.
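A simple way to screen for redundancy is to inspect pairwise correlations between candidate metrics. The sketch below is illustrative only: the metric names and squad data are hypothetical, and the 0.8 threshold is an arbitrary cut-off, not one prescribed by the article:

```python
from math import sqrt
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical squad results on three candidate metrics.
metrics = {
    "1rm_squat_kg":  [140, 165, 150, 180, 155, 170],
    "imtp_peak_n":   [2900, 3400, 3100, 3700, 3150, 3500],  # isometric mid-thigh pull
    "cmj_height_cm": [34.0, 42.5, 36.0, 40.0, 44.5, 38.0],
}

# Flag pairs whose strong correlation suggests they carry redundant information.
THRESHOLD = 0.8
for (name_a, a), (name_b, b) in combinations(metrics.items(), 2):
    r = pearson(a, b)
    flag = "redundant?" if abs(r) > THRESHOLD else "independent"
    print(f"{name_a} vs {name_b}: r = {r:+.2f} ({flag})")
```

Metrics with low pairwise correlation are better candidates for inclusion, since each then represents a distinct physical quality within the validity model.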

Decision-Making Criteria

  • Actionable Information: Decision-making objectives focus on the extent to which a user can confidently act upon the information provided by a measurement (4, 39) (Table 2).

  • Key Feature: Interpretability: This refers to the degree of meaningful insights detectable from a physical performance metric, or how a change in the metric's value can be used by coaches, sport science staff, and broader organizational stakeholders to inform decision-making (21).

    • Decisions made without proper interpretation are more prone to errors and unintended consequences.

    • Test measurements should align with specific objectives (e.g., performance enhancement, talent identification, injury prevention) and be capable of guiding the development of physical training programs.

    • The influence of specific decision-making factors may vary depending on the particular aspect of the physical preparation plan in focus.

    • To enhance interpretability, normalizing data within a team or benchmarking against competitors/comparable sports provides contextual reference, improving data utility (1, 40).

  • Scale for Interpretability: Low/Moderate/Strong.
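Normalizing within a team is one concrete way to add contextual reference. The sketch below (hypothetical athletes and sprint times, not from the article) expresses each athlete's score as a z-score relative to the squad:

```python
from statistics import mean, stdev

def team_z_scores(values):
    """Express each athlete's score relative to the squad mean, in SD units."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical 10 m sprint times (s). Lower is better, so the sign is
# inverted at print time so a positive value reads as "faster than average".
sprint_10m = {"A": 1.78, "B": 1.85, "C": 1.72, "D": 1.91, "E": 1.80}
zs = team_z_scores(list(sprint_10m.values()))
for athlete, z in zip(sprint_10m, zs):
    print(f"Athlete {athlete}: z = {-z:+.2f}")
```

The same transformation supports benchmarking against competitors or comparable sports when external norms are available.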

Responsiveness

  • Detecting Real Changes: Responsiveness assesses a measurement's accuracy in detecting meaningful changes in response to a defined stimulus over time (41, 42).

    • In athlete physical preparation, this stimulus typically involves training interventions but can also arise from factors like fatigue, adaptations from other training activities, or competition itself.

  • Practitioner Questions: Practitioners should ask: Can this measure effectively inform training interventions? Is the attribute being measured modifiable through training?

  • Scale for Evaluation: Low/Moderate/Strong.

Diminishing Returns

  • Adaptation Rate: This principle observes that as an individual receives more exposure to a particular stimulus, their rate of adaptation generally diminishes. Consequently, alternative training stimuli are required to achieve continuous physical performance gains (43–45).

  • Changing Metric Relevance: As an athlete's training progresses, the relevance of a specific physical performance metric may decrease due to a plateau in potential adaptation within one physical domain. Simultaneously, another physical performance metric might gain prominence because it offers greater potential for further physical performance improvement.

  • Scale for Evaluation: Low/Moderate/Strong.

Organizational Criteria

  • Feasibility Assessment: Organizational objectives (Table 3) pertain to the practical feasibility of implementing a physical performance testing and analysis process within the sports organization. Feasibility assessments are commonly used in various business and management contexts to weigh the costs against the benefits of strategies, processes, and programs (46).

  • Costing Considerations: When selecting physical performance metrics, crucial costing factors include:

    • Financial Cost: Any monetary expenses associated with the measurements. This encompasses, but is not limited to, equipment purchase and maintenance, venue costs (e.g., utilities, internet), and staff wages for data collection, processing, analysis, and reporting. These costs must align with the budgetary constraints of the sports organization or governing bodies (e.g., salary caps, government funding).

      • Scale: Low cost, moderate cost, high cost.

    • Opportunity Cost: Resources that must be reallocated or sacrificed from other areas of the sport program to implement a specific physical performance metric. This can include reallocating financial resources from other departments (e.g., nutrition, sport psychology, analytics) or sacrificing training time (e.g., match simulations, skills sessions, sport-specific training) to accommodate testing sessions.

      • Scale: Low cost, moderate cost, high cost.

    • Time Cost of Test Familiarization: This considers the complexity of the test during implementation and administration. Highly complex performance tests may demand greater expertise, incurring time costs for upskilling staff (e.g., on equipment, software, pass/fail criteria) and familiarizing athletes (e.g., pacing strategies, coordination of multi-joint movements).

      • Scale: Low cost, moderate cost, high cost.

    • Time-Cost of Implementing in the Training Environment: This involves the timing and duration of physical performance testing within broader training schedules (e.g., session, week, annual, or biannual plans). It also includes recovery time for athletes to mitigate injury risks or overtraining from residual fatigue between tests, testing days, and regular training days.

      • Scale: Low cost, moderate cost, high cost.

  • Cost-Benefit Evaluation: A feasibility evaluation allows practitioners and their organizations to weigh a metric's value (e.g., for training intervention, talent identification, injury prevention) against its financial costs, time costs, and physical risks, and to identify redundant metrics (21, 48).

Secondary Criteria

  • Impacts Beyond Primary Objectives: Secondary criteria represent the broader impacts that occur following the implementation of a measure, extending beyond its immediate, primary objectives (4).

  • Examples of Secondary Impacts: These can include factors such as:

    • Athlete motivation.

    • Feedback quality.

    • Teamwork dynamics.

    • Competitive behaviors.

  • Wider Organizational/Sport Impact: Secondary objectives also consider whether incorporating a physical performance metric has a broader influence on the organization or the sport itself, beyond just the performance department. This could involve its ability to foster developmental pathways or support grassroots sport initiatives.

  • Negative Consequences: Potential negative impacts are also considered. For example, a secondary objective assessment might identify scenarios where a physical performance metric could be misused, such as its application to inappropriate sub-populations.

  • Financial and Ethical Implications: Errors leading to incorrect decision-making can be costly for sports organizations, resulting in resource wastage, impacting financial stability, and potentially imposing budgetary constraints on other departments. Secondary impacts can therefore range from internal financial consequences within an organization to wider social or sports governance implications that are difficult to quantify precisely.

Framework Application

  • Flexible Utility: The proposed physical assessment framework is designed for adaptable application across diverse sport contexts.

  • Checklist Format: At its most general level, the framework can serve as a checklist for comparing multiple measurements under consideration for implementation (18) (Figure 3A).

    • Gatekeeper Question: This application typically starts with a crucial "gatekeeper question": "Is there evidence of association between the measure and another 'layer' in the validity model?"

    • Conditional Value: Without an affirmative response to this gatekeeper question, the subsequent items and objectives in the checklist hold little value.

    • This provides a broad, non-prescriptive model suitable for a variety of situations.

  • Tailored and Weighted Application: The framework can also be customized to meet the specific requirements of a physical preparation program, a particular training environment, a sports organization, or a governing body. This customization involves assigning different weights to items within each objective category, potentially through multiple rounds of screening (18) (Figure 3B).

    • Two-Round Screening Example: In the provided example, a potential measurement must successfully pass a primary round of item and objective screening before it receives further consideration.

    • The secondary round of screening might not mandate that all selected items be met prior to the metric's implementation. Instead, this secondary round could identify any limitations or constraints of the physical performance metric, allowing the end-user to make an informed decision about its use within their training program or organization.
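The checklist and weighted-screening idea can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the item names, weights, and the mapping of the Low/Moderate/Strong scales to numbers are all hypothetical, not values from the article:

```python
# Map the article's Low/Moderate/Strong rating scales to numbers.
RATING = {"low": 0, "moderate": 1, "strong": 2}

def screen_metric(metric):
    """Screen a candidate metric: gatekeeper question, then a weighted score.

    `metric` is a dict with a boolean 'evidence_of_association' plus
    Low/Moderate/Strong ratings. Item names and weights are illustrative.
    """
    # Gatekeeper question: without evidence of association with another
    # layer of the validity model, the remaining items hold little value.
    if not metric["evidence_of_association"]:
        return 0.0
    weights = {
        "measurement_collection": 3,  # validity/reliability of the method
        "interpretability": 2,
        "responsiveness": 2,
        "organizational_feasibility": 1,
    }
    total = sum(w * RATING[metric[item]] for item, w in weights.items())
    max_total = sum(w * 2 for w in weights.values())
    return total / max_total  # 0.0 (unsuitable) to 1.0 (ideal)

candidate = {
    "evidence_of_association": True,
    "measurement_collection": "strong",
    "interpretability": "moderate",
    "responsiveness": "strong",
    "organizational_feasibility": "low",
}
print(f"Suitability score: {screen_metric(candidate):.2f}")
```

A secondary screening round could then report which low-rated items remain as limitations, leaving the final decision to the end-user rather than rejecting the metric outright.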

Conclusion

  • Evidence-Based Process: This article introduced evidence-based processes for selecting test measurements in athlete physical preparation.

  • Methodology: It detailed a method for constructing a multi-layered validity model.

  • Evaluation Framework: It then presented a framework for evaluating the suitability of metrics, taking into account technical, decision-making, and organizational factors.

  • Versatile Application: The framework is versatile, applicable in formats ranging from a general checklist with a gatekeeper question to a tailored, weighted screening process, depending on user needs.

  • Impact on Performance: The systems outlined in this article are designed to simplify the vast array of available measurements, helping to identify those that will most effectively contribute to competition performance.