EB

Week 11

What Factors have Influenced Test Development?

Theoretical Developments:
  • Intelligence: What have we learned from Binet?

  • Personality: How has it expanded?

Technical and Methodological Developments:
  • Statistics: Factor analysis.

  • Computers and the Internet: Experience sampling and self-monitoring.

Contextual Developments:
  • Political: The impact of World Wars.

  • Funding and Policy: Educational testing.

Influences on the Future of Testing?

Content Developments:
  • Expanded coverage of intelligence testing and specific cognitive tests.

  • New aspects of personality not yet examined.

  • Increased emphasis on positive psychology.

Technical and Methodological Developments:
  • New ways to measure psychological states and personality traits.

Contextual Developments:
  • Social phenomena: growing up digitally.

  • Political/War

Content Developments

  • A construct is a hypothetical entity with theoretical links to other hypothesised variables, proposed to relate to a consistent set of observable behaviours, thoughts, or feelings that is the target of a psychological test.

  • Theoretical advances, such as new constructs emerging in the literature, might give an idea of future tests and procedures likely to be developed.

  • The Big Five shaped the development of a number of assessment measures.

  • Intelligence theories have also shaped intelligence testing.

  • Emotional intelligence refers to a person’s capacity to monitor and manage emotions, understand the emotions of others, and use these insights to function better interpersonally.

Theoretical Issues

  • Where to locate this in existing theory? Amalgamation of existing personality traits?

  • Is it a series of learned skills? Can interventions increase emotional intelligence or is it fixed?

  • Can you trust self-reports about emotional intelligence or must it actually be measured (objective)?

Integrity

  • Integrity involves dependability, theft proneness, and counterproductive work behaviour.

  • It may require specific personality tests or direct measures to assess a job applicant's honesty, trustworthiness, or integrity.

  • These assessments are strongly prone to social desirability bias and superlative response styles (i.e., claiming extreme virtue).

  • Unobtrusive measures such as typing speed, mouse clicks, and eye tracking can be utilised.

Sexual Harassment and Misconduct

  • Current research focuses on attitudes and behaviours towards women and minority groups. An explosion of new research is expected (e.g., incel subgroups, non-consensual image sharing, deepfaking).

Technical and Methodological Developments

  • Increasing access to computers and the internet over time has facilitated:

    • Computer-assisted psychological assessment (CAPA)

    • Use of smartphones for behavioural assessment

    • Smart testing techniques

    • Computerised and multidimensional adaptive testing

    • Time-parameterised testing

    • Latent factor-centred designs

    • Internet testing with non-obtrusive measurements

    • Potential for virtual reality and artificial intelligence in assessment

Computer Applications

  • 1950s: Computers first became available for testing and assessment. CAT was conceived and new developments in test theory, such as item response theory, emerged, though costs and skill requirements limited mainstream use.

  • 1980s: Proliferation of affordable home computers allowed access to computing power for test developers.

  • 1990s: Growth of the internet presented opportunities for internet testing and rapid proliferation of tests.

  • 2000s: Introduction of smartphones, enabling user access equivalent to desktop computing in a portable format, along with online surveys.

  • 2010s: Widespread adoption of tablets facilitated cheap, accessible information and easier participant recruitment, reducing research costs.

Are Computer and Pen and Paper Forms Equivalent?

  • Does computer presentation fundamentally change the construct being measured?

    • Generally, the answer is no, with cross-mode correlations of .97 (e.g., Mead & Drasgow, 1993 meta-analysis).

    • Not much difference observed between ticking a box on a questionnaire with a pencil or a mouse, as psychological decision-making processes remain the same.

    • However, reliability tends to be poorer with computer administration, especially with attentional distractions from modern lifestyle factors (e.g., Netflix, multitasking).

    • Less rapport may affect motivation, especially during long surveys.
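
A cross-mode correlation is the correlation between scores that the same people obtain on the paper and computer forms of a test. The sketch below is a minimal illustration only: the scores are invented, and the names pearson_r, paper_scores, and computer_scores are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Invented scores for the same six people on both formats (illustration only).
paper_scores = [12, 15, 9, 20, 17, 11]
computer_scores = [13, 14, 10, 19, 18, 11]

cross_mode_r = pearson_r(paper_scores, computer_scores)
print(f"cross-mode correlation: {cross_mode_r:.2f}")
```

A value close to 1 suggests the two formats rank people in nearly the same order, which is the sense in which the forms are treated as equivalent.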

Speeded Tests

  • Speeded tests are an exception: these tests, characterised by simple, quickly performed tasks, vary by response modality (i.e., paper and pencil vs computer), which affects results (e.g., cross-mode correlation of .72).

    • Notably, gender differences in fine-motor skills may also impact results.

Smart Testing

  • Kyllonen (1997) speculated about the future of testing with the development of a “smart test” focusing on ability testing, incorporating significant technologies associated with abilities measurement, including:

    • Computer delivery

    • Multidimensional adaptive technology

    • Time-parameterised testing

    • Latent factor-centred designs

Multidimensional Adaptive Testing (MAT)

  • MAT extends Computerised Adaptive Testing (CAT), applying the same adaptive testing principles to a battery of tests rather than a single test.

  • Recognises correlations between constructs measured, with cross-battery assessments capitalising on correlations across different types of tests.

  • Performance on each item informs item selection for every subtest in the battery, so all subtests adapt simultaneously. This significantly reduces overall test time without sacrificing measurement accuracy.
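
The adaptive principle that MAT extends can be sketched for a single test: after each response, pick the unused item that is most informative at the current ability estimate. The sketch below assumes a two-parameter logistic (2PL) item response model and a tiny invented item bank. It is not a MAT implementation, which would need a multidimensional model, and every name in it is hypothetical.

```python
import math

def prob_correct(theta, a, b):
    """2PL model: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information that a 2PL item provides at ability theta."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def pick_next_item(theta, item_bank, used):
    """Return the id of the unused item that is most informative at theta."""
    candidates = {k: v for k, v in item_bank.items() if k not in used}
    return max(candidates, key=lambda k: item_information(theta, *candidates[k]))

# Invented bank: item id -> (discrimination a, difficulty b).
item_bank = {"easy": (1.0, -1.5), "medium": (1.2, 0.0), "hard": (1.1, 1.5)}

# Start at an average ability estimate, then assume the estimate rose
# to 1.2 after a correct answer (the update step itself is not shown).
first = pick_next_item(0.0, item_bank, used=set())
second = pick_next_item(1.2, item_bank, used={first})
print(first, second)
```

Items are most informative when their difficulty is near the test-taker's ability, which is why adaptive tests can reach a given accuracy with fewer items.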

Other MAT Examples

  • The Programme for International Student Assessment (PISA) utilises the MAT technique to measure:

    • Reading literacy

    • Mathematics literacy

    • Science literacy (with financial literacy added recently)

  • NAPLAN Online aims to incorporate MAT techniques for measuring educational outcomes.

Limitations of MAT

  • Similar to CAT, requires much effort to develop a sufficiently large item bank, needing hundreds of items with parameter estimation.

  • Data from large samples of examinees with extensive testing is necessary, with more effort demanded than in CAT.

  • Users may find it confusing due to potential changes between item types, necessitating recall of instructions across subtests, which may be unrealistic for children.

Already Some Existing Intelligence Tests

  • Examples include Multidimensional Aptitude Battery - II (MAB-II).

Time Parameterisation

  • A tension exists between speed and accuracy, potentially sacrificing one for the other, which complicates scoring and interpretation.

  • Computer-administered tests can capture response time, linking to the Implicit Association Test (IAT), an indirect measure of implicit beliefs, prejudices, and biases utilising reaction time.

Latent Factor-Centred Design

  • It is argued that the emphasis should be on the constructs being measured, rather than solely on the specifics of the test as in traditional methods.

  • A construct focus may reveal new testing forms, including virtual reality, role play, and games that assess while engaging participants, shifting interest towards the latent factors underlying performance.

Implicit Association Test (IAT)

  • The IAT offers indirect measurement advantages, reducing susceptibility to socially desirable response styles.

  • While it shows moderately high reliability, it raises questions about legitimacy for individual assessment, particularly for individuals with impaired motor skills.
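
Because the IAT relies on reaction time, its score is essentially a standardised difference in response latencies between two pairing conditions. The sketch below is a simplified version in the spirit of the commonly used D score. It omits the trial trimming and error penalties that full scoring procedures apply, and the latencies and names are invented for illustration.

```python
from statistics import mean, stdev

def iat_d_score(compatible_ms, incompatible_ms):
    """Simplified D score: latency difference over the SD of all trials."""
    all_trials = list(compatible_ms) + list(incompatible_ms)
    pooled_sd = stdev(all_trials)
    return (mean(incompatible_ms) - mean(compatible_ms)) / pooled_sd

# Invented response latencies in milliseconds (illustration only).
compatible = [620, 580, 640, 600, 610]
incompatible = [760, 820, 790, 740, 800]

d = iat_d_score(compatible, incompatible)
print(f"D = {d:.2f}")
```

A positive value here means slower responses in the incompatible pairing, which is read as a stronger association for the compatible pairing. Anything that slows responding for unrelated reasons, such as impaired motor skills, feeds into the same latencies.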

Implicit Association Test - Australia

  • Shirodkar (2019) indicates biases against Indigenous Australians, analysing target concepts (e.g., b/w images of faces from in- and out-group) and attribute concepts (positive/negative terms) using samples in which Caucasian and highly educated individuals are overrepresented.

Internet Testing

  • Internet testing has revolutionised the field, primarily affecting test distribution rather than test development.

  • It allows for rapid circulation and updates of questions among psychologists.

  • Internet tests can be easily modified, facilitating dynamic norming potential.
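
One way to picture dynamic norming is a norm table that updates as each new online score arrives, rather than being fixed at publication. The sketch below keeps a running mean and standard deviation using Welford's method and converts a raw score to a z-score against the current norms. The class name RunningNorms and the scores are hypothetical, and real norming would also need representative sampling.

```python
from math import sqrt

class RunningNorms:
    """Running mean and SD of test scores, updated one score at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add_score(self, score):
        """Fold one new score into the norms (Welford's update)."""
        self.n += 1
        delta = score - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (score - self.mean)

    @property
    def sd(self):
        """Sample standard deviation of the scores seen so far."""
        if self.n < 2:
            raise ValueError("need at least two scores")
        return sqrt(self._m2 / (self.n - 1))

    def z_score(self, score):
        """Standardise a raw score against the current norms."""
        return (score - self.mean) / self.sd

norms = RunningNorms()
for raw in [48, 52, 50, 55, 45]:  # invented raw scores
    norms.add_score(raw)

print(norms.mean, round(norms.sd, 2), norms.z_score(55))
```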

Risks and Limitations of Internet Testing

  • A digital divide, in which access to the internet is limited by socio-economic status, continues to present major discrimination challenges in testing.

  • Security concerns surrounding highly sensitive information collected.

  • The integrity of tests may be compromised by rapid dissemination, online security threats, and the prevalence of non-evidence-based assessments in public domains.

Need for Supervised Testing in the Digital Age

  • Functions of supervision include authenticating test-takers, establishing rapport, ensuring adherence to administration standards, preventing cheating, and securing test integrity.

Levels of Supervision

  • Open: Unsupervised, published online or in print for personal development (low-stakes testing).

  • Controlled: Password-protected, suitable for first steps in recruitment.

  • Supervised: Proctored, ensuring compliance with testing standards.

  • Managed: Secure conditions with extensive supervision (remote or local).

Technology Development

  • Future potential for various innovative technologies in assessment includes virtual reality, artificial intelligence, holograms, serious games, eye-tracking, mobile devices, and wearables.

Virtual Reality for Assessment

  • Previously impractical, advancements have made VR more accessible.

  • There are incentives to use VR for situational judgement tests, role play, and therapeutic applications (e.g., phobia treatments).

Virtual Reality Still Faces Challenges

  • VR's efficacy assumes accessibility across demographics, while the prevalence of cyber sickness poses significant feasibility issues.

Believability of Actions and Acceptance by Users

  • More research is needed to establish efficacy in personality assessment and whether online interactions translate to offline behaviour. Furthermore, the uncanny valley effect hampers newer technologies.

Artificial Intelligence

  • Progress made with AI in visual perception tasks, but natural language processing remains problematic, alongside major ethical concerns.

  • Historical context: ELIZA marks the early development of chatbot technology.

Other Types of Technological Developments

  • Holograms and Augmented Reality: Early-stage feasibility exists, albeit with limitations for those experiencing cyber sickness.

Serious Games

  • Games designed beyond entertainment serve as assessment tools for promoting personal development and behaviour modification.

Eye Tracking

  • Offers non-obtrusive ways to assess attention and learning strategies, utilising various formats including a stationary mounted display.

Mobile Phones and Wearables

  • Devices for recording, GPS tracking, and applications facilitate various forms of assessments securely and effectively.

Emotional State Recognition and Biofeedback

  • Advancements enable accurate recognition of emotional states through intelligent software aiding in personal adjustment and health monitoring.

Contextual Changes

  • The broader social environment influences assessment development, necessitating increased accountability and transparency alongside ethical considerations.

  • Technology offers both challenges and opportunities, underscoring the importance of ethical conduct and critical reasoning toward future testing methodologies.