18: Metacognition of Testing Effect

Metacognition of the Testing Effect: Guiding Learners to Predict the Benefits of Retrieval (Tullis, Finley, & Benjamin, 2013)

Self-Regulated Learning

  • Substantial amounts of learning happen outside of the classroom, requiring students to regulate their own processes.

    • Key areas of self-regulation include:

      • Allocating study time: Deciding how much time to dedicate to studying.

      • Selecting items for additional study: Choosing which concepts or materials need more focus.

      • Monitoring learning: Keeping track of one's understanding and retention of information.

  • Some students report using self-testing methods as a study technique.

  • Research questions:

    • Are learners sensitive to the mnemonic benefits of testing?

    • Can learners make predictions based on mnemonic cues rather than naïve theories?

Accuracy of Metacognition

  • Various desirable difficulties in learning do not appear to be reflected in Judgments of Learning (JOLs):

    • Spacing repetitions: The effect of distributing study sessions over time rather than cramming.

    • Interleaved practice: Mixing different topics or types of problems during study sessions.

    • Imagery: Using mental images to enhance memory retention.

    • Release from proactive interference: Proactive interference (previously learned material hindering the learning of new material) builds up across successive lists of similar items; recall recovers when the type of material changes.

  • Conditions for improving the accuracy of JOLs include:

    • Opportunity for comparison: Manipulating conditions within subjects or within a single list, so that learners experience both conditions.

    • Item-by-item judgments: Making predictions for individual items rather than a single global judgment of overall performance.

    • Delay between study and JOL: Making JOLs some time after study, rather than immediately, can improve their accuracy.

    • Avoid incomplete information: Ensuring that all information is available when forming JOLs.

Experiment 1 Overview

  • Previous studies elicited JOLs for the testing effect immediately after study, using between-subjects or block manipulations, and focused on aggregate/global JOLs.

  • Findings from prior work indicate:

    • Re-study can result in better immediate memory performance compared to testing, but leads to worse performance in delayed assessments.

  • Current study methods include:

    • Soliciting JOLs immediately post-study and after a delay.

    • Manipulating re-study versus test conditions within a single list.

    • Encouraging item-by-item JOLs and comparing cue-only and cue-target JOL conditions.
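The within-list manipulation can be made concrete with a small scheduling sketch. It assumes an even split of one list into re-study and test items presented in a single random order, and treats the JOL type (cue-only vs. cue-target) as fixed per participant, since the findings below report separate cue-only and cue-target groups. The function names, the word pairs, and the list length are hypothetical, not the study's materials.

```python
import random

def assign_conditions(pairs, seed=None):
    """Randomly assign half of the pairs in ONE list to re-study and half to
    practice testing, then return them in a random presentation order so the
    two conditions are intermixed within the list (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    condition = {pair: "restudy" for pair in shuffled[:half]}
    condition.update({pair: "test" for pair in shuffled[half:]})
    order = list(pairs)
    rng.shuffle(order)
    return [(pair, condition[pair]) for pair in order]

def jol_prompt(pair, jol_type):
    """Build a JOL prompt: cue-only shows just the cue, cue-target shows both words."""
    cue, target = pair
    if jol_type == "cue-only":
        return f"{cue} - ?"
    if jol_type == "cue-target":
        return f"{cue} - {target}"
    raise ValueError(f"unknown JOL type: {jol_type}")

# Hypothetical word pairs; the real materials and list length differ.
pairs = [("ocean", "table"), ("candle", "river"), ("wagon", "pepper"), ("forest", "button")]
jol_type = "cue-only"  # assumed fixed per participant (between groups)
for pair, cond in assign_conditions(pairs, seed=7):
    print(cond, "|", jol_prompt(pair, jol_type))
```

Intermixing both conditions in one list is what gives learners the opportunity for comparison listed earlier; a blocked design would present all re-study items before all test items.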

Cue-Only JOL Condition vs. Cue-Target JOL Condition

  • Memory for past tests (Finn & Metcalfe, 2007):

    • JOLs are often overconfident or well calibrated on the first study-test trial but become underconfident on subsequent trials; this is the Underconfidence with Practice (UWP) effect.

    • UWP effects have been observed:

      • For both recalled and unrecalled items from Trial 1.

      • With both fixed and self-paced study times.

      • When accuracy is incentivized, and for both easy and hard materials.

  • Memory for Past Test (MPT) heuristic for immediate JOLs made after a test:

    • Items recalled on the previous test receive high JOLs and unrecalled items receive low JOLs, without accounting for potential learning from restudying.

    • The MPT heuristic may not be used for delayed JOLs.

    • UWP effects were shown for both recalled and unrecalled items.

  • Procedure from Finn & Metcalfe (2007, E1):

    • Study 48 word pairs and make a JOL after each item (within 10 minutes).

    • Take a cued-recall test (3 minutes), then repeat the procedure.

    • In the delayed condition, study the 48 pairs again for 2.5 minutes, then make delayed JOLs when shown the cues (10 minutes), followed by another cued-recall test (3 minutes).

  • Results from Finn & Metcalfe (2007, E1):

    • Immediate vs. delayed JOL conditions:

      • Trial 1 recall: Immediate .22, Delayed .11

      • Trial 2 recall: Immediate .40, Delayed .31

      • Trial 1 mean JOL: Immediate .37, Delayed .20

      • Trial 2 mean JOL: Immediate .35, Delayed .32

      • Trial 1 calibration (JOL - recall): Immediate .15, Delayed .09

      • Trial 2 calibration (JOL - recall): Immediate -.06, Delayed .01 (not significant)
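The calibration rows are mean JOL minus mean recall: a positive value indicates overconfidence and a negative value underconfidence. A minimal sketch recomputing them from the rounded means listed above (the dictionary layout and function name are illustrative). Working from rounded means gives -.05 rather than the reported -.06 for immediate Trial 2, presumably because the published value was computed from unrounded data.

```python
# Rounded means reported above for Finn & Metcalfe (2007, E1),
# keyed by (JOL timing, trial).
recall = {
    ("immediate", 1): 0.22, ("delayed", 1): 0.11,
    ("immediate", 2): 0.40, ("delayed", 2): 0.31,
}
mean_jol = {
    ("immediate", 1): 0.37, ("delayed", 1): 0.20,
    ("immediate", 2): 0.35, ("delayed", 2): 0.32,
}

def calibration(condition, trial):
    """Calibration bias = mean JOL - mean recall (rounded to 2 places).
    Positive = overconfident, negative = underconfident; no significance test."""
    return round(mean_jol[(condition, trial)] - recall[(condition, trial)], 2)

for condition in ("immediate", "delayed"):
    for trial in (1, 2):
        print(f"{condition:9s} trial {trial}: {calibration(condition, trial):+.2f}")
```

The sign flip from Trial 1 to Trial 2 in the immediate condition is the UWP pattern; the delayed Trial 2 value sits near zero.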

Analyzing Contributions to UWP

  • Exploration of whether unrecalled items disproportionately contribute to UWP.

  • Items are classified by their recall history across trials:

    • Recalled on T1 and T2: RR (Recalled-Recalled)

    • Not recalled on T1 but recalled on T2: FR (Forgotten-Recalled)

  • Notably, JOLs for FR items tend to be disproportionately low.
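To see why FR items can drive underconfidence, here is a sketch that applies a hypothetical MPT rule (Trial 2 JOL = 0.8 if recalled on Trial 1, else 0.2) to hypothetical items and summarizes each recall-history class. The 0.8/0.2 values and the item data are assumptions for illustration; the RF and FF classes are added for completeness, since the notes only name RR and FR.

```python
from collections import defaultdict
from statistics import mean

def mpt_jol(recalled_t1):
    """Hypothetical Trial-2 JOL under a memory-for-past-test (MPT) heuristic:
    high if recalled on Trial 1, low otherwise. The values are illustrative,
    and the rule ignores learning from the Trial-2 study opportunity."""
    return 0.8 if recalled_t1 else 0.2

def recall_class(recalled_t1, recalled_t2):
    """Label an item's recall history, e.g. 'FR' = forgotten on T1, recalled on T2."""
    return ("R" if recalled_t1 else "F") + ("R" if recalled_t2 else "F")

def summarize(items):
    """Map each recall class to (mean Trial-2 JOL, Trial-2 recall rate)."""
    jols, recalls = defaultdict(list), defaultdict(list)
    for t1, t2 in items:
        cls = recall_class(t1, t2)
        jols[cls].append(mpt_jol(t1))
        recalls[cls].append(1.0 if t2 else 0.0)
    return {cls: (mean(jols[cls]), mean(recalls[cls])) for cls in jols}

# Hypothetical items: (recalled on Trial 1, recalled on Trial 2).
items = [(True, True), (True, True), (True, False),
         (False, True), (False, True), (False, True),
         (False, False), (False, False)]

for cls, (jol, rec) in sorted(summarize(items).items()):
    print(f"{cls}: mean JOL {jol:.2f}, T2 recall {rec:.2f}, JOL - recall {jol - rec:+.2f}")
```

Under this rule FR items show the largest negative gap (-0.80 here): they are recalled on Trial 2 but inherit the low JOL from their Trial 1 failure.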

Tullis, Finley, & Benjamin (2013): Findings

  • Evidence that test performance informs JOLs, G(Phase 2 test, Phase 3 JOL):

    • Cue-only group: G = .84

    • Cue-target group: G = .86

  • Relative accuracy of Phase 3 JOLs for predicting the final test, G(Phase 3 JOL, final test):

    • Cue-only group: G = .93

    • Cue-target group: G = .54

  • Higher resolution of cue-target JOLs for tested than for re-studied items is observed both immediately and after delays.
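In metacognition research, G usually denotes the Goodman-Kruskal gamma correlation, which measures resolution: how well the ordering of JOLs matches the ordering of outcomes across item pairs. Assuming that is what G means here, a minimal sketch follows. It is commonly computed per participant and then averaged; the JOLs and outcomes below are hypothetical.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma between two paired sequences.

    A pair of items (i, j) is concordant if x and y order them the same way,
    discordant if they order them oppositely, and ignored if either variable
    is tied. G = (C - D) / (C + D).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical data for one participant: item-level JOLs (0-100) and
# final-test outcomes (1 = recalled, 0 = not recalled).
jols = [90, 75, 60, 40, 20, 10]
final_test = [1, 1, 0, 1, 0, 0]
print(goodman_kruskal_gamma(jols, final_test))
```

Here 8 of the 9 untied pairs are concordant and 1 is discordant (the recalled item with JOL 40 vs. the unrecalled item with JOL 60), so G = 7/9, about .78.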

Subsequent Experiments Overview (Experiments 2, 3, and 4)

  • Experiments Structure:

    • Phase 1: Study 32 word pairs.

    • Phase 2: Choose to re-study or test the word pairs.

    • After this phase, participants were asked to predict the number of items they believed they would remember the next day (global prediction).

    • Phase 3: Conduct a cued recall test the following day.

    • Phase 4: Study a new list of word pairs.

    • Phase 5: The same re-study or test decision was made for the new list, followed by a global prediction (the second list was never tested).

  • Results from Experiment 2 (No Feedback):

    • Final cued-recall results:

      • Restudy: .19

      • Tested: .24

    • Notably, 19 out of 35 participants demonstrated the testing effect.

Experiment 3: Addressing Feedback

  • The one-day delay between the re-study/test phase and the final test can obscure participants' memory of how they studied each item.

  • Without feedback, participants had to remember how many items they had recalled or forgotten in order to inform their predictions.

  • At the day-2 cued-recall test, participants received feedback on their accuracy: whether each item was recalled correctly and whether it had been re-studied or tested in the prior phase.

  • Results from Experiment 3 (Partial Feedback):

    • Final cued-recall results:

      • Restudy: .15

      • Tested: .21

    • 35 out of 53 participants displayed the testing effect.

Experiment 4: Comprehensive Feedback Provided

  • Immediately after the list 1 test (before the global prediction for list 2), participants received explicit feedback about how many items they had recalled correctly.

  • Results from Experiment 4 (Full Feedback):

    • Final cued-recall results:

      • Restudy: .17

      • Tested: .27

    • 19 out of 25 participants exhibited the testing effect.

Combined Analysis of Experiments 2, 3, and 4

  • Discussion points:

    • Under optimal conditions, metacognitive judgments can be accurate.

    • Incorporating the testing effect into predictions is difficult because its benefits emerge only after a delay and are hard to recognize.

    • Providing external support (via performance feedback) aids participants in recognizing the efficacy of their encoding processes.

    • Promoting awareness of successful encoding strategies (like generation) can result in better choices in future learning contexts.

    • To make accurate judgments, participants should:

      • Detect differences in performance based on study conditions.

      • Attribute variations to the appropriate encoding method (track performance).

      • Acknowledge and alter their beliefs about effective strategies when given feedback.