
Meta-Analysis of Reading Interventions

Abstract

This meta-analysis investigates the impact of foundational reading skills and multicomponent reading interventions on reading comprehension for students with or at risk for reading difficulties (RDs) in kindergarten through Grade 3. The analysis includes 47 studies identified through August 2019, with a total sample of 7446 students. The weighted average effect on norm-referenced reading comprehension is g = 0.37, which suggests a meaningful educational impact. Interventions focusing solely on foundational reading skills were as effective as those combining foundational skills with comprehension instruction. Measurement timepoint was a significant moderator: follow-up effect sizes were 0.17 smaller than immediate posttest effect sizes.

1. Introduction

The primary goal of reading intervention is to enhance text comprehension. While word recognition is important, the aim is for students to understand, recall, organize, summarize, and evaluate information from text. The Simple View of Reading (Gough & Tunmer, 1986; Hoover & Gough, 1990) posits that reading comprehension results from word recognition and linguistic comprehension, both relying on cognitive skills (Kim, 2017; Oakhill & Cain, 2012). Effective reading intervention should address word reading, comprehension, or both. Improvement in one area can amplify the other. Word recognition interventions can positively affect comprehension if students have basic linguistic comprehension. Interventions targeting both word recognition and comprehension may maximize reading comprehension.

The purpose of this meta-analysis is to evaluate the effects of:

  • Small-group and individual interventions on code-oriented foundational reading skills.

  • Interventions combining foundational skills and text comprehension for primary grade students with or at risk of RDs.

Reading Interventions

Reading interventions have shown positive effects on reading comprehension for students with RDs in kindergarten through Grade 3. Suggate (2010) reported mean Cohen's d effect sizes for reading comprehension:

  • Pre-K and K: 0.16

  • Grade 1: 0.42

  • Grade 2: 0.52

  • Grades 3 and 4: 0.46

Wanzek et al. (2016) reported a mean effect size estimate (Hedges' g) of 0.38 on standardized language and reading comprehension outcomes for Grades K-3 interventions lasting 15 to 99 sessions.

The current meta-analysis extends this by examining the effects of small-group and individual interventions on norm-referenced reading comprehension measures for students with RDs in Grades K-3 through August 2019. This includes studies with any duration or number of sessions, as long as they used norm-referenced tests for reading comprehension outcomes.

1.1. Impact of Instructional Emphasis

This meta-analysis aims to determine if the effects of reading interventions on comprehension differ based on instructional emphasis. Interventions often include code-oriented reading skills instruction, such as:

  • Phonological awareness

  • Alphabet knowledge

  • Graphophonemic knowledge

  • Recognition of orthographic patterns

  • Decoding

  • Spelling

  • Word recognition

These interventions have students apply word reading skills while reading connected text to promote accurate and fluent reading. Some interventions do not include comprehension instruction, while others combine foundational skills with text comprehension instruction (multicomponent interventions). This study examines the added value of text comprehension instruction beyond foundational skills instruction alone.

While foundational skills instruction (Foorman et al., 2016) and comprehension strategy instruction (Shanahan et al., 2010) are both supported in the primary grades, there is limited research on their relative effects on comprehension for students with RDs. Wanzek et al. (2016) found that the mean effect size for foundational skills interventions (Mg = 0.44) was slightly larger than that for multicomponent interventions (Mg = 0.35), though the difference was not statistically significant. Berninger et al. (2003) and Vadasy et al. (2002) found no significant group differences on comprehension tests when comparing explicit word recognition intervention, explicit reading comprehension intervention, and a combination of both.

The question of differential effects depending on intervention emphasis is important due to limited time for supplemental reading intervention. Maximizing intervention time for foundational skills may be crucial, especially for students struggling in early grades. Alternatively, early intervention that includes text comprehension skills development may be more beneficial.

1.2. The Impact of Intervention Dosage

This study explored the relationship between intervention dosage (number and length of sessions) and reading outcomes for students with RDs. It is frequently advised to increase the intensity of reading interventions to improve outcomes, specifically by decreasing group sizes and increasing intervention dosage (Denton, 2012; Gersten et al., 2008). However, research on the impact of dosage on reading outcomes for primary-grade students with RDs has yielded mixed results. Some studies have found positive impacts of increased dosage (e.g., Al Otaiba et al., 2005), while others have found no benefit (e.g., Denton et al., 2011; Hatcher et al., 2006). Vaughn et al. (2003) found that readers with more severe impairments required more intervention than those with higher preintervention scores. Suggate (2010) and Wanzek et al. (2016) reported nonsignificant relations between instructional dosage and reading outcomes for primary-grade students with RDs.

1.3. The Impact of Research Methodology Variables

The study further investigates how research methodology affects intervention outcomes on reading comprehension. It assesses:

  • Study quality

  • Type of reading comprehension measure (cloze format, passage reading with comprehension questions, sentence comprehension)

  • Measurement timepoint (immediate post-intervention vs. follow-up)

Study quality was based on:

  • Use of a norm-referenced outcome measure

  • Research design (participant randomization)

  • Sample size

  • Reporting fidelity of implementation (Slavin & Madden, 2011; Slavin & Smith, 2009; What Works Clearinghouse [WWC], 2020)

These factors can impact treatment effects, with larger sample sizes and higher quality ratings generally associated with lower effect sizes (Cheung & Slavin, 2012; Hall et al., 2017; Scammacca et al., 2015; Slavin & Smith, 2009).

Student outcomes may also vary depending on how comprehension is measured. Some measures require students to understand only a few sentences, while others require reading paragraphs and answering questions. There is evidence that different reading comprehension measures used in intervention studies vary in the extent to which they explain variance in students' decoding skills versus linguistic comprehension (Cutting & Scarborough, 2009; Keenan et al., 2008).

Intervention effects are expected to be strongest on measures administered immediately after the intervention, with effects potentially diluted at follow-up (Suggate, 2010).

1.4. Research Questions

This meta-analysis addresses the following key questions:

  • What is the main effect on reading comprehension of small-group and individual reading interventions that provide instruction in code-oriented foundational reading skills, with or without instruction in text comprehension, for students in Grades K-3 with RDs?

  • Are effects on reading comprehension moderated by the focus of instruction? Specifically, do effects differ for foundational skills-only interventions relative to multicomponent interventions that provide foundational skills instruction combined with instruction in text comprehension?

  • Are effects on reading comprehension moderated by the dosage of intervention provided or methodological factors (i.e., study quality, whether comprehension was measured at the conclusion of the intervention or at follow-up, how comprehension was measured in the study)?

2. Method

This meta-analysis included studies analyzed by Wanzek et al. (2016) and Wanzek et al. (2018) that met our present criteria for inclusion, as well as studies published after the end date of their searches and before September of 2019. To identify relevant articles published from 1995 through 2013, the first author reviewed the study information reported by Wanzek et al. (2016, 2018) to select studies that reported outcomes on standardized tests of reading comprehension. From those prior meta-analyses, 47 articles were included in the full-text review to ensure that they met our additional inclusion criteria. Fig. 1 outlines the search process.

2.1. Identification of Recent Studies

Studies published since the end dates of the prior (Wanzek et al., 2016, 2018) meta-analyses were identified through a comprehensive electronic search of ERIC and PsycINFO. We replicated the Wanzek et al. (2016) electronic search using their search terms (reading difficult, at-risk, dyslex) for key population identifiers, cross-referenced with (reading, interven, phon, fluency, vocab, comprehen) for reading context. We then replicated the electronic search from Wanzek et al. (2018), which used an updated set of terms, and eliminated duplicates between the two resulting sets of abstracts. This second tier of the electronic search identified studies published through August of 2019 using the terms (reading interven, reading instruction, reading strategies, supplemental instruction, special educ, phon, fluency, vocab, comp) cross-referenced with (reading difficult, learning disab, reading disab, reading delays, reading disorder, dyslex). The combined search identified 8011 abstracts for screening after removing duplicates.

Abstracts and papers were screened by the first, second, and fourth authors. These three readers used the Abstrackr online platform (Wallace et al., 2012) to screen abstracts for inclusion criteria. All three readers first screened a single set of 400 abstracts for reliability purposes, establishing 98% agreement. The remaining abstracts were assigned to one of the three readers for screening. During abstract screening, 7878 records were disqualified based on the inclusion criteria. One other record was found to be duplicated between our electronic search and Wanzek et al. (2018), and the duplicate was eliminated.

Each of the 179 remaining records was assigned to two readers for full-text review of inclusion criteria. Any disagreements were discussed to reach a consensus. After full-text review, 47 studies were included in the meta-analyses.

2.2. Inclusion Criteria

In all cases, our inclusion criteria were the same as or narrower than those used by Wanzek et al. (2016) and Wanzek et al. (2018). Following are the criteria applied in the current meta-analysis.

  1. Published in a peer-reviewed journal in English.

  2. At least 50% of participants (in total or in disaggregated data) were identified with a learning disability, reading difficulty, or being at risk of RDs (i.e., students with low achievement, low phonemic awareness, economic disadvantage, or language disorders). We excluded studies in which more than 50% of participants had autism, intellectual disabilities, severe hearing impairment/deafness, or severe vision impairment/blindness.

  3. At least 50% of participants (in total or in disaggregated data) were enrolled in US (or a foreign country's equivalent of) Grades K-3 (ages 5–9).

  4. Interventions provided instruction and/or practice in foundational reading skills, with or without instruction targeting text comprehension.

  5. Interventions were provided in English and as part of school programming (i.e., not home, clinic, or camp programs). Interventions provided in summer school as part of regular school programming were included.

  6. Interventions were not part of the general education curriculum provided to all students as a part of classroom reading instruction (i.e., Tier 1 interventions).

  7. The research design was experimental or quasi-experimental with a comparison or control group, and sufficient data were provided to calculate effect sizes.

  8. At least one of the dependent variables addressed an outcome in reading comprehension using a standardized, norm-referenced measure (i.e., excluding researcher-developed measures and criterion-referenced tests such as many state-mandated assessments). We excluded studies that only provided data for composite reading outcomes and did not report reading comprehension outcomes separately.

  9. The comparison group received typical school-provided reading instruction (i.e., business-as-usual; BAU) or instruction in a non-literacy subject. We included comparisons with reading activities that did not include structured reading instruction (e.g., read-aloud time). We also included other comparison conditions that were researcher-developed but did not teach reading (e.g., math, study skills). We excluded studies that contrasted two or more researcher-controlled reading interventions (e.g., studies manipulating characteristics of the same reading intervention) but did not contrast them with a BAU comparison or a non-literacy instructional condition.

2.3. Coding Procedures

Details of each study were coded on a spreadsheet adapted from Wanzek et al. (2016). Coded data described study participants, research design, study characteristics, intervention conditions, comparison conditions, measures, and results. We added to the Wanzek et al. (2016) coding sheet to include more specific information about the nature of the experimental interventions. To establish coding reliability, the three readers independently coded the same two studies identified in the updated electronic search. Reliability with the first author was evaluated; the mean agreement was 98.97%.

The original coding sheets were available for the studies from the Wanzek et al. (2016) meta-analysis. Codes from Wanzek et al. (2016) were verified by the fourth author, who referred to the full texts of the articles. Differences in coding between this coder and Wanzek et al. (2016) were resolved by the first author. The full texts were also further coded for additional details regarding the intervention content (e.g., systematic instruction). All new codes were reviewed and verified by the first author, with disagreements resolved through discussion. All studies identified from Wanzek et al. (2018) and through the updated electronic search of studies from 2014 through 2019 were independently coded by the first and second authors, who resolved discrepancies through discussion.

Studies were coded as providing foundational skills instruction if they included instruction or practice activities targeting phonological awareness, alphabetic knowledge, graphophonemic knowledge, recognition of orthographic patterns, decoding, spelling, and/or word recognition. Some approaches to instruction were systematic, in that they followed a pre-determined sequence of instructional objectives. Some approaches adhered to explicit instructional models, in which the teacher models and directly teaches skills and strategies and provides purposeful practice with feedback. Other approaches to foundational skills instruction did not follow a pre-determined sequence; lessons were planned by the teacher in response to assessment results or observations of the child's reading behaviors. Some intervention approaches were inductive, rather than explicit: students were guided to infer content, such as letter-sound relationships. In some interventions, students were taught to utilize a variety of word-identification strategies, including strategies that encouraged students to attend to meaning cues (i.e., to make inferences about word identity based on the context of the text and/or illustrations).

In this meta-analysis, interventions were categorized as multicomponent if they included text comprehension instruction in addition to foundational skills instruction. Some interventions included sequenced instruction in specific comprehension strategies and/or skills (e.g., finding the main idea of a paragraph, generating inferences, summarizing text). Other interventions did not address comprehension skills and strategies in a predetermined order; instead, they included approaches such as text-based questioning, discussion of text during and after reading, and pre-reading lessons in which students made predictions based on illustrations and prior knowledge.

Besides coding foundational reading skills and comprehension instruction, we also coded for the presence of text reading, vocabulary instruction, and non-literacy instruction in the interventions. Text reading instruction included reading fluency practice, as well as other activities in which students read connected text. We coded for vocabulary instruction whenever instruction was described as addressing word meanings, even when word meanings appeared to have been discussed only briefly or incidentally. In some studies, interventionists briefly defined words in service of word reading instruction or practice. Thus, the presence of vocabulary instruction in an intervention did not necessarily indicate that text comprehension was a meaningful focus. Non-literacy instruction included components addressing self-regulation and study skills.

For the purpose of conducting moderator analyses, study dosage was treated as a continuous variable. When a study specified the mean hours of intervention actually received by students in the experimental condition, that number was adopted as the dosage of the study. When the actual mean hours received was not provided in a study, dosage was calculated by multiplying the number of intervention sessions provided by the length of those sessions. When the number of sessions was not provided, we followed procedures used by Wanzek et al. (2016), estimating the dosage from the information provided in the study.
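
For illustration, this dosage rule could be expressed as follows. This is a minimal R sketch with a hypothetical data frame and column names (mean_hours_received, n_sessions, session_minutes), not the authors' actual coding materials.

    # Use reported mean hours when available; otherwise estimate dosage
    # as sessions x session length (in minutes), converted to hours.
    dat$dosage_hours <- ifelse(
      !is.na(dat$mean_hours_received),
      dat$mean_hours_received,
      dat$n_sessions * dat$session_minutes / 60
    )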

Based on WWC standards (WWC, 2020) and study quality recommendations articulated by Slavin and colleagues (Slavin & Madden, 2011; Slavin & Smith, 2009), study quality was assessed according to adherence to four explicit criteria. Each study received a rating of 1, 2, 3, or 4 points, with studies receiving a point for (a) having a norm-referenced measure (all studies met this criterion), (b) having a randomized research design, (c) having a sample size of 250 or greater, and (d) measuring and reporting fidelity of implementation.
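
As an illustration, the rating could be computed as below. This is a minimal R sketch assuming hypothetical logical columns for each criterion, with one point per criterion met.

    # Each criterion met contributes one point to the 1-4 quality rating.
    dat$quality <- with(dat,
      as.integer(norm_referenced)   +  # (a) met by all included studies
      as.integer(randomized_design) +  # (b)
      as.integer(sample_n >= 250)   +  # (c)
      as.integer(fidelity_reported))   # (d)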

Other intervention and study characteristics coded for this meta-analysis were (a) participant grade levels, (b) estimate of socioeconomic status (i.e., whether at least 50% of participants were identified as economically disadvantaged or participants attended schools where 50% or more of the population was economically disadvantaged), c) intervention implementer, and (d) the instructional group size. These were not analyzed as moderator variables; however, the results of our coding were summarized descriptively. It was our intent to analyze grade level as a potential moderator of student outcomes since prior meta-analyses have found that reading intervention outcomes vary by grade level (Suggate, 2010; Wanzek & Vaughn, 2007); however, we were unable to form mutually exclusive categories since many studies provided intervention across multiple grade levels.

2.4. Effect Size Calculation and Meta-Analytic Procedures

To quantify the effects of early reading intervention on comprehension outcomes, we used standardized mean differences between intervention and control groups estimated with Hedges' g. We calculated g from reported mean and standard deviation (SD) estimates by group when available; otherwise, we used the reported statistical tests (e.g., F statistics, standard errors). In one study (Fien et al., 2015), we used the study-reported Hedges' g since no other information was available.
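
To illustrate, the following minimal R sketch computes Hedges' g with the metafor package used in this analysis; the summary statistics shown are hypothetical.

    library(metafor)

    # escalc(measure = "SMD") applies the Hedges small-sample correction,
    # returning g (yi) and its sampling variance (vi).
    es <- escalc(measure = "SMD",
                 m1i = 102.4, sd1i = 14.8, n1i = 35,  # intervention group
                 m2i = 97.1,  sd2i = 15.3, n2i = 33)  # comparison group

    # When only a two-group F statistic is reported, t = sqrt(F) and the
    # standardized mean difference follows from the group sizes.
    t_val <- sqrt(4.62)                          # hypothetical F value
    d     <- t_val * sqrt((35 + 33) / (35 * 33))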

2.4.1. Outlier Analysis

Prior to synthesizing effect sizes across studies, we examined the distribution of raw effect size estimates to identify outliers. We defined an outlier as a value below the first quartile minus 3 times the interquartile range (−1.06) or above the third quartile plus 3 times the interquartile range (1.74; Tukey, 1977). We identified two outlying effect sizes, one at the low end of the distribution (Chapman et al., 2001; g = −1.42) and another at the high end (Mathes et al., 2005; g = 1.99). These effect sizes were winsorized to the corresponding fence values. To evaluate the impact of outliers, we conducted sensitivity analyses by also running the models on the raw data (including outliers); results were nearly identical across the two approaches. We therefore present the findings from the analyses of the raw data.
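
A minimal R sketch of this outlier rule follows, assuming g_raw is the vector of raw effect size estimates.

    # Tukey fences at 3 times the IQR; values beyond the fences are
    # winsorized to the fence values (reported here as -1.06 and 1.74).
    q     <- quantile(g_raw, c(0.25, 0.75))
    iqr   <- q[2] - q[1]
    lower <- q[1] - 3 * iqr
    upper <- q[2] + 3 * iqr
    g_win <- pmin(pmax(g_raw, lower), upper)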

2.4.2. Estimation of Mean Effects

There are several sources of dependency in our effect size estimates. Many included studies reported intervention effects on multiple measures of comprehension outcomes and/or at multiple timepoints. Additionally, several studies included multiple comparisons as a result of having more than one intervention or control condition. Effect size estimates from a single study are likely to be correlated because they come from the same or a shared sample. Therefore, we used three-level, multivariate random effects analyses, assuming a correlation of 0.70 among effect size estimates from the same study.

In addition, we used the robust variance estimation (RVE) method to apply small-sample corrections to standard errors, hypothesis tests, and confidence intervals (Hedges et al., 2010; Tipton, 2015; Tipton & Pustejovsky, 2015). In these analyses, we used study as a clustering unit. This analytic approach estimates the average effect size using all information available and then adjusts the standard errors to account for the inherent clustering of these related effect sizes. The model decomposes the variance of the effect size into three parts. The Level 1 variance is the sampling variance estimated via the traditional variance calculation. The Level 2 variance is the within-study variance. The Level 3 variance is the variance between the studies.
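
A minimal R sketch of this model, using the metafor and clubSandwich packages named below, follows. The data frame and column names (yi, vi, study, es_id) are illustrative, and the CR2 small-sample adjustment is one standard choice, not necessarily the authors' exact specification.

    library(metafor)
    library(clubSandwich)

    # Block-diagonal covariance matrix assuming r = 0.70 among effect
    # sizes from the same study.
    V <- impute_covariance_matrix(vi = dat$vi, cluster = dat$study, r = 0.70)

    # Three-level model: Level 3 = between-study variance (tau^2);
    # Level 2 = within-study variance (omega^2).
    mod <- rma.mv(yi, V, random = ~ 1 | study/es_id, data = dat)

    # Cluster-robust (RVE) standard errors with small-sample correction.
    coef_test(mod, vcov = "CR2", cluster = dat$study)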

To measure the heterogeneity between studies and within studies, we report the restricted maximum likelihood estimates of the between-study variance (τ^2) and within-study variance (ω^2). We also report the Q statistic and the I^2 statistic partitioned to each level, a relative measure of the extent to which heterogeneity among true effect sizes contributes to the observed variation in the effect size estimates. We adopt the guideline articulated by Higgins et al. (2003), which holds that an I^2 of 50% to 75% indicates a moderate amount of heterogeneity, enough to conduct moderator analyses. We also provide a 68% prediction interval, the range within which about two-thirds of true effects are expected to fall.
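
These quantities can be recovered from the fitted model; the sketch below, continuing the hypothetical example above, computes the variance components, the level-specific I^2 values, and the 68% prediction interval.

    tau2   <- mod$sigma2[1]  # between-study variance (Level 3)
    omega2 <- mod$sigma2[2]  # within-study variance (Level 2)

    # "Typical" sampling variance (Level 1), as used in metafor's I^2.
    W <- diag(1 / dat$vi)
    X <- model.matrix(mod)
    P <- W - W %*% X %*% solve(t(X) %*% W %*% X) %*% t(X) %*% W
    v_typ <- (nrow(dat) - ncol(X)) / sum(diag(P))

    I2_level3 <- 100 * tau2   / (tau2 + omega2 + v_typ)
    I2_level2 <- 100 * omega2 / (tau2 + omega2 + v_typ)

    # 68% prediction interval: mean effect +/- 1 SD of the true effects.
    # With the rounded estimates reported in the Results (g = 0.37,
    # tau^2 = 0.03, omega^2 = 0.04), this gives roughly 0.10 to 0.64,
    # matching the reported 0.09 to 0.65 up to rounding.
    pi_68 <- coef(mod) + c(-1, 1) * sqrt(tau2 + omega2)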

2.4.3. Moderator Analyses

First, we examined each moderator separately using meta-regression. The effects of moderators with 2 categories were tested using small-sample adjusted t-tests. For moderators with more than 2 categories, we used the Wald test function from the clubSandwich package (Pustejovsky, 2016), which applies small-sample adjusted F tests. Second, we conducted a multiple meta-regression to model the effects of the moderators simultaneously. This approach allowed us to examine the impact of each moderator while controlling for other potential moderators, indicating how each moderator effect should be interpreted in the context of the others.
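
A minimal R sketch of these two testing approaches, continuing the hypothetical example above with illustrative moderator columns (a two-level factor emphasis and a three-level factor measure_type), follows.

    # Two-category moderator: small-sample adjusted t-test on the slope.
    mod_emph <- rma.mv(yi, V, mods = ~ emphasis,
                       random = ~ 1 | study/es_id, data = dat)
    coef_test(mod_emph, vcov = "CR2", cluster = dat$study)

    # Moderator with more than two categories: small-sample adjusted F
    # test that all non-reference coefficients are zero.
    mod_meas <- rma.mv(yi, V, mods = ~ measure_type,
                       random = ~ 1 | study/es_id, data = dat)
    Wald_test(mod_meas, constraints = constrain_zero(2:3), vcov = "CR2")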

We also examined the possibility of publication bias with a modified version of Egger's regression that accounts for dependent effect size estimates (Egger et al., 1997; Rodgers & Pustejovsky, 2020). Egger's regression test examines asymmetry in effect sizes as a function of precision (i.e., standard errors), under the assumption that small-sample studies with large effect sizes are more likely to be published than small-sample studies with small effect sizes; such precision-related asymmetry may indicate publication bias. Following the main effects model, we used a multilevel modeling approach and RVE correction for Egger's regression, with the standard error of each effect size estimate used as a predictor. In addition, we performed p-curve analyses (Simonsohn et al., 2014) to detect whether there was extensive data mining (referred to as p hacking) in search of statistically significant results. P hacking is detected by the shape of the p curve, the distribution of statistically significant p values across a set of studies. A right-skewed p curve, in which a majority of reported p values are much smaller than 0.05, is expected when there is a true underlying effect. A flat p curve, with similar numbers of p values across the range between 0 and 0.05, suggests that no true effect exists. A left-skewed p curve is evidence of p hacking. We used the results of the p-curve analysis as a supplement to Egger's test because p-curve analysis does not account for dependencies in the data.
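
A minimal sketch of the modified Egger's regression under the same hypothetical setup: the standard error of each estimate enters the multilevel model as a predictor, with RVE-corrected inference.

    # A positive, statistically significant slope on the standard error
    # would indicate small-study asymmetry consistent with publication bias.
    dat$sei <- sqrt(dat$vi)
    egger <- rma.mv(yi, V, mods = ~ sei,
                    random = ~ 1 | study/es_id, data = dat)
    coef_test(egger, vcov = "CR2", cluster = dat$study)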

All analyses were conducted with the metafor package (Viechtbauer, 2010) and the clubSandwich package (Pustejovsky, 2016) for the R statistical computing environment (R Core Team, 2019). We used the pcurve function from the dmetar package (Harrer et al., 2019) for the p-curve analysis.
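
For the p-curve step, a minimal sketch under the same hypothetical setup follows. The dmetar pcurve function expects a meta-analysis object from the meta package, so the effect sizes are first wrapped with metagen; this wrapper ignores effect size dependencies, which is one reason the p-curve analysis served only as a supplement to Egger's test.

    library(meta)
    library(dmetar)  # available from GitHub (MathiasHarrer/dmetar)

    m <- metagen(TE = dat$yi, seTE = sqrt(dat$vi), studlab = dat$study)
    pcurve(m)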

3. Results

3.1. Study Features

Table 1 provides the features of the 47 studies that met criteria for inclusion. The total sample size for all studies in the meta-analysis was 7446 students; sample sizes for individual studies ranged from 24 to 881. Eight studies included students in kindergarten, 34 included students in Grade 1, 16 included students in Grade 2, and 9 included students in Grade 3. Fourteen of these studies included participants in two or more grades. According to our best estimate based on the information provided by the researchers, in 29 of the studies, 50% or more of the participants were economically disadvantaged or attended schools in which 50% or more of the students were economically disadvantaged; in 8 studies, no information about socioeconomic status was provided. There was only one study in which more than 50% of students were served by special education (Mathes & Babyak, 2001); in all other studies, 50% or more of the students were at risk for or experiencing reading difficulties but receiving reading intervention outside of special education, or the special education status of participants was not provided.

The 47 studies provided effect sizes associated with 68 intervention treatment conditions. By design, all intervention conditions included foundational skills instruction. Thirty-nine also included instruction in comprehension. Sixty-one included text reading instruction or practice, which may have included practice designed to increase oral reading fluency. Twenty included vocabulary instruction of some kind. The six studies that included a vocabulary instruction component without also including a comprehension component only taught word meanings incidentally, in the service of word study or text reading instruction. Five interventions included non-literacy instructional components.

All of the qualifying studies had designs that compared treatment groups to comparison or control groups; 29 had experimental designs with randomized assignment to research conditions, while 18 were quasi-experimental. In 30 of the studies, researchers described the collection of data on fidelity of implementation and reported the results in the paper. To measure comprehension, 18 studies used measures that required the student to read multiple sentences or paragraphs and answer questions about them (e.g., GORT-3 RC; GMRT RC); 4 studies used sentence comprehension measures (e.g., TOSREC, WRAT-4 SC); and 33 studies utilized measures with a cloze format in which students supplied words omitted from phrases, sentences, or brief passages (e.g., WRMT-R PC; WJ III PC).

Table 2 describes intervention features and effect sizes calculated for each contrast. Across studies, intervention was provided for an average of 57.19 h (SD = 35.19; range = 7 to 149). In two studies, insufficient information was provided to calculate intervention dosage (Ehri et al., 2007; Miller, 2003). It should be noted that the effect sizes reported in Table 2 often differ from those reported by the studies' authors. Researchers frequently report effect sizes based on covariate-adjusted posttest scores. We calculated effect sizes based on covariate-adjusted scores when they were provided; however, most studies only reported unadjusted posttest scores.

3.2. Meta-Analytic Findings

3.2.1. Main Effects

The meta-analysis included 47 studies encompassing 112 comparisons (1–12 per study) and a combined sample of 7446 students. The weighted average effect on comprehension outcomes was estimated as g = 0.37 (95% CI [0.27, 0.47], p < .01). As expected, there was substantial heterogeneity across effect sizes (Q = 416.57, df = 111, p < .01), with between-study variance of τ^2 = 0.03, within-study variance of ω^2 = 0.04, and a total I^2 of 66.64% (Level 2 = 37.73%; Level 3 = 28.91%). The 68% prediction interval was 0.09 to 0.65.

3.3. Moderator Analyses

We report estimates of average effect size disaggregated by the levels of each moderator in Table 3. We note that there were few contrasts in certain categories within moderators, which reduces the degrees of freedom; analyses of these moderators are statistically underpowered, and a failure to demonstrate statistically significant differences in these cases may be the result of Type II error.

3.3.1. Impact of Instructional Emphasis

To determine the effects of instructional emphasis, we examined moderation with two categories: foundational reading skills instruction only and multicomponent instruction (i.e., both foundational reading skills and comprehension). The mean effect of multicomponent interventions on reading comprehension did not differ from that of foundational reading skills-only interventions (F = 0.94, df = 14.60, p = .35) and instructional emphasis did not explain any of the variance in effect sizes.

3.3.2. Impact of Dosage

Dosage was defined as the number of instructional hours and modeled as a continuous variable. The effect of intervention at the average dosage (57.19 h) was 0.38, and dosage predicted effect size positively (β = 0.003), suggesting that each additional hour of instructional time is associated with an effect size increase of 0.003 (e.g., about 0.03 for an additional 10 h). Dosage explained 7.96% of the variance in effect sizes; however, the effect of dosage was not statistically significant (F = 4.20, df = 13.80, p = .06).

3.3.3. Study Quality

Study quality explained 6.21% of the variance in effect sizes but was not a significant moderator as a whole in an omnibus test (F = 1.12, df = 3.68, p = .45). Descriptively speaking, higher-quality studies tended to have smaller effects than the lowest-quality studies (Mgs = 0.63, 0.39, 0.29, and 0.22 for studies rated “1” [lowest quality], “2,” “3,” and “4” [highest quality], respectively); however, only two studies earned the highest “4” rating, which at least partially explains why the standard error was large and the p-value was nonsignificant when comparing lowest-quality studies and highest-quality studies.

3.3.4. Type of Comprehension Measure

Although comprehension measure type was not a significant moderator as a whole (F = 2.91, df = 4.50, p = .16), it explained 12.91% of the variance in effect sizes. Cloze reading comprehension assessments tended to be associated with larger effect sizes (Mg = 0.43) than passage reading or sentence comprehension assessments (Mgs = 0.27 and 0.22, respectively).

3.3.5. Measurement Timepoint

Measurement timepoint (posttest, follow-up) was not a significant moderator of intervention effects (t = −1.38, df = 7.23, p = .21), and it explained 1.95% of the variance in effect sizes, although effect sizes tended to be somewhat smaller at follow-up (Mgs = 0.39 vs. 0.29).

3.3.6. Multiple Meta-Regression Analysis

To better understand each moderator's effect in the presence of other moderators, we conducted a multiple meta-regression analysis. Results from the omnibus Wald test indicated that moderators included in the multiple meta-regression were not significant as a whole (F = 2.52, df = 6.08, p = .14). The covariates included in the multiple moderator analysis explained 36.96% of the variance in effect sizes (τ^2 = 0.005; ω^2 = 0.047). As demonstrated in Table 4, results of the Wald test of each moderator, controlling for other moderators in the model, indicated that only measurement timepoint was a significant moderator (F = 7.35, df = 7.48, p = .03), with the mean effect size at follow-up being smaller by 0.17 (p = .03) than the mean effect size at immediate posttest. In addition, while measurement type was not a significant moderator as a whole (F = 3.28, df = 4.45, p = .13), the mean effect size on cloze format measures was larger by 0.20 (p = .04) than the mean effect size on passage reading measures. Similarly, study quality was not a significant moderator as a whole (F = 2.11, df = 5.23, p = .21), but studies with the highest quality rating (note that there were only two studies with this rating) had a mean effect size that was smaller by 0.48 than studies with the lowest quality rating; studies with the second-highest quality rating (n = 19) had a mean effect size that was smaller by 0.43 than studies with the lowest quality rating.

3.3.7. Publication Bias

The modified Egger's regression test indicated that standard errors of effect size estimates did not predict effect size estimates across the studies (β̂ = 1.46, t(19.8) = 1.88, p = .08). Findings from the p-curve analysis confirmed the results of the Egger's test. The right-skewness test of the p curve was significant (binomial = 0.00, p < .001) while the flatness test was not (binomial = 0.96, p > .009), suggesting that evidential value is present and that a true effect underlies our findings; that is, they are not the result of p hacking.

4. Discussion

This meta-analysis investigated the effects of small-group or individual reading interventions on the reading comprehension of primary-grade students with RDs. A total of 47 studies met criteria for inclusion. An overall mean weighted effect size of 0.37 (m = 112) on norm-referenced reading comprehension outcomes indicates that small-group or individually administered reading interventions have an educationally meaningful effect on reading comprehension for primary-grade students with RDs. Wanzek et al. (2016) reported a mean effect size of 0.38 on standardized measures of language and comprehension (including vocabulary measures) for reading interventions provided for less than 100 h. Suggate (2010) reported similar mean effects on standardized and unstandardized comprehension outcomes for small- and large-group reading interventions in preschool through Grade 4 (Mds ranging from 0.16 to 0.52). These converging findings increase the confidence with which we can state that reading interventions can be expected to result in reading comprehension gains for students with RDs in Grades K-3.

Because this meta-analysis revealed significant variance in reading comprehension outcomes, we also explored the effects of potential moderators (see Tables 3 and 4). First, we examined the degree to which instructional emphasis (i.e., emphasis on code-oriented foundational reading skills or on a combination of foundational reading skills and comprehension) moderated intervention effects for primary-grade students with RDs. We found no statistically significant differences in the magnitude of effects for foundational skills-only and multicomponent interventions. The descriptive effect size information associated with the instructional emphasis moderator variable suggested that foundational skills instruction and multicomponent instruction had similarly positive impacts on reading comprehension outcomes (Mg = 0.40 and Mg