SAED-3 RS Reliability Notes
Overview
- The SAED-3 RS (Scales for Assessing Emotional Disturbance–3rd Edition Rating Scale) is a 45-item, teacher-completed rating scale designed to align with the federally defined characteristics of Emotional Disturbance (ED) to support eligibility decisions under IDEA.
- Purpose of SAED-3 RS reliability studies: provide psychometric evidence that the SAED-3 RS reliably measures emotional/behavioral functioning across diverse students and settings, supporting data-informed decisions in MTSS/RTI frameworks and educational planning.
- Context: ED prevalence estimates among U.S. school-age children are high, yet students with ED are underrepresented in IDEA services. Early identification and intervention can improve long-term outcomes, and reliable measures are central to data-based decision making in schools.
- SAED-3 RS integrates with MTSS/RTI/PBIS approaches to inform screening, progress monitoring, and evaluation of student outcomes.
What the SAED-3 RS Measures and How it Is Used
- ED criteria addressed by SAED-3 RS: five core characteristics per the IDEA ED definition:
- Inability to learn (IL)
- Relationship problems (RP)
- Inappropriate behavior (IB)
- Unhappiness or depression (UD)
- Physical symptoms or fears (PS)
- Supplemental subscale: Socially Maladjusted (SM) for ages 12–18.
- Respondent: teacher familiar with the student for at least 2 months.
- Items and scoring:
- 45 items total; 5 core subscales (IL, RP, IB, UD, PS) + 1 supplemental (SM for 12–18).
- 4-point Likert-type scale per item: 0 = not a problem, 1 = mild problem, 2 = considerable problem, 3 = severe problem.
- Subscale scores are summed and transformed to scaled scores; a composite rating scale index combines core subscales for a global behavioral functioning measure (not used for eligibility).
- Interpretation thresholds per subscale:
- Scaled score ≤ 13: not indicative of ED
- 14 ≤ scaled score ≤ 16: indicative of ED
- Scaled score ≥ 17: highly indicative of ED
- SM subscale threshold: a scaled score of 14 or greater on SM (≈ 91st percentile) indicates antisocial/delinquent behavior in the community and may signal needs beyond IDEA.
- Norming and reliability emphasis:
- Norms updated to align with U.S. census; bias/generalizability checks; convergent validity with other behavior measures; factor structure confirmation.
- Practical use: supports data-informed eligibility decisions, informs IEP goals, and helps identify MTSS/RTI targets and interventions.
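A minimal Python sketch of the threshold logic above, assuming scaled scores have already been obtained from the norm tables (the raw-to-scaled conversion is not reproduced in these notes). The function names and the example scores are hypothetical and for illustration only.

```python
def classify_core_subscale(scaled_score: int) -> str:
    """Map a core-subscale scaled score to its interpretive band."""
    if scaled_score >= 17:
        return "highly indicative of ED"
    if scaled_score >= 14:
        return "indicative of ED"
    return "not indicative of ED"


def sm_flag(scaled_score: int, age: int) -> bool:
    """Return True when the supplemental SM scaled score is 14 or greater.

    The SM subscale is only defined for ages 12-18, so other ages are rejected.
    """
    if not 12 <= age <= 18:
        raise ValueError("SM subscale applies only to ages 12-18")
    return scaled_score >= 14


# Hypothetical scaled scores for one student (not real data).
scores = {"IL": 12, "RP": 15, "IB": 18, "UD": 13, "PS": 14}
bands = {name: classify_core_subscale(value) for name, value in scores.items()}

for name, band in bands.items():
    print(f"{name}: {band}")
```

The bands describe individual subscales only; as noted above, the composite index is not used for eligibility.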
Reliability Focus and Evidence Structure
- Reliability forms addressed: internal consistency, interrater reliability, and test–retest reliability.
- Key analytic approaches:
- Internal consistency: Cronbach’s alpha (α) for subscales and for the rating-scale index composite (and Guilford’s formula for the composite using subscale intercorrelations).
- Interrater reliability: Cohen’s weighted kappa (κ), with interpretation guidelines (Altman 1991; Landis & Koch 1977).
- Test–retest reliability: Pearson-type correlation across two administrations separated by 14 days, with correction for range effects and interpretation using Hopkins (2002) scales.
- Overall finding across studies: SAED-3 RS demonstrates reliable, stable measurement of emotional/behavioral functioning across age 5–18 and across raters over short timeframes.
Study 1: Internal Consistency (Normative vs ED samples)
- Design and aim:
- Examine internal reliability (Cronbach’s α) for SAED-3 RS subscales and the RSI composite across ages 5–18 in the normative sample, and for the ED sample.
- Data collection and samples:
- Normative sample: 1,430 students aged 5–18 from 23 states; data collected fall 2015–spring 2018.
- ED sample: 441 students aged 5–18 rated in 15 states.
- Poststratification weighting used to adjust for under/overrepresentation to match U.S. census distributions (less than 6% of sample affected).
- Demographic details for normative sample (Table 1: region, gender, race, Hispanic status, family income, parent education, exceptionality status) show alignment with U.S. school-age population with some over-/under-representation in groups; ED sample reflects school-age special education population characteristics (regional distribution, gender, race, Hispanic status).
- Subscales and reliability metrics:
- Subscales: IL, RP, IB, UD, PS, SM (only for ages 12–18); RSI = rating scale index (composite).
- Averaged internal consistency across ages:
- Normative sample: subscale alphas ranged from 0.79 to 0.92; RSI composite = 0.96 (overall average across ages). Typical values: IL ≈ 0.92, RP ≈ 0.86, IB ≈ 0.91, UD ≈ 0.83, PS ≈ 0.79, SM ≈ 0.85, RSI ≈ 0.96 (Table 2).
- ED sample: subscale alphas ≥ 0.90 for all subscales; composite alpha = 0.96 (Table 3).
- Across ages (normative sample): internal consistency generally increases with age; younger ages show lower reliability especially for RP, UD, PS; authors recommend potential development of additional items for younger ages or a separate form for younger children.
- Interpretive conclusions:
- SAED-3 RS demonstrates strong internal consistency, particularly in older age groups and among ED-identified students; supports use for understanding global ED-related functioning and for informing program decisions.
- Practical takeaway:
- For practitioners using SAED-3 RS beyond ED eligibility, internal consistency supports reliability of subscale and composite interpretations, especially in upper elementary through high school.
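For readers who want to see what the α values in Study 1 summarize, here is a minimal sketch of the standard Cronbach's alpha computation for one subscale. The ratings matrix is invented for illustration, and the function name is hypothetical; the study's own software and any weighting it applied are not described in these notes.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rows (one row per student, one column per item).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(ratings[0])
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two items and two students")
    item_columns = list(zip(*ratings))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Invented 0-3 item ratings for six students on a four-item subscale.
demo_ratings = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
    [2, 2, 1, 2],
]
alpha = cronbach_alpha(demo_ratings)
print(f"alpha = {alpha:.2f}")
```

Alpha rises when items covary strongly relative to their individual variances, which is one reason subscales with few endorsed items in younger age groups can show lower values.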
Study 2: Interrater Reliability
- Data collection and sample:
- From SAED-3 norming participants, educators were asked to rate up to 10 students independently on the same day, with unique IDs and protocols.
- Interrater sample: 216 students rated by 123 pairs of educators in South Carolina and Texas; ages 5–18 (M = 11.96, SD = 2.99); demographics: male 72%, White 61%, Hispanic 10%, ED exceptionality ~25%. A randomly selected subset (n = 31) that matched norming demographics was included in the normative sample when possible.
- Statistical analysis:
- Interrater reliability assessed with Cohen’s weighted kappa across three age blocks: 5–11 (elementary), 12–14 (middle), 15–18 (high).
- Ratings collapsed into the three interpretive bands defined by the SAED-3 RS thresholds: not indicative of ED, indicative of ED, highly indicative of ED.
- Results (coherent across groups):
- Combined sample (n=216): subscale κ values ranged from .70 to .84 (good to very good) and the composite κ was .89 (very good).
- By age block:
- Ages 5–11 (n=96): IL κ = .75; RP κ = .70; IB κ = .75; UD κ = .46; PS κ = .62; Composite κ = .64.
- Ages 12–14 (n=74): IL κ = .82; RP κ = .88; IB κ = .86; UD κ = .82; PS κ = .79; Composite κ = .92.
- Ages 15–18 (n=46): IL κ = .85; RP κ = .91; IB κ = .95; UD κ = .92; PS κ = .85; Composite κ = .94.
- Interpretation and implications:
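- Overall, interrater reliability in the combined sample was good to very good across subscales and very good for the composite index, suggesting consistent ratings between educators who know the student and have similar training. Agreement was weaker in the 5–11 block, where UD κ = .46 and the composite κ = .64.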
- Practical takeaway:
- Supports use of SAED-3 RS in multi-rater contexts, especially for ages 12–18 where interrater agreement is strongest.
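The weighted kappa used in Study 2 can be sketched as follows for two raters whose scores have been collapsed into the three bands (0 = not indicative, 1 = indicative, 2 = highly indicative). This sketch assumes linear disagreement weights; the notes do not say whether the study used linear or quadratic weights, and the rating lists are invented.

```python
def weighted_kappa(rater_a, rater_b, n_categories=3):
    """Cohen's weighted kappa with linear weights for ordinal category codes 0..k-1."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("rating lists must be non-empty and of equal length")
    k = n_categories

    # Observed proportion of students in each (rater A band, rater B band) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    row_marginals = [sum(observed[i]) for i in range(k)]
    col_marginals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # linear weight (an assumption)
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row_marginals[i] * col_marginals[j]

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed_disagreement / expected_disagreement


# Invented band codes for six students rated by two educators.
educator_1 = [0, 0, 1, 1, 2, 2]
educator_2 = [0, 1, 1, 2, 2, 2]
kappa = weighted_kappa(educator_1, educator_2)
print(f"weighted kappa = {kappa:.3f}")
```

Weighting matters here because a disagreement between "not indicative" and "highly indicative" is penalized more than a disagreement between adjacent bands.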
Study 3: Test–Retest Reliability
- Data collection and sample:
- 117 students rated by the same educators at two time points approximately 14 days apart (Time 1 and Time 2).
- Locations: CO, IA, KS, NC, TX, UT; ages 5–18 (M = 12.10, SD = 3.66); demographics varied (gender, race, ethnicity, exceptionality status).
- Statistical analysis:
- Correlations between Time 1 and Time 2 ratings calculated for each subscale and the RSI composite; corrected correlations (rc) are reported, with uncorrected (ru) also shown when useful.
- Subgroups by age: 5–11, 12–14, 15–18; combined sample analyzed as well.
- Practice effects between Time 1 and Time 2 assessed via effect sizes (Cohen’s r), with small values reported as trivial.
- Results (combined sample):
- Subscale rc values (corrected): Inability to learn 0.64; Relationship problems 0.81; Inappropriate behavior 0.71; Unhappiness/depression 0.77; Physical symptoms/fears 0.84; Composite 0.88.
- Subgroup trends by age block:
- 5–11: IL 0.64; RP 0.84; IB 0.60; UD 0.74; PS 0.81; RSI 0.88.
- 12–14: IL 0.55; RP 0.82; IB 0.85; UD 0.80; PS 0.84; RSI 0.87.
- 15–18: IL 0.58; RP 0.95; IB 0.88; UD 0.77; PS 0.89; RSI 0.93.
- Interpretation and implications:
- Test–retest reliability ranged from moderate to very large across subscales, with the composite consistently very strong (≈0.87–0.93 depending on subgroup).
- The SAED-3 RS demonstrates stability over a short-term window (14 days), supporting its use for establishing baseline behavior and monitoring short-term changes.
- Practical takeaway:
- The SAED-3 RS provides reliable short-term stability in scores, allowing educators to track changes over short intervals without large measurement error.
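As an illustration of how an uncorrected correlation (ru) and a range-corrected correlation (rc) relate, the sketch below computes a Pearson correlation between two administrations and then applies the common Thorndike Case 2 correction for range restriction. Both the choice of that correction and the reference standard deviation of 3 are assumptions; the notes say only that correlations were corrected for range effects. The scores are invented.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mean_x, mean_y = mean(x), mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def correct_for_range(r, sample_sd, reference_sd):
    """Thorndike Case 2 correction: rescale r from the sample SD to a reference SD."""
    u = reference_sd / sample_sd
    return (r * u) / sqrt(1 - r * r + r * r * u * u)


# Invented scaled scores for eight students at Time 1 and Time 2.
time_1 = [8, 10, 12, 9, 11, 13, 10, 12]
time_2 = [9, 10, 11, 9, 12, 13, 9, 12]

r_u = pearson_r(time_1, time_2)
# Hypothetical reference SD of 3 for scaled scores (an assumption, not from the notes).
r_c = correct_for_range(r_u, sample_sd=stdev(time_1), reference_sd=3.0)
print(f"ru = {r_u:.2f}, rc = {r_c:.2f}")
```

When the retest sample is less variable than the reference population, the corrected value is larger than the uncorrected one, which is why both can be informative.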
Overall Reliability Synthesis and Practical Implications
- Internal consistency:
- Normative: median subscale alpha of about 0.86, with every subscale except PS (0.79) at or above 0.80; RSI composite ≈ 0.96.
- ED sample: subscale alphas ≥ 0.90 and composite 0.96, indicating high internal coherence of the scale within ED-identified students.
- Interrater reliability:
- Combined sample: subscale κ from 0.70 to 0.84, composite κ = 0.89; overall good-to-very-good agreement between raters. Within age blocks, subscale κ ranged from 0.46 (UD, ages 5–11) to 0.95 (IB, ages 15–18).
- Test–retest reliability:
- Across combined sample: subscales from 0.64 to 0.84; composite ≈ 0.88; stability over 14 days is strong, with minimal practice effects.
- Practical implications:
- SAED-3 RS demonstrates robust psychometric properties for use in both eligibility determinations and ongoing progress monitoring within MTSS/RTI frameworks.
- When used for eligibility decisions, clinicians should consider SAED-3 RS results alongside other data sources (e.g., case history, functional behavioral assessments, parent ratings) as recommended by practice guidelines.
- The SM subscale applies only to ages 12–18; it should be interpreted with caution and not used in isolation to drive eligibility decisions.
Limitations and Cautions
- Sampling and generalizability:
- Interrater and test–retest samples were not fully representative; participation was voluntary, introducing potential selection bias.
- Demographic composition skewed towards female, White, non-Hispanic educators with substantial experience; future work should include more diverse educators (gender, race/ethnicity, years of experience).
- ED sample was geographically distributed but the representation across school levels (elementary/middle/high) could be expanded with larger samples.
- Nested data considerations:
- Data in some designs are nested (students within classrooms); analyses did not fully model nesting. Future work should account for hierarchical data structures to examine variance components more precisely.
- Validity evidence:
- This set focuses on reliability; convergent and discriminant validity with other measures (e.g., BASC-3, SDQ, STEP-S, SSIS) and criterion validity with ED diagnoses warrant further study.
- Supplemental SM subscale:
- Reliability for SM improves with age (12–18) but is not recommended as a sole basis for placement decisions and requires further investigation to establish validity across ages.
Contextual and Practical Considerations for Implementation
- Alignment with IDEA and ED definitions:
- SAED-3 RS was designed to map directly onto the five ED characteristics in IDEA (inability to learn, relationship problems, inappropriate behavior, unhappiness/depression, physical symptoms/fears) and to include SM as a supplemental indicator at older ages.
- Role in MTSS/RTI/PBIS frameworks:
- Serves as a standardized data point for screening and progress monitoring and helps justify decisions regarding supports and interventions.
- Administration logistics:
- Requires a teacher who knows the student well (≥2 months) to complete; multiple raters can be used to triangulate information.
- Usage caveats:
- Composite index should not be the sole basis for ED eligibility; interpret subscale scores in the context of a broader evaluation (case history, functional assessment, additional measures).
- Ethical and practice implications:
- Norming and bias considerations are important; ensure diverse representation in new implementations and be mindful of potential differential item functioning across groups.
- Other ED-related or behavioral measures commonly used in schools include:
- BASC-3 (Reynolds & Kamphaus, 2015)
- SDQ (Goodman, 2001; 2005)
- STEP-S (Erford et al., 2012)
- SSIS (Gresham & Elliott, 2008)
- SAED-3 RS rationale for development:
- Unlike some instruments, SAED-3 RS explicitly aligns with federal ED criteria, aiming for better relevance to eligibility decisions and more robust normative data tailored to current U.S. student populations.
- Conceptual alignment with prevention frameworks:
- The instrument supports MTSS/RTI/PBIS by providing reliable indicators of emotional/behavioral functioning that can inform tiered interventions and program evaluation.
Limitations of the Current Studies and Future Research Directions
- Aims for future reliability studies:
- Replicate findings with larger, more diverse samples of students (including more students with ED across elementary, middle, and high school).
- Increase representation of male educators, educators of diverse racial/ethnic backgrounds, and varying years of experience.
- Include nested-data designs to account for students nested within classrooms and to explore teacher-level variance.
- Reliability and validity expansion:
- Additional reliability studies should quantify stability over longer time periods (longer test–retest intervals) and interrater reliability across broader samples.
- Convergent validity with other established ED measures and predictive validity for educational outcomes (IEP goals, service provision) should be explored.
- Practical refinement:
- Consider developing complementary forms or item banks for younger ages to improve age-appropriate sensitivity, as suggested by age-related differences in internal consistency.
Conclusion and Takeaways
- The SAED-3 RS demonstrates strong reliability across internal consistency, interrater reliability, and test–retest reliability, supporting its use as a reliable data source in identifying students with ED and informing MTSS/RTI-based interventions and educational planning.
- While findings are broadly favorable, ongoing research with more diverse samples and validity studies will strengthen its utility and help refine its use in practice.
- Practitioners should integrate SAED-3 RS results with other data sources and consider the age- and subscale-specific implications, especially when using the SM subscale for decision making.
- Scoring interpretation for core subscales: each item is scored 0–3; item scores are summed to a raw subscale score, which is then transformed to a scaled score.
- ED thresholds per subscale:
- Not indicative: scaled score ≤ 13
- Indicative of ED: 14 ≤ scaled score ≤ 16
- Highly indicative of ED: scaled score ≥ 17
- SM threshold: scaled score ≥ 14 (≈ 91st percentile) indicates antisocial/delinquent behavior in the community.
- Reliability benchmarks (interpretive guidance):
- Internal consistency: α ≥ 0.80 (minimally reliable); α ≥ 0.90 (desirable).
- Interrater reliability: Cohen’s κ values interpreted as poor to very good; values >0.60 typically viewed as good/substantial; values >0.80 as very good.
- Test–retest reliability: rc values approaching or exceeding 0.80 indicate strong stability over 2 weeks.
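The benchmarks above can be applied mechanically, as in this small sketch. The α cutoffs and the "good" and "very good" κ cutoffs come from these notes; the lower κ bands (poor, fair, moderate) follow Altman's (1991) commonly cited scale and are added here as an assumption. The function names are hypothetical. The κ values are the ages 5–11 results reported in Study 2.

```python
def alpha_label(alpha: float) -> str:
    """Label an internal-consistency coefficient against the 0.80 / 0.90 benchmarks."""
    if alpha >= 0.90:
        return "desirable"
    if alpha >= 0.80:
        return "minimally reliable"
    return "below benchmark"


def kappa_label(kappa: float) -> str:
    """Label a kappa value; bands at or below 0.60 are an assumed Altman-style scale."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


# Ages 5-11 interrater results as reported in Study 2.
kappa_ages_5_11 = {"IL": 0.75, "RP": 0.70, "IB": 0.75, "UD": 0.46, "PS": 0.62, "Composite": 0.64}
kappa_bands = {name: kappa_label(value) for name, value in kappa_ages_5_11.items()}

for name, band in kappa_bands.items():
    print(f"{name}: {band}")
```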
References to Key Study Details (for quick recall)
- Normative sample: 1,430 students; ED sample: 441 students; data collection 2015–2018; poststratification weighting used to align with U.S. census distributions.
- Internal consistency (Study 1): Normative subscale alphas 0.79–0.92; RSI composite ≈ 0.96; ED subscales ≥ 0.90; RSI 0.96.
- Interrater reliability (Study 2): Combined κ values per subscale 0.70–0.84; composite κ = 0.89; strongest agreement in older groups (12–18).
- Test–retest reliability (Study 3): Combined rc values for subscales 0.64–0.84; composite 0.88; minimal practice effects.
- Supplemental notes: SM reliability by age improves with age; caution advised in using SM for placement decisions before age 13.