SAED-3 RS Reliability Notes

Overview

  • The SAED-3 RS (Scales for Assessing Emotional Disturbance–3rd Edition Rating Scale) is a 45-item, teacher-completed rating scale designed to align with the federally defined characteristics of Emotional Disturbance (ED) to support eligibility decisions under IDEA.
  • Purpose of SAED-3 RS reliability studies: provide psychometric evidence that the SAED-3 RS reliably measures emotional/behavioral functioning across diverse students and settings, supporting data-informed decisions in MTSS/RTI frameworks and educational planning.
  • Context: estimated ED prevalence among US school-age children is high, yet ED is underidentified in IDEA services relative to that prevalence. Early identification and intervention can improve long-term outcomes, and reliable measures are central to data-based decision making in schools.
  • SAED-3 RS integrates with MTSS/RTI/PBIS approaches to inform screening, progress monitoring, and evaluation of student outcomes.

What the SAED-3 RS Measures and How it Is Used

  • ED criteria addressed by SAED-3 RS: five core characteristics per IDEA ED definition—
    • Inability to learn (IL)
    • Relationship problems (RP)
    • Inappropriate behavior (IB)
    • Unhappiness or depression (UD)
    • Physical symptoms or fears (PS)
  • Supplemental subscale: Socially Maladjusted (SM) for ages 12–18.
  • Respondent: teacher familiar with the student for at least 2 months.
  • Items and scoring:
    • 45 items total; 5 core subscales (IL, RP, IB, UD, PS) + 1 supplemental (SM for 12–18).
    • 4-point Likert-type scale per item: 0 = not a problem, 1 = mild problem, 2 = considerable problem, 3 = severe problem.
    • Item ratings are summed within each subscale and converted to scaled scores; a composite rating scale index (RSI) combines the core subscales into a global measure of behavioral functioning (not used for eligibility).
    • Interpretation thresholds per subscale (scaled scores):
      • ≤ 13: not indicative of ED
      • 14–16: indicative of ED
      • ≥ 17: highly indicative of ED
    • SM subscale threshold: a scaled score of 14 or greater on SM (≈ 91st percentile) indicates antisocial/delinquent behavior in the community and may signal needs beyond IDEA.
  • Norming and psychometric evidence:
    • Norms updated to align with U.S. census data; bias/generalizability checks; convergent validity with other behavior measures; confirmation of factor structure.
  • Practical use: supports data-informed eligibility decisions, informs IEP goals, and helps identify MTSS/RTI targets and interventions.
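The scoring steps above can be sketched in Python as follows. This is a minimal illustration, not SAED-3 scoring software: all function names are hypothetical, and because converting raw scores to scaled scores requires the published age-based norm tables, `to_scaled` simply looks the raw score up in a caller-supplied `raw_to_scaled` mapping that stands in for those tables.

```python
from typing import Mapping, Sequence

# Labels for the 4-point rating scale used on every item.
RATING_LABELS = {
    0: "not a problem",
    1: "mild problem",
    2: "considerable problem",
    3: "severe problem",
}


def raw_subscale_score(item_ratings: Sequence[int]) -> int:
    """Sum the 0-3 item ratings belonging to one subscale."""
    for rating in item_ratings:
        if rating not in RATING_LABELS:
            raise ValueError(f"item ratings must be 0-3, got {rating!r}")
    return sum(item_ratings)


def to_scaled(raw: int, raw_to_scaled: Mapping[int, int]) -> int:
    """Look up a scaled score in a hypothetical, caller-supplied norm table."""
    return raw_to_scaled[raw]


def interpret_core_subscale(scaled: int) -> str:
    """Map a core-subscale scaled score onto the three interpretive bands."""
    if scaled >= 17:
        return "highly indicative of ED"
    if scaled >= 14:
        return "indicative of ED"
    return "not indicative of ED"


def sm_flag(scaled: int) -> bool:
    """SM (ages 12-18 only): a scaled score of 14 or more is flagged."""
    return scaled >= 14


if __name__ == "__main__":
    ratings = [2, 3, 1, 2, 2, 3, 1, 2]  # hypothetical item ratings
    raw = raw_subscale_score(ratings)
    print(raw)  # 16
    fake_norms = {16: 15}  # hypothetical table entry, not real SAED-3 norms
    print(interpret_core_subscale(to_scaled(raw, fake_norms)))  # indicative of ED
```

Keeping the band logic separate from the norm lookup means the thresholds can be checked without the published tables.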

Reliability Focus and Evidence Structure

  • Reliability forms addressed: internal consistency, interrater reliability, and test–retest reliability.
  • Key analytic approaches:
    • Internal consistency: Cronbach’s α for the subscales and the rating scale index (RSI) composite, with Guilford’s formula applied to the composite using subscale intercorrelations.
    • Interrater reliability: Cohen’s weighted kappa (κ), interpreted with published guidelines (Altman, 1991; Landis & Koch, 1977).
    • Test–retest reliability: Pearson-type correlation between two administrations approximately 14 days apart, corrected for range effects (restricted or expanded score variability) and interpreted with Hopkins’s (2002) magnitude scale.
  • Overall finding across studies: the SAED-3 RS measures emotional/behavioral functioning reliably and stably across ages 5–18 and across raters over short time frames.
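For the composite, the classical formula for the reliability of an equally weighted sum of k standardized subscales is r_cc = 1 − (k − Σ r_ii) / (k + 2 Σ_{i<j} r_ij), where r_ii are the subscale reliabilities and r_ij their intercorrelations; test manuals commonly credit this form to Guilford. The Python sketch below implements that formula under the assumption that it matches the variant the authors used, and it presumes equal weights and equal subscale variances (as with scaled scores). The alphas are the age-averaged normative core-subscale values from Study 1; the 0.60 intercorrelations are hypothetical placeholders, so the result is illustrative only.

```python
from itertools import combinations
from typing import Sequence


def composite_reliability(reliabilities: Sequence[float],
                          corr: Sequence[Sequence[float]]) -> float:
    """Reliability of an equally weighted sum of k standardized subscales.

    r_cc = 1 - (k - sum(r_ii)) / (k + 2 * sum_{i<j} r_ij)
    """
    k = len(reliabilities)
    if len(corr) != k or any(len(row) != k for row in corr):
        raise ValueError("corr must be a k x k correlation matrix")
    # Sum of the correlations above the diagonal (each pair counted once).
    off_diagonal = sum(corr[i][j] for i, j in combinations(range(k), 2))
    error_variance = k - sum(reliabilities)    # sum of (1 - r_ii)
    composite_variance = k + 2 * off_diagonal  # variance of a sum of z-scores
    return 1 - error_variance / composite_variance


if __name__ == "__main__":
    # Age-averaged normative alphas for IL, RP, IB, UD, PS (Study 1).
    alphas = [0.92, 0.86, 0.91, 0.83, 0.79]
    # Hypothetical intercorrelation of 0.60 between every pair of subscales.
    corr = [[1.0 if i == j else 0.60 for j in range(5)] for i in range(5)]
    print(round(composite_reliability(alphas, corr), 3))  # 0.959
```

Higher intercorrelations enlarge the composite variance relative to the summed error variance, which is why a composite is usually more reliable than any single subscale.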

Study 1: Internal Consistency (Normative vs ED samples)

  • Design and aim:
    • Examine internal reliability (Cronbach’s α) for SAED-3 RS subscales and the RSI composite across ages 5–18 in the normative sample, and for the ED sample.
  • Data collection and samples:
    • Normative sample: 1,430 students aged 5–18 from 23 states; data collected fall 2015–spring 2018.
    • ED sample: 441 students aged 5–18 rated in 15 states.
    • Poststratification weighting used to adjust for under/overrepresentation to match U.S. census distributions (less than 6% of sample affected).
    • Demographic details for normative sample (Table 1: region, gender, race, Hispanic status, family income, parent education, exceptionality status) show alignment with U.S. school-age population with some over-/under-representation in groups; ED sample reflects school-age special education population characteristics (regional distribution, gender, race, Hispanic status).
  • Subscales and reliability metrics:
    • Subscales: IL, RP, IB, UD, PS, SM (only for ages 12–18); RSI = rating scale index (composite).
    • Averaged internal consistency across ages:
      • Normative sample: subscale alphas ranged from 0.79 to 0.92 (age-averaged values: IL ≈ 0.92, RP ≈ 0.86, IB ≈ 0.91, UD ≈ 0.83, PS ≈ 0.79, SM ≈ 0.85); RSI composite ≈ 0.96 (Table 2).
      • ED sample: alphas ≥ 0.90 for all subscales; composite alpha = 0.96 (Table 3).
    • Across ages (normative sample): internal consistency generally increases with age; younger ages show lower reliability especially for RP, UD, PS; authors recommend potential development of additional items for younger ages or a separate form for younger children.
  • Interpretive conclusions:
    • SAED-3 RS demonstrates strong internal consistency, particularly in older age groups and among ED-identified students; supports use for understanding global ED-related functioning and for informing program decisions.
  • Practical takeaway:
    • For practitioners using SAED-3 RS beyond ED eligibility, internal consistency supports reliability of subscale and composite interpretations, especially in upper elementary through high school.
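The two computations behind Study 1 can be sketched in Python: Cronbach's α for a students × items matrix of 0–3 ratings, and an average of age-level coefficients. The source does not say how alphas were averaged across ages; this sketch assumes the common convention of averaging through Fisher's z transformation (atanh, mean, tanh back). The ratings and age-level alphas below are hypothetical.

```python
import math
from statistics import fmean, variance
from typing import Sequence


def cronbach_alpha(ratings: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a students x items matrix of item ratings."""
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular matrix with at least 2 items")
    item_variances = [variance([row[i] for row in ratings]) for i in range(k)]
    total_variance = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def fisher_z_average(coefficients: Sequence[float]) -> float:
    """Average coefficients on Fisher's z scale, then transform back."""
    return math.tanh(fmean(math.atanh(r) for r in coefficients))


if __name__ == "__main__":
    # Hypothetical 0-3 ratings: 6 students (rows) x 4 items (columns).
    ratings = [
        [0, 0, 1, 0],
        [1, 1, 1, 2],
        [2, 1, 2, 2],
        [3, 2, 3, 3],
        [0, 1, 0, 0],
        [2, 2, 3, 2],
    ]
    print(round(cronbach_alpha(ratings), 3))  # 0.943
    # Hypothetical alphas for three age groups.
    print(round(fisher_z_average([0.80, 0.85, 0.90]), 3))  # 0.855
```

Averaging on the z scale gives slightly more weight to high coefficients than a plain arithmetic mean (0.855 vs. 0.850 here).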

Study 2: Interrater Reliability

  • Data collection and sample:
    • From SAED-3 norming participants, educators were asked to rate up to 10 students independently on the same day, with unique IDs and protocols.
    • Interrater sample: 216 students rated by 123 pairs of educators in South Carolina and Texas; ages 5–18 (M = 11.96, SD = 2.99); 72% male, 61% White, 10% Hispanic, and about 25% with an identified exceptionality (some of them ED). A randomly selected subset (n = 31) that matched the norming demographic targets was included in the normative sample where possible.
  • Statistical analysis:
    • Interrater reliability assessed with Cohen’s weighted kappa across three age blocks: 5–11 (elementary), 12–14 (middle), 15–18 (high).
    • Ratings were collapsed into three interpretive bands based on the SAED-3 RS thresholds: not indicative of ED, indicative of ED, and highly indicative of ED.
  • Results (coherent across groups):
    • Combined sample (n=216): subscale κ values ranged from .70 to .84 (good to very good) and the composite κ was .89 (very good).
    • By age block:
      • Ages 5–11 (n = 96): IL κ = .75; RP κ = .70; IB κ = .75; UD κ = .46; PS κ = .62; Composite κ = .64.
      • Ages 12–14 (n = 74): IL κ = .82; RP κ = .88; IB κ = .86; UD κ = .82; PS κ = .79; Composite κ = .92.
      • Ages 15–18 (n = 46): IL κ = .85; RP κ = .91; IB κ = .95; UD κ = .92; PS κ = .85; Composite κ = .94.
  • Interpretation and implications:
    • Overall, interrater reliability in the combined sample was good to very good across subscales and very good for the composite index, suggesting consistent ratings between educators who know the student and have similar training. Agreement was weakest at ages 5–11 (UD κ = .46, moderate; composite κ = .64).
  • Practical takeaway:
    • Supports use of SAED-3 RS in multi-rater contexts, especially for ages 12–18 where interrater agreement is strongest.
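A sketch of the Study 2 agreement statistic in Python: Cohen's weighted kappa on the three ordered bands (coded 0 = not indicative, 1 = indicative, 2 = highly indicative of ED), plus Altman's (1991) verbal labels. The source does not state the weighting scheme, so linear disagreement weights |i − j| / (k − 1) are an assumption here (quadratic weights are the other common choice). The paired teacher ratings are hypothetical.

```python
from collections import Counter
from typing import Sequence

# Band codes: 0 = not indicative, 1 = indicative, 2 = highly indicative of ED.
N_BANDS = 3


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                   n_categories: int = N_BANDS) -> float:
    """Cohen's weighted kappa with linear disagreement weights |i - j| / (k - 1)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n, k = len(rater_a), n_categories
    if any(code not in range(k) for code in (*rater_a, *rater_b)):
        raise ValueError(f"band codes must be in 0..{k - 1}")
    pairs = Counter(zip(rater_a, rater_b))  # observed joint counts
    marg_a, marg_b = Counter(rater_a), Counter(rater_b)
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed += w * pairs[(i, j)] / n
            expected += w * (marg_a[i] / n) * (marg_b[j] / n)
    if expected == 0:
        raise ValueError("kappa undefined: no expected disagreement")
    return 1 - observed / expected


def altman_label(kappa: float) -> str:
    """Altman (1991) verbal labels for kappa."""
    for cutoff, label in ((0.20, "poor"), (0.40, "fair"),
                          (0.60, "moderate"), (0.80, "good")):
        if kappa <= cutoff:
            return label
    return "very good"


if __name__ == "__main__":
    teacher_1 = [0, 0, 0, 0, 1, 1, 2, 2, 0, 1]  # hypothetical band codes
    teacher_2 = [0, 0, 0, 1, 1, 1, 2, 1, 0, 1]
    kappa = weighted_kappa(teacher_1, teacher_2)
    print(round(kappa, 3), altman_label(kappa))  # 0.737 good
```

With linear weights, a one-band disagreement (e.g., indicative vs. highly indicative) is penalized half as much as a two-band disagreement.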

Study 3: Test–Retest Reliability

  • Data collection and sample:
    • 117 students rated by the same educators at two time points approximately 14 days apart (Time 1 and Time 2).
    • Locations: CO, IA, KS, NC, TX, UT; ages 5–18 (M = 12.10, SD = 3.66); demographics varied (gender, race, ethnicity, exceptionality status).
  • Statistical analysis:
    • Correlations between Time 1 and Time 2 ratings calculated for each subscale and the RSI composite; corrected correlations (rc) are reported, with uncorrected (ru) also shown when useful.
    • Subgroups by age: 5–11, 12–14, 15–18; combined sample analyzed as well.
    • Practice effects (mean differences between Time 1 and Time 2) were evaluated with an effect size (Cohen’s r) and described as trivial when small.
  • Results (combined sample):
    • Subscale rc values (corrected): Inability to learn 0.64; Relationship problems 0.81; Inappropriate behavior 0.71; Unhappiness/depression 0.77; Physical symptoms/fears 0.84; Composite 0.88.
    • Subgroup trends by age block:
      • 5–11: IL 0.64; RP 0.84; IB 0.60; UD 0.74; PS 0.81; RSI 0.88.
      • 12–14: IL 0.55; RP 0.82; IB 0.85; UD 0.80; PS 0.84; RSI 0.87.
      • 15–18: IL 0.58; RP 0.95; IB 0.88; UD 0.77; PS 0.89; RSI 0.93.
  • Interpretation and implications:
    • Test–retest reliability ranged from moderate to very large across subscales, with the composite consistently very strong (≈ 0.87–0.93 depending on age block).
    • The SAED-3 RS demonstrates stability over a short-term window (14 days), supporting its use for establishing baseline behavior and monitoring short-term changes.
  • Practical takeaway:
    • The SAED-3 RS provides reliable short-term stability in scores, allowing educators to track changes over short intervals without large measurement error.
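The test–retest computation can be sketched in Python. The source says only that correlations were corrected for range effects; this sketch assumes the widely used Thorndike Case 2 correction, r_c = r·u / √(1 − r² + r²u²) with u = reference SD / sample SD, and labels magnitudes with Hopkins's scale (trivial < 0.1, small < 0.3, moderate < 0.5, large < 0.7, very large < 0.9, nearly perfect ≥ 0.9). The reference SD of 3 assumes scaled scores with SD 3, and all score values are hypothetical.

```python
import math
from statistics import fmean, stdev
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between Time 1 and Time 2 scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 scores")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correct_for_range(r: float, sample_sd: float, reference_sd: float) -> float:
    """Thorndike Case 2: rescale r from the sample SD to the reference SD."""
    u = reference_sd / sample_sd
    return r * u / math.sqrt(1 - r ** 2 + (r * u) ** 2)


def hopkins_label(r: float) -> str:
    """Hopkins (2002) magnitude labels for a correlation coefficient."""
    size = abs(r)
    for cutoff, label in ((0.1, "trivial"), (0.3, "small"), (0.5, "moderate"),
                          (0.7, "large"), (0.9, "very large")):
        if size < cutoff:
            return label
    return "nearly perfect"


if __name__ == "__main__":
    # Direct use of the correction: r = 0.70 in a sample with SD 2.5.
    r_c = correct_for_range(0.70, sample_sd=2.5, reference_sd=3.0)
    print(round(r_c, 3), hopkins_label(r_c))  # 0.762 very large
    # Hypothetical Time 1 / Time 2 scaled scores for eight students.
    time1 = [8, 10, 12, 9, 14, 11, 7, 13]
    time2 = [9, 10, 11, 9, 13, 12, 8, 12]
    r_u = pearson_r(time1, time2)
    print(round(r_u, 3), round(correct_for_range(r_u, stdev(time1), 3.0), 3))
```

When the sample is less variable than the reference population (u > 1) the correction raises r; when it is more variable (u < 1) the correction lowers it.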

Overall Reliability Synthesis and Practical Implications

  • Internal consistency:
    • Normative: median subscale alpha ≈ 0.86, with every subscale except PS (0.79) at or above 0.80; RSI composite ≈ 0.96.
    • ED sample: subscale alphas ≥ 0.90 and composite 0.96, indicating high internal coherence of the scale within ED-identified students.
  • Interrater reliability:
    • Combined sample: subscale κ from 0.70 to 0.84, composite κ = 0.89; overall good-to-very-good agreement between raters, strongest at ages 12–18.
  • Test–retest reliability:
    • Across combined sample: subscales from 0.64 to 0.84; composite ≈ 0.88; stability over 14 days is strong, with minimal practice effects.
  • Practical implications:
    • SAED-3 RS demonstrates robust psychometric properties for use in both eligibility determinations and ongoing progress monitoring within MTSS/RTI frameworks.
    • When used for eligibility decisions, clinicians should consider SAED-3 RS results alongside other data sources (e.g., case history, functional behavioral assessments, parent ratings) as recommended by practice guidelines.
    • The SM subscale applies only to ages 12–18; interpret it with caution, especially for the youngest students in that range, and do not use it in isolation to drive eligibility decisions.

Limitations and Cautions

  • Sampling and generalizability:
    • Interrater and test–retest samples were not fully representative; participation was voluntary, introducing potential selection bias.
    • Demographic composition skewed towards female, White, non-Hispanic educators with substantial experience; future work should include more diverse educators (gender, race/ethnicity, years of experience).
    • ED sample was geographically distributed but the representation across school levels (elementary/middle/high) could be expanded with larger samples.
  • Nested data considerations:
    • Data in some designs are nested (students within classrooms); analyses did not fully model nesting. Future work should account for hierarchical data structures to examine variance components more precisely.
  • Validity evidence:
    • This set focuses on reliability; convergent and discriminant validity with other measures (e.g., BASC-3, SDQ, STEP-S, SSIS) and criterion validity with ED diagnoses warrant further study.
  • Supplemental SM subscale:
    • Reliability for SM improves with age (12–18) but is not recommended as a sole basis for placement decisions and requires further investigation to establish validity across ages.

Contextual and Practical Considerations for Implementation

  • Alignment with IDEA and ED definitions:
    • SAED-3 RS was designed to map directly onto the five ED characteristics in IDEA (inability to learn, relationship problems, inappropriate behavior, unhappiness/depression, physical symptoms/fears) and to include SM as a supplemental indicator at older ages.
  • Role in MTSS/RTI/PBIS frameworks:
    • Serves as a standardized data point for screening and progress monitoring and helps justify decisions regarding supports and interventions.
  • Administration logistics:
    • Requires a teacher who knows the student well (≥2 months) to complete; multiple raters can be used to triangulate information.
  • Usage caveats:
    • Composite index should not be the sole basis for ED eligibility; interpret subscale scores in the context of a broader evaluation (case history, functional assessment, additional measures).
  • Ethical and practice implications:
    • Norming and bias considerations are important; ensure diverse representation in early local implementations and watch for potential differential item functioning (DIF) across groups.

Connections to Related Measures and Concepts

  • Other ED-related or behavioral measures commonly used in schools include:
    • BASC-3 (Reynolds & Kamphaus, 2015)
    • SDQ (Goodman, 2001; 2005)
    • STEP-S (Erford et al., 2012)
    • SSIS (Gresham & Elliott, 2008)
  • SAED-3 RS rationale for development:
    • Unlike some instruments, SAED-3 RS explicitly aligns with federal ED criteria, aiming for better relevance to eligibility decisions and more robust normative data tailored to current U.S. student populations.
  • Conceptual alignment with prevention frameworks:
    • The instrument supports MTSS/RTI/PBIS by providing reliable indicators of emotional/behavioral functioning that can inform tiered interventions and program evaluation.

Limitations of the Current Studies and Future Research Directions

  • Aims for future reliability studies:
    • Replicate findings with larger, more diverse samples of students (including more students with ED across elementary, middle, and high school).
    • Increase representation of male educators, educators of diverse racial/ethnic backgrounds, and varying years of experience.
    • Include nested-data designs to account for students nested within classrooms and to explore teacher-level variance.
  • Reliability and validity expansion:
    • Additional reliability studies should quantify stability over longer time periods (longer test–retest intervals) and interrater reliability across broader samples.
    • Convergent validity with other established ED measures and predictive validity for educational outcomes (IEP goals, service provision) should be explored.
  • Practical refinement:
    • Consider developing complementary forms or item banks for younger ages to improve age-appropriate sensitivity, as suggested by age-related differences in internal consistency.

Conclusion and Takeaways

  • The SAED-3 RS demonstrates strong reliability across internal consistency, interrater reliability, and test–retest reliability, supporting its use as a reliable data source in identifying students with ED and informing MTSS/RTI-based interventions and educational planning.
  • While findings are broadly favorable, ongoing research with more diverse samples and validity studies will strengthen its utility and help refine its use in practice.
  • Practitioners should integrate SAED-3 RS results with other data sources and consider age- and subscale-specific implications, especially when using the SM subscale for decision making.

Key Formulas and Thresholds (Quick Reference)

  • Scoring for core subscales: each item is rated 0–3; item ratings are summed to a raw subscale score, which is transformed to a scaled score.
  • ED thresholds per subscale:
    • Not indicative: scaled score ≤ 13
    • Indicative of ED: 14 ≤ scaled score ≤ 16
    • Highly indicative of ED: scaled score ≥ 17
  • SM threshold: scaled score ≥ 14 indicates antisocial/delinquent community behavior (≈ 91st percentile).
  • Reliability benchmarks (interpretive guidance):
    • Internal consistency: α ≥ 0.80 (minimally reliable); α ≥ 0.90 (desirable).
    • Interrater reliability: Cohen’s κ values interpreted as poor to very good; values >0.60 typically viewed as good/substantial; values >0.80 as very good.
    • Test–retest reliability: rc values approaching or exceeding 0.80 indicate strong stability over 2 weeks.
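As an arithmetic check on the SM threshold, the Python snippet below converts scaled scores to percentile ranks under a normal approximation. It assumes scaled scores have mean 10 and SD 3; that is an assumption, not stated in these notes, though it is consistent with 14 ≈ 91st percentile. Published norm tables may differ slightly because they are built from the normative data.

```python
from statistics import NormalDist

# Assumption: scaled scores are normally distributed with mean 10 and SD 3.
SCALED_SCORES = NormalDist(mu=10, sigma=3)


def percentile_rank(scaled_score: float) -> float:
    """Percentage of the normal reference population at or below the score."""
    return 100 * SCALED_SCORES.cdf(scaled_score)


if __name__ == "__main__":
    for score in (13, 14, 17):
        print(score, round(percentile_rank(score), 1))
    # 13 84.1
    # 14 90.9
    # 17 99.0
```

Under the same assumption, the "highly indicative" cutoff of 17 sits near the 99th percentile.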

References to Key Study Details (for quick recall)

  • Normative sample: 1,430 students; ED sample: 441 students; data collection 2015–2018; poststratification weighting used to align with U.S. census distributions.
  • Internal consistency (Study 1): normative subscale alphas 0.79–0.92; RSI composite ≈ 0.96; ED subscales ≥ 0.90; ED RSI 0.96.
  • Interrater reliability (Study 2): combined-sample subscale κ 0.70–0.84; composite κ = 0.89; strongest agreement in older groups (12–18).
  • Test–retest reliability (Study 3): combined-sample subscale rc 0.64–0.84; composite 0.88; minimal practice effects.
  • Supplemental notes: SM reliability improves with age; caution is advised when using SM for placement decisions before age 13.