SAED-3 RS Reliability Notes

Overview

The SAED-3 RS (Scales for Assessing Emotional Disturbance–3rd Edition Rating Scale) is a 45-item, teacher-completed rating scale designed to align with the federally defined characteristics of Emotional Disturbance (ED) to support eligibility decisions under IDEA.
Purpose of SAED-3 RS reliability studies: provide psychometric evidence that the SAED-3 RS reliably measures emotional/behavioral functioning across diverse students and settings, supporting data-informed decisions in MTSS/RTI frameworks and educational planning.
Context: prevalence estimates for ED among U.S. school-age children are high, yet students with ED are underrepresented in IDEA services. Early identification and intervention can improve long-term outcomes, so reliable measures are central to data-based decision making in schools.
SAED-3 RS integrates with MTSS/RTI/PBIS approaches to inform screening, progress monitoring, and evaluation of student outcomes.

What the SAED-3 RS Measures and How it Is Used

ED criteria addressed by SAED-3 RS: the five core characteristics in the IDEA definition of ED:
- Inability to learn (IL)
- Relationship problems (RP)
- Inappropriate behavior (IB)
- Unhappiness or depression (UD)
- Physical symptoms or fears (PS)
Supplemental subscale: Socially Maladjusted (SM) for ages 12–18.
Respondent: teacher familiar with the student for at least 2 months.
Items and scoring:
- 45 items total; 5 core subscales (IL, RP, IB, UD, PS) + 1 supplemental (SM for 12–18).
- 4-point Likert-type scale per item: 0 = not a problem, 1 = mild problem, 2 = considerable problem, 3 = severe problem.
- Subscale scores are summed and transformed to scaled scores; a composite rating scale index combines core subscales for a global behavioral functioning measure (not used for eligibility).
- Interpretation thresholds per subscale (scaled scores):
- Scaled score ≤ 13: not indicative of ED
- Scaled score 14–16: indicative of ED
- Scaled score ≥ 17: highly indicative of ED
- SM subscale threshold: a scaled score of 14 or greater on SM (≈ 91st percentile) indicates antisocial/delinquent behavior in the community and may signal needs beyond IDEA.
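The scoring rules above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the raw-to-scaled conversion is deliberately left out because it depends on the publisher's norm tables, which these notes do not reproduce.

```python
from typing import Sequence


def raw_subscale_score(item_ratings: Sequence[int]) -> int:
    """Sum 0-3 item ratings into a raw subscale score."""
    for rating in item_ratings:
        if rating not in (0, 1, 2, 3):
            raise ValueError(f"item rating out of range: {rating!r}")
    return sum(item_ratings)


def classify_core_scaled_score(scaled_score: int) -> str:
    """Apply the core-subscale thresholds (scaled scores, not raw sums)."""
    if scaled_score <= 13:
        return "not indicative of ED"
    if scaled_score <= 16:
        return "indicative of ED"
    return "highly indicative of ED"


def sm_flag(scaled_score: int) -> bool:
    """SM (ages 12-18 only): a scaled score of 14 or greater is flagged."""
    return scaled_score >= 14
```

The classifier expects a scaled score that has already been looked up in the norm tables; feeding it a raw sum would give misleading bands.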
Norming and reliability emphasis:
- Norms updated to align with U.S. census; bias/generalizability checks; convergent validity with other behavior measures; factor structure confirmation.
Practical use: supports data-informed eligibility decisions, informs IEP goals, and helps identify MTSS/RTI targets and interventions.

Reliability Focus and Evidence Structure

Reliability forms addressed: internal consistency, interrater reliability, and test–retest reliability.
Key analytic approaches:
- Internal consistency: Cronbach’s alpha (α) for subscales and for the rating-scale index composite (and Guilford’s formula for the composite using subscale intercorrelations).
- Interrater reliability: Cohen’s weighted kappa (κ), with interpretation guidelines (Altman 1991; Landis & Koch 1977).
- Test–retest reliability: Pearson-type correlation across two administrations separated by 14 days, with correction for range effects and interpretation using Hopkins (2002) scales.
Overall finding across studies: the SAED-3 RS demonstrates reliable, stable measurement of emotional/behavioral functioning across ages 5–18 and across raters over short timeframes.
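For the composite reliability mentioned under internal consistency, a minimal sketch follows. It assumes the commonly cited unit-weighted form for a sum of standardized subscales, 1 - (k - sum of subscale reliabilities) / (k + 2 * sum of distinct intercorrelations); the notes do not spell out the exact expression the authors used, so treat the formula choice and the function name as assumptions.

```python
from typing import Sequence


def composite_reliability(
    subscale_reliabilities: Sequence[float],
    intercorrelations: Sequence[Sequence[float]],
) -> float:
    """Reliability of a unit-weighted sum of standardized subscales.

    intercorrelations is a k x k symmetric matrix; only the entries above
    the diagonal are used.
    """
    k = len(subscale_reliabilities)
    if k < 2 or len(intercorrelations) != k or any(
        len(row) != k for row in intercorrelations
    ):
        raise ValueError("need k >= 2 reliabilities and a k x k matrix")
    # Error variance of the composite: each standardized subscale
    # contributes (1 - its reliability).
    error_variance = k - sum(subscale_reliabilities)
    # Total variance of the composite: k unit variances plus twice the
    # sum of the distinct pairwise correlations.
    off_diagonal = sum(
        intercorrelations[i][j] for i in range(k) for j in range(i + 1, k)
    )
    total_variance = k + 2 * off_diagonal
    return 1.0 - error_variance / total_variance
```

Because the denominator grows with the intercorrelations, a composite of positively correlated subscales can be more reliable than any single subscale, which is consistent with the composite outperforming the subscales in these studies.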

Study 1: Internal Consistency (Normative vs ED samples)

Design and aim:
- Examine internal reliability (Cronbach’s α) for SAED-3 RS subscales and the RSI composite across ages 5–18 in the normative sample, and for the ED sample.
Data collection and samples:
- Normative sample: 1,430 students aged 5–18 from 23 states; data collected fall 2015–spring 2018.
- ED sample: 441 students aged 5–18 rated in 15 states.
- Poststratification weighting used to adjust for under/overrepresentation to match U.S. census distributions (less than 6% of sample affected).
- Demographic details for normative sample (Table 1: region, gender, race, Hispanic status, family income, parent education, exceptionality status) show alignment with U.S. school-age population with some over-/under-representation in groups; ED sample reflects school-age special education population characteristics (regional distribution, gender, race, Hispanic status).
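As a rough illustration of poststratification weighting, each stratum can be weighted by the ratio of its population share to its sample share. The notes do not describe the actual weighting variables or procedure, so the single-variable strata and the function name below are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Mapping, Sequence


def poststratification_weights(
    sample_strata: Sequence[Hashable],
    population_shares: Mapping[Hashable, float],
) -> Dict[Hashable, float]:
    """Return one weight per stratum: population share / sample share."""
    if not sample_strata:
        raise ValueError("sample is empty")
    n = len(sample_strata)
    weights: Dict[Hashable, float] = {}
    for stratum, count in Counter(sample_strata).items():
        if stratum not in population_shares:
            raise ValueError(f"no population share for stratum {stratum!r}")
        sample_share = count / n
        # Overrepresented strata get weights below 1, underrepresented above 1.
        weights[stratum] = population_shares[stratum] / sample_share
    return weights
```

With these weights the weighted sample reproduces the target shares while the weighted total stays equal to the original sample size.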
Subscales and reliability metrics:
- Subscales: IL, RP, IB, UD, PS, SM (only for ages 12–18); RSI = rating scale index (composite).
- Averaged internal consistency across ages:
- Normative sample: subscale alphas ranged from 0.79 to 0.92; RSI composite = 0.96 (overall average across ages). Typical values: IL ≈ 0.92, RP ≈ 0.86, IB ≈ 0.91, UD ≈ 0.83, PS ≈ 0.79, SM ≈ 0.85, RSI ≈ 0.96 (Table 2).
- ED sample: subscale alphas ≥ 0.90 for all subscales; composite alpha = 0.96 (Table 3).
- Across ages (normative sample): internal consistency generally increases with age; younger ages show lower reliability especially for RP, UD, PS; authors recommend potential development of additional items for younger ages or a separate form for younger children.
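For readers who want to see how a subscale alpha is obtained, here is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy data layout (rows are students, columns are items) are assumptions for illustration, not the authors' analysis code.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a students-by-items matrix of ratings."""
    if len(item_scores) < 2:
        raise ValueError("need at least two students")
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    # Transpose so each tuple holds one item's ratings across students.
    items = list(zip(*item_scores))
    item_variance_sum = sum(variance(column) for column in items)
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)
```

Alpha rises when items covary strongly relative to their individual variances, which is one reason short subscales or subscales with rarely endorsed items (as may happen at younger ages) tend to show lower values.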
Interpretive conclusions:
- SAED-3 RS demonstrates strong internal consistency, particularly in older age groups and among ED-identified students; supports use for understanding global ED-related functioning and for informing program decisions.
Practical takeaway:
- For practitioners using SAED-3 RS beyond ED eligibility, internal consistency supports reliability of subscale and composite interpretations, especially in upper elementary through high school.

Study 2: Interrater Reliability

Data collection and sample:
- From SAED-3 norming participants, educators were asked to rate up to 10 students independently on the same day, with unique IDs and protocols.
- Interrater sample: 216 students rated by 123 pairs of educators in South Carolina and Texas; ages 5–18 (M = 11.96, SD = 2.99); demographics: male 72%, White 61%, Hispanic 10%, approximately 25% with an ED exceptionality. A randomly selected subset (n = 31) that matched norming demographics was included in the normative sample when possible.
Statistical analysis:
- Interrater reliability assessed with Cohen’s weighted kappa across three age blocks: 5–11 (elementary), 12–14 (middle), 15–18 (high).
- Ratings collapsed into three classification bands per SAED-3 RS thresholds: not indicative of ED, indicative of ED, highly indicative of ED.
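The banding and agreement analysis described above can be sketched as follows. The notes do not say whether linear or quadratic disagreement weights were used, so the sketch defaults to linear weights and exposes the choice as a parameter; all function names are hypothetical.

```python
from typing import Sequence


def band_index(scaled_score: int) -> int:
    """Map a scaled score to an ordered band: 0, 1, or 2."""
    if scaled_score <= 13:
        return 0  # not indicative of ED
    if scaled_score <= 16:
        return 1  # indicative of ED
    return 2  # highly indicative of ED


def weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    n_categories: int = 3,
    quadratic: bool = False,
) -> float:
    """Cohen's weighted kappa for two raters' ordered category indices."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must supply the same non-zero number of ratings")
    k, n = n_categories, len(rater_a)
    # Observed joint proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, 1 for the farthest bands.
        d = abs(i - j) / (k - 1)
        return d * d if quadratic else d

    observed_disagreement = sum(
        weight(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_disagreement = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_disagreement / expected_disagreement
```

A typical call would band each rater's scaled scores first, for example `weighted_kappa([band_index(s) for s in scores_a], [band_index(s) for s in scores_b])`, where `scores_a` and `scores_b` are the two educators' scaled scores for the same students.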
Results (coherent across groups):
- Combined sample (n=216): subscale κ values ranged from .70 to .84 (good to very good) and the composite κ was .89 (very good).
- By age block:
- Ages 5–11 (n=96): IL κ = .75; RP κ = .70; IB κ = .75; UD κ = .46; PS κ = .62; Composite κ = .64.
- Ages 12–14 (n=74): IL κ = .82; RP κ = .88; IB κ = .86; UD κ = .82; PS κ = .79; Composite κ = .92.
- Ages 15–18 (n=46): IL κ = .85; RP κ = .91; IB κ = .95; UD κ = .92; PS κ = .85; Composite κ = .94.
Interpretation and implications:
- In the combined sample, interrater reliability was good to very good across subscales and very good for the composite index, suggesting consistent ratings between educators who know the student and have similar training; agreement was weaker for ages 5–11 (e.g., UD κ = .46, composite κ = .64).
Practical takeaway:
- Supports use of SAED-3 RS in multi-rater contexts, especially for ages 12–18 where interrater agreement is strongest.

Study 3: Test–Retest Reliability

Data collection and sample:
- 117 students rated by the same educators at two time points approximately 14 days apart (Time 1 and Time 2).
- Locations: CO, IA, KS, NC, TX, UT; ages 5–18 (M = 12.10, SD = 3.66); demographics varied (gender, race, ethnicity, exceptionality status).
Statistical analysis:
- Correlations between Time 1 and Time 2 ratings calculated for each subscale and the RSI composite; corrected correlations (rc) are reported, with uncorrected (ru) also shown when useful.
- Subgroups by age: 5–11, 12–14, 15–18; combined sample analyzed as well.
- Effect sizes for practice effects assessed via Cohen’s r; results reported as trivial if small.
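To make the corrected correlation (rc) concrete, the sketch below computes a Pearson correlation between Time 1 and Time 2 scores and then applies a standard correction for range restriction, rc = r*u / sqrt(1 - r^2 + r^2 * u^2) with u = reference SD / sample SD. The notes do not name the specific correction the authors applied, so this formula and the parameter names are assumptions.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired Time 1 and Time 2 scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correct_for_range(
    r_uncorrected: float, sample_sd: float, reference_sd: float
) -> float:
    """Adjust r for a sample whose SD differs from the reference population."""
    u = reference_sd / sample_sd
    r2 = r_uncorrected ** 2
    return (r_uncorrected * u) / sqrt(1 - r2 + r2 * u ** 2)
```

When the retest sample is less variable than the reference population (u > 1), the corrected value is larger than the uncorrected one; when the two SDs match, the correlation is returned unchanged.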
Results (combined sample):
- Subscale rc values (corrected): Inability to learn 0.64; Relationship problems 0.81; Inappropriate behavior 0.71; Unhappiness/depression 0.77; Physical symptoms/fears 0.84; Composite 0.88.
- Subgroup trends by age block:
- 5–11: IL 0.64; RP 0.84; IB 0.60; UD 0.74; PS 0.81; RSI 0.88.
- 12–14: IL 0.55; RP 0.82; IB 0.85; UD 0.80; PS 0.84; RSI 0.87.
- 15–18: IL 0.58; RP 0.95; IB 0.88; UD 0.77; PS 0.89; RSI 0.93.
Interpretation and implications:
- Test–retest reliability ranged from moderate to very large across subscales, with the composite consistently very strong (≈ 0.87–0.93 depending on subgroup).
- The SAED-3 RS demonstrates stability over a short-term window (14 days), supporting its use for establishing baseline behavior and monitoring short-term changes.
Practical takeaway:
- The SAED-3 RS provides reliable short-term stability in scores, allowing educators to track changes over short intervals without large measurement error.

Overall Reliability Synthesis and Practical Implications

Internal consistency:
- Normative: subscale alphas center on about 0.86, with all but PS (≈ 0.79) at or above 0.80; RSI composite ≈ 0.96.
- ED sample: subscale alphas ≥ 0.90 and composite 0.96, indicating high internal coherence of the scale within ED-identified students.
Interrater reliability:
- Combined sample: subscale κ from 0.70 to 0.84 and composite κ = 0.89, indicating good-to-very-good agreement between raters; within age blocks, subscale κ ranged from 0.46 (UD, ages 5–11) to 0.95 (IB, ages 15–18).
Test–retest reliability:
- Across combined sample: subscales from 0.64 to 0.84; composite ≈ 0.88; stability over 14 days is strong, with minimal practice effects.
Practical implications:
- SAED-3 RS demonstrates robust psychometric properties for use in both eligibility determinations and ongoing progress monitoring within MTSS/RTI frameworks.
- When used for eligibility decisions, clinicians should consider SAED-3 RS results alongside other data sources (e.g., case history, functional behavioral assessments, parent ratings) as recommended by practice guidelines.
- The SM subscale applies only to ages 12–18; it should be interpreted with caution at the younger end of that range and should not be used in isolation to drive eligibility decisions.

Limitations and Cautions

Sampling and generalizability:
- Interrater and test–retest samples were not fully representative; participation was voluntary, introducing potential selection bias.
- Demographic composition skewed towards female, White, non-Hispanic educators with substantial experience; future work should include more diverse educators (gender, race/ethnicity, years of experience).
- ED sample was geographically distributed but the representation across school levels (elementary/middle/high) could be expanded with larger samples.
Nested data considerations:
- Data in some designs are nested (students within classrooms); analyses did not fully model nesting. Future work should account for hierarchical data structures to examine variance components more precisely.
Validity evidence:
- This set focuses on reliability; convergent and discriminant validity with other measures (e.g., BASC-3, SDQ, STEP-S, SSIS) and criterion validity with ED diagnoses warrant further study.
Supplemental SM subscale:
- Reliability for SM improves with age (12–18) but is not recommended as a sole basis for placement decisions and requires further investigation to establish validity across ages.

Contextual and Practical Considerations for Implementation

Alignment with IDEA and ED definitions:
- SAED-3 RS was designed to map directly onto the five ED characteristics in IDEA (inability to learn, relationship problems, inappropriate behavior, unhappiness/depression, physical symptoms/fears) and to include SM as a supplemental indicator at older ages.
Role in MTSS/RTI/PBIS frameworks:
- Serves as a standardized data point for screening and progress monitoring and helps justify decisions regarding supports and interventions.
Administration logistics:
- Requires a teacher who knows the student well (≥2 months) to complete; multiple raters can be used to triangulate information.
Usage caveats:
- Composite index should not be the sole basis for ED eligibility; interpret subscale scores in the context of a broader evaluation (case history, functional assessment, additional measures).
Ethical and practice implications:
- Norming and bias considerations are important; ensure diverse representation in new implementations and be mindful of potential differential item functioning across groups.

Connections to Related Measures and Concepts

Other ED-related or behavioral measures commonly used in schools include:
- BASC-3 (Reynolds & Kamphaus, 2015)
- SDQ (Goodman, 2001; 2005)
- STEP-S (Erford et al., 2012)
- SSIS (Gresham & Elliott, 2008)
SAED-3 RS rationale for development:
- Unlike some instruments, SAED-3 RS explicitly aligns with federal ED criteria, aiming for better relevance to eligibility decisions and more robust normative data tailored to current U.S. student populations.
Conceptual alignment with prevention frameworks:
- The instrument supports MTSS/RTI/PBIS by providing reliable indicators of emotional/behavioral functioning that can inform tiered interventions and program evaluation.

Limitations of the Current Studies and Future Research Directions

Aims for future reliability studies:
- Replicate findings with larger, more diverse samples of students (including more students with ED across elementary, middle, and high school).
- Increase representation of male educators, educators of diverse racial/ethnic backgrounds, and varying years of experience.
- Include nested-data designs to account for students nested within classrooms and to explore teacher-level variance.
Reliability and validity expansion:
- Additional reliability studies should quantify stability over longer time periods (longer test–retest intervals) and interrater reliability across broader samples.
- Convergent validity with other established ED measures and predictive validity for educational outcomes (IEP goals, service provision) should be explored.
Practical refinement:
- Consider developing complementary forms or item banks for younger ages to improve age-appropriate sensitivity, as suggested by age-related differences in internal consistency.

Conclusion and Takeaways

The SAED-3 RS demonstrates strong reliability across internal consistency, interrater reliability, and test–retest reliability, supporting its use as a reliable data source in identifying students with ED and informing MTSS/RTI-based interventions and educational planning.
While findings are broadly favorable, ongoing research with more diverse samples and validity studies will strengthen its utility and help refine its use in practice.
Practitioners should integrate SAED-3 RS results with other data sources and consider the age- and subscale-specific implications, especially when using the SM subscale for decision making.

Key Formulas and Thresholds (Quick Reference)

Scoring interpretation for core subscales: each item is scored 0–3; item scores are summed to a raw subscale score, which is then transformed to a scaled score.
ED thresholds per subscale:
- Not indicative: scaled score ≤ 13
- Indicative of ED: scaled score 14–16
- Highly indicative of ED: scaled score ≥ 17
SM threshold: scaled score ≥ 14 indicates antisocial/delinquent community behavior (91st percentile).
Reliability benchmarks (interpretive guidance):
- Internal consistency: α values ≥ 0.80 (minimally reliable); ≥ 0.90 (desirable).
- Interrater reliability: Cohen’s κ values interpreted as poor to very good; values >0.60 typically viewed as good/substantial; values >0.80 as very good.
- Test–retest reliability: rc values approaching or exceeding 0.80 indicate strong stability over 2 weeks.
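The κ benchmarks above can be expressed as a small lookup. The cut points below follow the commonly cited Altman (1991) bands (poor, fair, moderate, good, very good); the function name is hypothetical, and the treatment of values exactly on a boundary is an assumption.

```python
def altman_kappa_label(kappa: float) -> str:
    """Label a kappa value using Altman-style descriptive bands."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"
```

Applied to the values reported in these notes, the combined-sample subscale range (.70 to .84) spans "good" to "very good", and the composite (.89) falls in "very good".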

References to Key Study Details (for quick recall)

Normative sample: 1,430 students; ED sample: 441 students; data collection 2015–2018; poststratification weighting used to align with U.S. census distributions.
Internal consistency (Study 1): Normative subscale alphas 0.79–0.92; RSI composite ≈ 0.96; ED subscales ≥ 0.90; RSI 0.96.
Interrater reliability (Study 2): Combined κ values per subscale 0.70–0.84; composite κ = 0.89; strongest agreement in older groups (12–18).
Test–retest reliability (Study 3): Combined rc values for subscales 0.64–0.84; composite 0.88; minimal practice effects.
Supplemental notes: SM reliability improves with age; caution is advised when using SM for placement decisions before age 13.