Model Assessment Framework (Pakistan) – Comprehensive Study Notes

Preface and High-Level Messages

The Model Assessment Framework (MAF) represents Pakistan’s first unified set of assessment standards for Grades 9 – 12. Developed by the Inter Boards Coordination Commission (IBCC) in consultation with all 29 Examination Boards and multiple ministries, the document seeks to
• standardise board examinations,
• replace rote-driven testing with competency-based, SLO-aligned assessment,
• digitise examination life-cycles, and
• uplift equity, transparency, and international comparability.

Key endorsements are supplied by the Federal Minister for Education, the Special Secretary, and the Executive Director-IBCC, who jointly stress (i) positive wash-back, (ii) higher-order thinking, and (iii) the creation of secure, central Item Banks.

Executive Summary and Chapter Highlights

• Three Phases anchor quality: Pre-Examination, During Examination, Post-Examination. Each phase has minimum standards, digital provisions, and feedback loops.
• Passing threshold climbs from $33\%$ to $40\%$; GPA/CGPA now printed on certificates.
• Large-scale wash-back is engineered so teaching shifts from content recall to problem-solving.
• Item Banks, on-screen marking (OSM), psychometrics (CTT & IRT), and public dashboards are mandated for all Boards.

Chapter 1 – Introduction

• Rationale: resolve disparity among Boards, end focus on rote, meet SDG-4 quality targets, and create uniform benchmarks.
• Significance: reduces regional inequity; elevates validity, fairness, data-driven policy, and 21st-century skills.
• Objectives: national standards, evidence-based decisions, formative + summative synergy, continuous teacher development.

Chapter 2 – Standards for Assessment

2.1 Concept of Assessment

Continuous process of defining, selecting, collecting, analysing, and interpreting evidence of learning.

2.2 Wash-Back

Positive vs negative effects; alignment with curriculum ensures deep learning. Example: if tests emphasise evaluation, teachers model evaluation tasks.

2.3 Purposes

Diagnostic, formative, summative, and interim; multiple instruments give higher validity.

2.4 Types

• Formative (ongoing, feedback-rich)
• Bloom-based formative framework (cognitive, psychomotor, affective)
• Technology-integrated tools: Google Forms, Kahoot!, Flipgrid, Jamboard, Seesaw.
• Summative (board exams) – benchmark attainment vs SLOs.

2.5 Minimum Quality Standards

A “good test” is reliable, valid, objective, comprehensive, and practical. Eleven explicit standards cover alignment, Bloom coverage, ToS usage, validity ($r\ge 0.3$), reliability ($\alpha\ge 0.7$), fairness, digitisation, GPA rules, and stakeholder feedback.

Assessment Blueprint Skeleton

Type	# Items	%	Weight
Formative MCQ	•	•	•
Summative ERQ	•	•	•

2.6 Digitisation Standards

Standard 11 (Pre), 12 (During), 13 (Post) digitise item creation, real-time proctoring, automated scoring, dashboards.

Chapter 3 – Curriculum Alignment

• National Curriculum Framework targets “successful learners, confident individuals, responsible citizens, effective contributors.”
• Standards → Benchmarks → Learning Outcomes; each SLO mapped to assessment items via Table of Specification (ToS).

Chapter 4 – Assessment Development Process

Item writing principles: clarity, one focus, bias-free, Bloom-balanced.
Review pipeline: draft → peer review → expert review → pilot → statistical analysis → approval.
Moderation rules: item difficulty $0.3\le p \le 0.7$; discrimination $D>0.2$.
Illustrative bad vs improved items emphasize context-rich, higher-order questioning.

Chapter 5 – Conduct of Examination

• Professional training (10 hr annually) for all supervisory staff; booklet of duties.
• Conducive environment: ergonomic halls, light, silence, no external interference.
• Transparency: random appointment, CCTV, e-surveillance, secure logistics via sealed packets and trusted couriers.
• Cheating counter-measures: biometric ID, signal jammers, AI proctoring, statistical flagging, zero-tolerance.

Chapter 6 – Coding, Marking & Results Compilation

• Dual-coding: fictitious numbers replace roll-numbers; bundles of 250 scripts.
• Multi-layer marking: head examiner, checker, subject coordinator, super-checker.
• On-Screen Marking benefits: auto-totalling, remote access, parallel moderation, improved reliability.
• Final compilation uses software to merge marks, UFM decisions, and absentee data.

Chapter 7 – Post-Exam Analysis

7.1 Purposes

Improve learning, inform instruction, enhance assessment quality, communicate to stakeholders.

7.2 Data Collection

Raw scores, item responses, demographic/context variables; integrity checks.

7.3 Item Analysis Theories

Classical Test Theory vs Item Response Theory; parameters: difficulty, discrimination, guessing.

7.4 Exam-Results Dashboard

Interactive charts, subgroup filters, heat-maps; supports policy briefs, school feedback, student-parent reports.

7.5 Statistics Toolkit

• Descriptive: mean, median, mode, $\sigma$ .
• Inferential: t-test, ANOVA, correlation, regression $y=\beta_0+\beta_1 x$, logistic models.
• Predictive modelling for at-risk students.

7.8 Continuous Improvement

Curriculum adjustment, targeted PD, refined item banks, evidence-driven resource allocation.

Chapter 8 – Key IBCC Policy Decisions

01 Passing mark raised to $40\%$ .
02 Grace marks limited to 2nd-attempt (≤7 marks, one subject).
03 SLO-aligned syllabi & teacher training mandated.
04 Textbooks revised yearly to defeat guidebooks.
05 Uniform exam standards (scheme, cognitive weights).
06 Shift to higher-order: Knowledge 30 %, Understanding 50 %, Application 20 %.
07 Capacity building for item writers, markers.
08 Central & provincial Item Banks; ≥20 % of every paper drawn from IBCC central bank.
09 Theory & practical split; separate passing.
10 Practical includes MCQ-written + lab assessment by subject teacher.
11 Research & Development units at each Board; compulsory annual studies.
12 New GPA/CGPA-based grading (see below).
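
Decisions 06 and 08 are numeric constraints, so a draft paper can be checked against them mechanically. The following is a minimal sketch, not the IBCC's own tooling: the item fields (`marks`, `level`, `source`), the 2-point tolerance, and the reading of "≥20 % of every paper" as a share of marks (rather than of item count) are all assumptions made for illustration.

```python
# Targets from decision 06 and the floor from decision 08.
TARGET_SHARES = {"Knowledge": 0.30, "Understanding": 0.50, "Application": 0.20}
MIN_CENTRAL_SHARE = 0.20


def check_paper(items, tolerance=0.02):
    """Return a list of problems found in a draft paper (empty list = compliant).

    items: list of dicts with hypothetical keys 'marks', 'level', 'source'.
    Shares are computed by marks, which is an assumption of this sketch.
    """
    total = sum(item["marks"] for item in items)
    problems = []
    for level, target in TARGET_SHARES.items():
        share = sum(i["marks"] for i in items if i["level"] == level) / total
        if abs(share - target) > tolerance:
            problems.append(f"{level}: {share:.0%} of marks, target {target:.0%}")
    central = sum(i["marks"] for i in items if i["source"] == "central") / total
    if central < MIN_CENTRAL_SHARE:
        problems.append(f"central bank share {central:.0%} is below {MIN_CENTRAL_SHARE:.0%}")
    return problems


# Hypothetical 100-mark paper.
paper = [
    {"marks": 30, "level": "Knowledge", "source": "board"},
    {"marks": 30, "level": "Understanding", "source": "board"},
    {"marks": 20, "level": "Understanding", "source": "central"},
    {"marks": 20, "level": "Application", "source": "board"},
]
problems = check_paper(paper)
```

Returning a list of messages rather than a single boolean lets a moderator see every violated constraint at once.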

Chapter 9 – Guidelines & SOPs

9.1 Item Writers

Alignment, Bloom balance, context relevance, pilot testing.

9.2 Reviewers

Check SLO alignment, bias, clarity, cognitive distribution; approve/reject.

9.3 Moderators

Ensure test-wide balance, fairness, timing, security; produce moderation report.

9.4 Test Assembly

Logical sequencing, uniform formatting, time flags, final approval.

9.5 Test Administration

Material control, trained invigilators, accessibility, emergency protocols.

9.6 Marking/Coding

Rubric adherence, double marking, recording accuracy, appeal mechanism.

Grading System & GPA Formulae

New five-point scale (A++ to U):
• A++ ≥95 = GPA 5.0
• C 60–69 = GPA 3.0
• E 40–49 = GPA 1.0
• U <40 = 0

Example (Grade 11):
$\text{GPA}=\frac{\sum GP}{\text{courses}}=\frac{20.5}{7}=2.92$
Grade 12 GPA $4.92$ → $\text{CGPA}=\frac{2.92+4.92}{2}=3.92$
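
The arithmetic above can be reproduced in a few lines. One detail is worth flagging: $20.5/7 = 2.9286$, which rounds to 2.93, so the printed 2.92 matches truncation to two decimals rather than rounding. Whether the framework truncates or rounds is not stated in these notes; the `truncate2` helper below is an assumption made only to match the printed figures.

```python
import math


def truncate2(x):
    """Cut a value to two decimals without rounding (assumed convention)."""
    return math.floor(x * 100) / 100


def gpa(total_grade_points, n_courses):
    """GPA = sum of grade points / number of courses."""
    return total_grade_points / n_courses


def cgpa(yearly_gpas):
    """CGPA = mean of the yearly GPAs."""
    return sum(yearly_gpas) / len(yearly_gpas)


g11 = gpa(20.5, 7)                   # 2.928571...
g11_reported = truncate2(g11)        # 2.92, as printed in the example
overall = cgpa([g11_reported, 4.92])
```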

Digitisation Standards Recap

Pre-exam: online curriculum-SLO mapping, digital ToS, secure item bank.
During-exam: e-paper delivery, CCTV, e-attendance, biometric.
Post-exam: OSM, automated stats, e-certificates, public dashboards.

Capacity Building & Training

Annual 10-hour programmes; phase-wise (pre, during, post) training matrices; IBCC to fund AI-enhanced QIB development.

Resources

IBCC portal hosts subject-wise frameworks, ToS, SLO lists, model papers, and formative-summative guidelines: https://ibcc.edu.pk/MAF/

Figures & Tables Reference Guide

Fig-1: formative vs summative; Fig-4: Bloom pyramid; Tables 1–14 cover blueprint, validity standards, GPA grid, etc.

Lead Contributors (selected)

Dr Ghulam Ali Mallah (Executive Director IBCC, digitalisation champion); Dr Sajid Ali Yousuf Zai (psychometrics, dashboards); Dr Muhammad Shafique (SLO examinations); Mirza Ali (FBISE research); plus directors from PIE, FDE, PEC and provincial boards.

Preface and High-Level Messages

The Model Assessment Framework (MAF) represents Pakistan’s first unified set of assessment standards for Grades 9 – 12. Developed collaboratively by the Inter Boards Coordination Commission (IBCC) in consultation with all 29 Examination Boards and multiple ministries, this comprehensive document seeks to achieve several critical objectives:

Standardise Board Examinations: Establish consistent quality and structure across all examination boards, ensuring uniformity in assessment practices.
Replace Rote-Driven Testing with Competency-Based, SLO-Aligned Assessment: Shift focus from mere memorization to evaluating students' understanding, application, and higher-order thinking skills, directly linked to Student Learning Outcomes (SLOs).
Digitise Examination Life-Cycles: Implement digital solutions across all stages of the examination process, from item creation to results compilation, enhancing efficiency and security.
Uplift Equity, Transparency, and International Comparability: Foster a fairer and more transparent assessment system that is comparable with international standards, reducing regional disparities and promoting global recognition of Pakistani qualifications.

Key endorsements are supplied by the Federal Minister for Education, the Special Secretary, and the Executive Director-IBCC, who jointly stress the importance of various outcomes:

(i) Positive Wash-Back: Encourage teaching and learning practices that promote deeper understanding and critical thinking rather than simple recall.
(ii) Higher-Order Thinking: Emphasize the assessment of analytical, problem-solving, and evaluative skills.
(iii) Creation of Secure, Central Item Banks: Develop centralized repositories of high-quality, pre-vetted examination questions to ensure consistency and security.

Executive Summary and Chapter Highlights

Three Phases Anchor Quality: The framework is structured around three critical phases to ensure quality control throughout the examination process: Pre-Examination (covering planning and item development), During Examination (focusing on secure conduct), and Post-Examination (encompassing marking, analysis, and results). Each phase comes with detailed minimum standards, provisions for digital integration, and mechanisms for continuous feedback loops to ensure ongoing improvement.
Passing Threshold Climbs: The minimum passing threshold has been raised from $33\%$ to $40\%$ to ensure a higher standard of competency. Additionally, certificates will now prominently display Grade Point Average (GPA) and Cumulative Grade Point Average (CGPA) to provide a more holistic view of student performance.
Large-Scale Wash-Back is Engineered: The MAF is designed to induce a significant shift in teaching methodologies, moving pedagogy away from simple content recall and towards developing students' problem-solving abilities and critical thinking skills.
Key Digital Mandates for All Boards: All Examination Boards are mandated to adopt core digital tools and practices, including the establishment of Item Banks (secure digital repositories of questions), On-Screen Marking (OSM) (digital evaluation of answer scripts), utilization of Psychometrics (statistical analysis of test data using Classical Test Theory (CTT) & Item Response Theory (IRT) to ensure test quality), and the implementation of Public Dashboards (interactive platforms for transparent data dissemination and policy insights).

Chapter 1 – Introduction

Rationale: The MAF was conceived to address significant challenges within the Pakistani education system, including resolving disparity among various examination boards, ending the prevalent focus on rote learning, meeting quality targets outlined in SDG-4 (United Nations Sustainable Development Goal 4 on Quality Education), and establishing uniform national benchmarks for student assessment.
Significance: Its implementation is expected to profoundly impact the educational landscape by reducing regional inequity in assessment standards; significantly elevating validity, fairness, and reliability of examinations; enabling data-driven policy decisions; and fostering the development of essential 21st-century skills such as critical thinking, communication, and collaboration.
Objectives: The primary objectives include establishing national assessment standards, promoting evidence-based decisions in education policy, fostering synergy between formative and summative assessments for continuous learning, and ensuring continuous professional development for teachers to align their instructional practices with the new assessment paradigm.

Chapter 2 – Standards for Assessment

2.1 Concept of Assessment

Assessment is defined as a continuous, systematic process involving several interconnected stages: defining the learning outcomes to be measured, selecting appropriate assessment methods, collecting evidence of student learning through various instruments, analysing the collected data, and interpreting the evidence to make informed judgments about student progress and program effectiveness.

2.2 Wash-Back

Wash-back refers to the impact of assessment on teaching and learning. It can be positive, encouraging desirable instructional practices (e.g., if tests emphasize critical thinking, teachers will design activities that promote it), or negative, leading to undesirable practices (e.g., teaching to the test, focusing solely on memorization for multiple-choice questions). The MAF promotes strong alignment with the curriculum to ensure that assessments drive deep learning and higher-order cognitive processes. For example, if tests explicitly emphasise evaluation skills, teachers will naturally integrate and model evaluation tasks in their classroom instruction.

2.3 Purposes

Assessments serve multiple purposes throughout the learning cycle:

Diagnostic: Administered before instruction to identify students' prior knowledge, strengths, and weaknesses to inform initial teaching plans.
Formative: Ongoing assessments during instruction to monitor student learning and provide immediate feedback for adjustment of teaching and learning strategies.
Summative: Conducted at the end of a unit or course to evaluate overall learning attainment against established standards (e.g., board exams).
Interim: Periodic assessments bridging formative and summative, used to monitor progress towards learning goals over a longer period.

Utilizing multiple assessment instruments (e.g., essays, practical demonstrations, presentations, short-response questions, alongside multiple-choice questions) significantly enhances the validity of the overall assessment, providing a more comprehensive picture of student capabilities.

2.4 Types

Formative (ongoing, feedback-rich): These assessments are integrated throughout the instructional process, providing real-time feedback to students and teachers. Their primary goal is to guide and improve learning, not just to evaluate it. Examples include quizzes, classroom discussions, assignments, and peer reviews.
Bloom-based formative framework (cognitive, psychomotor, affective): Formative assessments should align with Bloom's Taxonomy, covering various domains of learning:
- Cognitive domain: Focuses on intellectual skills and knowledge (e.g., remembering, understanding, applying, analyzing, evaluating, creating).
- Psychomotor domain: Relates to physical skills and coordination (e.g., performing a lab experiment, drawing a diagram).
- Affective domain: Involves emotions, attitudes, values, and interests (e.g., demonstrating teamwork, showing empathy).
Technology-integrated tools: Modern formative assessment leverages digital platforms for efficiency and engagement, such as Google Forms for quick quizzes, Kahoot! for interactive reviews, Flipgrid for video discussions, Jamboard for collaborative brainstorming, and Seesaw for digital portfolios and parent communication.
Summative (board exams): These are high-stakes assessments conducted at the end of a significant learning period (e.g., annual or matriculation exams). Their purpose is to benchmark student attainment against predefined Student Learning Outcomes (SLOs) and provide a final evaluation of learning.

2.5 Minimum Quality Standards

A “good test” adheres to several core principles to be effective and fair:

Reliable: Consistently produces similar results under similar conditions (e.g., if re-administered to the same student, similar scores would be obtained).
Valid: Measures what it is intended to measure (e.g., a science test genuinely assesses scientific knowledge and skills, not just reading comprehension).
Objective: Free from personal bias of the scorer; scoring criteria are clear and consistent.
Comprehensive: Covers the entire scope of the curriculum or learning objectives being assessed.
Practical: Feasible to administer, score, and interpret within available resources (time, cost, personnel).

Eleven explicit standards are outlined to ensure assessment quality, covering various aspects:

Alignment: Ensuring assessment items are directly linked to Student Learning Outcomes (SLOs) and curriculum content.
Bloom Coverage: Requiring assessments to include items across different cognitive levels of Bloom's Taxonomy (e.g., Knowledge, Understanding, Application, Analysis, Evaluation, Creation).
ToS Usage: Mandating the use of a Table of Specification (ToS) to ensure balanced coverage of content and cognitive levels.
Validity ($r \ge 0.3$, where $r$ is a correlation coefficient): A statistical measure indicating a reasonable correlation between test scores and an external criterion (e.g., future performance or other valid measures).
Reliability (Cronbach's alpha, $\alpha \ge 0.7$): A statistical measure of internal consistency, indicating that the items on a test are measuring the same construct reliably.
Fairness: Ensuring assessments are free from bias towards any group and provide equitable opportunities for all students.
Digitisation: Requiring the integration of digital tools and processes in assessment design, administration, and scoring.
GPA Rules: Standardizing the calculation and reporting of Grade Point Average (GPA).
Stakeholder Feedback: Incorporating feedback from students, teachers, parents, and examination boards for continuous improvement.
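
To make the reliability threshold concrete, here is a minimal sketch of Cronbach's alpha using the standard formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$, where $k$ is the number of items, $\sigma_i^2$ the variance of item $i$, and $\sigma_T^2$ the variance of total scores. The pilot data are hypothetical, and the function name is illustrative rather than taken from the framework.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one row per student, each row a list of item scores.

    Population variance is used for both item and total variances;
    using sample variance throughout would give the same ratio.
    """
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical pilot: 5 students x 4 dichotomous items (1 = correct).
pilot = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(pilot)
meets_standard = alpha >= 0.7   # threshold quoted in the standards above
```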

Assessment Blueprint Skeleton

Type	Minimum # Items	Typical % Coverage	Weight in Final Grade	Difficulty Range	Bloom Level Range
Formative MCQ	Dependent on scope	Varies	0	Easy to Medium	K-U-AP
Summative ERQ	5-10	High	Significant	Medium to Hard	AN-EV-CR

Note: ERQ = Extended Response Questions; MCQ = Multiple Choice Questions; K=Knowledge, U=Understanding, AP=Application, AN=Analysis, EV=Evaluation, CR=Creation.

2.6 Digitisation Standards

Standard 11 (Pre), 12 (During), and 13 (Post) outline comprehensive digitisation requirements across the examination lifecycle:

Standard 11 (Pre-Examination): Focuses on digitalizing item creation and management, including online curriculum-SLO mapping, digital Table of Specification (ToS) generation, and the establishment of secure, cloud-based Item Banks.
Standard 12 (During Examination): Mandates real-time proctoring through CCTV and AI, e-attendance systems, biometric identification of candidates, and secure e-paper delivery mechanisms.
Standard 13 (Post-Examination): Requires automated scoring for objective items, digital dashboards for real-time performance monitoring, and electronic result processing.

Chapter 3 – Curriculum Alignment

Curriculum alignment is foundational to effective assessment. The National Curriculum Framework targets the development of well-rounded individuals: “successful learners, confident individuals, responsible citizens, effective contributors.”

This alignment is achieved through a structured hierarchy:

Standards: Broad statements of what students should know and be able to do.
Benchmarks: More specific learning goals within each standard.
Student Learning Outcomes (SLOs): Observable and measurable statements of what a student is expected to know or be able to do at the end of a learning period.

Crucially, each SLO is explicitly mapped to specific assessment items via a detailed Table of Specification (ToS). This ensures that every test question directly measures a defined learning outcome, promoting instructional validity and guiding both teaching and assessment processes.
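
A ToS mapping of this kind lends itself to an automated coverage check. The sketch below assumes a very simple representation (a list of SLO codes and a dictionary from item ID to SLO code); the codes and the function name are hypothetical and do not come from the framework.

```python
def check_tos(slo_codes, item_to_slo):
    """Return (SLOs with no item, items mapped to an unknown SLO)."""
    known = set(slo_codes)
    covered = set(item_to_slo.values())
    missing = sorted(known - covered)
    orphans = sorted(item for item, slo in item_to_slo.items() if slo not in known)
    return missing, orphans


# Hypothetical SLO list and item mapping.
slos = ["SLO-1", "SLO-2", "SLO-3"]
tos = {"Q1": "SLO-1", "Q2": "SLO-1", "Q3": "SLO-3", "Q4": "SLO-9"}
missing, orphans = check_tos(slos, tos)
```

Here `missing` lists outcomes that no question measures and `orphans` lists questions that point at an outcome outside the syllabus, the two failure modes the ToS is meant to prevent.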

Chapter 4 – Assessment Development Process

Item Writing Principles: To ensure high-quality assessment items, writers must adhere to strict principles:
- Clarity: Questions must be unambiguous and easily understood by all students.
- One Focus: Each item should assess a single concept or skill to avoid confusion and allow for precise measurement.
- Bias-Free: Questions should be culturally sensitive, fair, and free from any biases related to gender, socioeconomic status, or background.
- Bloom-Balanced: Items should cover a range of cognitive levels according to Bloom's Taxonomy, encouraging higher-order thinking rather than just recall.
Review Pipeline: A rigorous review process ensures the quality and validity of all assessment items:
- Draft: Initial creation of items by trained writers.
- Peer Review: Items are reviewed by fellow item writers for initial clarity and adherence to principles.
- Expert Review: Subject matter experts and assessment specialists review items for content accuracy, alignment with SLOs, and cognitive level.
- Pilot: A sample of items is administered to a small group of students (similar to the target population) to gather preliminary data on difficulty and clarity.
- Statistical Analysis: Data from the pilot test is analyzed using psychometric techniques (e.g., Item Response Theory) to evaluate item difficulty, discrimination, and effectiveness of distractors.
- Approval: Final items are approved for inclusion in the secure Item Bank.
Moderation Rules: To maintain consistency and quality across tests:
- Item Difficulty ( $0.3\le p \le 0.7$ ): The proportion of students who answer an item correctly should ideally fall within this range, indicating it's neither too easy nor too difficult.
- Discrimination (D>0.2): A measure of how well an item differentiates between high-scoring and low-scoring students. A higher D-value indicates better discrimination.
Illustrative Bad vs. Improved Items: The framework provides clear examples of poorly constructed items (e.g., ambiguous, multiple correct answers, solely recall-based) contrasted with improved versions that emphasize context-rich scenarios and require higher-order questioning (e.g., analysis, evaluation, problem-solving), promoting deeper learning.
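
The two moderation statistics can be computed directly from a scored response matrix. This is a minimal sketch under stated assumptions: items are scored 0/1, difficulty $p$ is the proportion correct, and $D$ is the difference in proportion correct between the top and bottom scoring groups. The 27 % group size is a common convention, not something these notes specify, and the response data are hypothetical.

```python
def item_stats(responses, item, group_frac=0.27):
    """Return (p, D) for one item.

    responses: one row of 0/1 scores per student; item: column index.
    """
    n = len(responses)
    p = sum(row[item] for row in responses) / n
    ranked = sorted(responses, key=sum, reverse=True)   # best total score first
    g = max(1, round(n * group_frac))                   # size of each extreme group
    upper = sum(row[item] for row in ranked[:g]) / g
    lower = sum(row[item] for row in ranked[-g:]) / g
    return p, upper - lower


def passes_moderation(p, d):
    """Apply the moderation rules: 0.3 <= p <= 0.7 and D > 0.2."""
    return 0.3 <= p <= 0.7 and d > 0.2


# Hypothetical pilot: 10 students x 3 items.
pilot = [
    [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 0, 1],
    [0, 1, 1], [0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
]
p0, d0 = item_stats(pilot, item=0)
```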

Chapter 5 – Conduct of Examination

Effective execution of examinations is paramount to ensure fairness and integrity:

Professional Training (10 hr annually): All supervisory staff (invigilators, superintendents) undergo mandatory annual training of at least 10 hours covering exam protocols, ethical guidelines, and crisis management. A detailed booklet of duties is provided to each staff member.
Conducive Environment: Examination halls must be ergonomic, well-lit, quiet, and free from any external interference or distractions to ensure optimal student performance.
Transparency: Measures include random appointment of supervisory staff, installation of CCTV cameras, and e-surveillance systems. Secure logistics involve the use of tamper-proof sealed packets for exam papers and reliance on trusted couriers for distribution and collection.
Cheating Counter-Measures: A robust system to combat academic dishonesty includes biometric identification for candidates, signal jammers to prevent electronic communication, AI proctoring solutions for real-time monitoring, statistical flagging of suspicious answer patterns, and a strict zero-tolerance policy against any form of cheating, leading to severe penalties.

Chapter 6 – Coding, Marking & Results Compilation

This chapter details the secure and efficient processing of examination scripts:

Dual-Coding: To ensure anonymity and impartiality during marking, original roll-numbers are replaced with fictitious, unique numbers. Scripts are often processed in bundles of 250 for logistical efficiency and security.
Multi-Layer Marking: A phased marking process is implemented to ensure accuracy and consistency:
- Head Examiner: Oversees a team of markers.
- Checker: Re-marks a sample of scripts to ensure accuracy and adherence to rubrics.
- Subject Coordinator: Provides overall quality assurance for a specific subject.
- Super-Checker: Performs a final review of marked scripts.
On-Screen Marking (OSM) Benefits: Digital marking offers significant advantages:
- Auto-Totalling: Automatic calculation of marks, eliminating arithmetic errors.
- Remote Access: Markers can evaluate scripts from various locations, increasing flexibility.
- Parallel Moderation: Multiple markers can simultaneously mark the same script or portions of it, enhancing consistency and reliability.
- Increased Reliability: Digital tools reduce human error and facilitate standardized application of rubrics.
Final Compilation: A sophisticated software system is used to merge marks from all subjects, incorporate decisions from Unfair Means (UFM) cases, and process absentee data to generate comprehensive and accurate results.
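
The coding step described at the start of this chapter can be illustrated with a short sketch. The notes do not say how fictitious numbers are generated, so everything here is an assumption: random 8-character hex codes, a shuffle so that bundle order does not mirror roll-number order, and the function name `dual_code`. Only the bundle size of 250 comes from the text.

```python
import random
import secrets


def dual_code(roll_numbers, bundle_size=250):
    """Map each roll number to a unique random code and split codes into bundles."""
    mapping = {}
    used = set()
    for roll in roll_numbers:
        code = secrets.token_hex(4)        # 8 hex characters
        while code in used:                # regenerate on the rare collision
            code = secrets.token_hex(4)
        used.add(code)
        mapping[roll] = code
    codes = list(mapping.values())
    random.SystemRandom().shuffle(codes)   # break any link to roll-number order
    bundles = [codes[i:i + bundle_size] for i in range(0, len(codes), bundle_size)]
    return mapping, bundles


rolls = [f"R{n:06d}" for n in range(600)]  # hypothetical roll numbers
mapping, bundles = dual_code(rolls)
```

In practice the `mapping` table would be held separately from the markers, since anyone who holds it can reverse the anonymisation.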

Chapter 7 – Post-Exam Analysis

7.1 Purposes

Post-exam analysis serves multiple critical purposes:

Improve Learning: Identifies areas where students struggled, informing targeted interventions and revisions in classroom instruction.
Inform Instruction: Provides teachers with data on the effectiveness of their teaching strategies and specific curriculum areas needing more focus.
Enhance Assessment Quality: Insights into item performance (difficulty, discrimination) lead to refinement of future assessment items and overall test design.
Communicate to Stakeholders: Provides transparent data to students, parents, schools, and policymakers regarding performance patterns and educational trends.

7.2 Data Collection

Robust analysis relies on comprehensive data collection, including:

Raw scores: Total marks obtained by each student.
Item responses: Detailed records of how each student answered every question.
Demographic/context variables: Information such as school type, region, gender, which can be correlated with performance for equity analysis.

Rigorous integrity checks are performed on all collected data to ensure accuracy and reliability before analysis.

7.3 Item Analysis Theories

Two primary psychometric theories are employed for detailed item analysis:

Classical Test Theory (CTT): A traditional approach that treats each observed score as a true score plus measurement error. It provides item-level statistics such as difficulty (p-value) and discrimination (D-index).
Item Response Theory (IRT): A more advanced, modern approach that models the relationship between a person's ability and their probability of answering an item correctly. It provides item parameters that are independent of the specific sample of test-takers.
- Parameters: Key parameters derived from item analysis include:
  - Difficulty: How easy or hard an item is for the test-takers.
  - Discrimination: How well an item differentiates between high-ability and low-ability test-takers.
  - Guessing: In IRT, an additional parameter that estimates the probability of low-ability test-takers guessing the correct answer.
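
The notes name the three IRT parameters without giving a formula. A common way to combine them is the three-parameter logistic (3PL) model, $P(\theta) = c + \frac{1-c}{1+e^{-a(\theta-b)}}$, with ability $\theta$, discrimination $a$, difficulty $b$, and guessing $c$. Using 3PL here is an assumption; the framework may prescribe a different model. A minimal sketch:

```python
import math


def p_correct(theta, a, b, c=0.0):
    """3PL probability that a test-taker of ability theta answers correctly.

    a: discrimination, b: difficulty, c: guessing (lower asymptote).
    With c = 0 this reduces to the two-parameter logistic model.
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


# Hypothetical four-option MCQ: guessing floor of 0.25.
at_difficulty = p_correct(theta=0.0, a=1.2, b=0.0, c=0.25)   # halfway between c and 1
weak_student = p_correct(theta=-3.0, a=1.2, b=0.0, c=0.25)
strong_student = p_correct(theta=3.0, a=1.2, b=0.0, c=0.25)
```

When ability equals difficulty the probability sits halfway between the guessing floor and 1, which is why $b$ is read as the item's location on the ability scale.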

7.4 Exam-Results Dashboard

A central, interactive exam-results dashboard is mandated, featuring:

Interactive charts: Visual representations of performance trends (e.g., subject-wise average scores over time).
Subgroup filters: Ability to break down data by various demographics (e.g., gender, district, school type) to identify performance gaps.
Heat-maps: Visualizations showing areas of high and low performance, helping to pinpoint struggling schools or regions.

This dashboard supports the generation of data-driven policy briefs, provides direct school feedback for improvement, and facilitates detailed student-parent reports.
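
Behind a subgroup filter sits a plain group-by aggregation. The sketch below shows that step only, with hypothetical record fields (`district`, `gender`, `score`); it says nothing about how the mandated dashboard is actually built.

```python
from collections import defaultdict
from statistics import mean


def subgroup_means(records, key):
    """Average score for each value of the chosen subgroup field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["score"])
    return {group: mean(scores) for group, scores in groups.items()}


# Hypothetical result records.
results = [
    {"district": "A", "gender": "F", "score": 60},
    {"district": "A", "gender": "M", "score": 80},
    {"district": "B", "gender": "F", "score": 50},
]
by_district = subgroup_means(results, "district")
by_gender = subgroup_means(results, "gender")
```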

7.5 Statistics Toolkit

A comprehensive statistical toolkit is essential for in-depth analysis:

Descriptive statistics: Summarize basic features of the data:
- Mean: Average score.
- Median: Middle score when data is ordered.
- Mode: Most frequently occurring score.
- Standard Deviation ($\sigma$): A measure of the dispersion or spread of scores around the mean.
Inferential statistics: Allow for drawing conclusions about a larger population based on sample data:
- t-test: Compares the means of two groups.
- ANOVA (Analysis of Variance): Compares the means of three or more groups.
- Correlation: Measures the strength and direction of a linear relationship between two variables.
- Regression ($y=\beta_0+\beta_1 x$): Models the relationship between a dependent variable ($y$) and an independent variable ($x$), allowing for prediction.
- Logistic models: Used when the dependent variable is binary (e.g., pass/fail).
Predictive modelling: Advanced techniques to forecast academic performance or identify at-risk students who may need early intervention.
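
As an illustration of the regression entry in this toolkit, the sketch below fits $y=\beta_0+\beta_1 x$ by ordinary least squares using only the standard library. The data and variable meanings (attendance versus exam score) are hypothetical, and the function name is illustrative.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx          # slope
    b0 = my - b1 * mx       # intercept
    return b0, b1


# Hypothetical data: attendance (%) and exam score (%).
attendance = [60, 70, 80, 90, 95]
score = [42, 51, 58, 70, 74]
b0, b1 = fit_line(attendance, score)
predicted_at_85 = b0 + b1 * 85
```

A positive `b1` here would only show an association in this sample; testing whether it generalises is the job of the inferential methods listed above.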

7.8 Continuous Improvement

The insights gained from post-exam analysis are crucial for a cycle of continuous improvement, leading to:

Curriculum adjustment: Modifying curriculum content or emphasis based on areas of consistent student difficulty.
Targeted professional development (PD): Designing training programs for teachers to address specific instructional weaknesses or new assessment demands.
Refined item banks: Improving the quality and variety of questions in the item banks based on statistical performance.
Evidence-driven resource allocation: Directing resources (e.g., funding, teaching materials) to schools or regions most in need based on data.

Chapter 8 – Key IBCC Policy Decisions

The Inter Boards Coordination Commission (IBCC) has made several pivotal policy decisions to overhaul the assessment system:

Passing mark raised to $40\%$ : A significant increase from the previous $33\%$ to ensure a higher baseline of competency and discourage rote learning.
Grace marks limited: Grace marks are strictly limited to second-attempt examinations, with a maximum of 7 marks allowed in only one subject, to prevent undue inflation of results in the first attempt.
SLO-aligned syllabi & teacher training mandated: All syllabi must be revamped to clearly define Student Learning Outcomes, and extensive teacher training programs are compulsory to ensure alignment between instruction and assessment.
Textbooks revised yearly to defeat guidebooks: Textbooks will undergo annual revisions to minimize the effectiveness of guidebooks.

Slide 1: Preface and High-Level Messages

Model Assessment Framework (MAF)
: Pakistan’s first unified set of assessment standards for Grades 9 – 12.

Developed by IBCC
: Created collaboratively by the Inter Boards Coordination Commission (IBCC) in consultation with all 29 Examination Boards and multiple ministries.

Slide 2: Core Purpose of MAF

Key Objectives
: The document seeks to achieve:

Standardise Board Examinations
: Establish consistent quality and structure across all examination boards.

Replace Rote-Driven Testing with Competency-Based, SLO-Aligned Assessment
: Shift focus from memorisation to evaluating understanding, application, and higher-order thinking skills, directly linked to Student Learning Outcomes (SLOs).

Digitise Examination Life-Cycles
: Implement digital solutions across all stages, from item creation to results compilation, enhancing efficiency and security.

Uplift Equity, Transparency, and International Comparability
: Foster a fairer and more transparent assessment system comparable with international standards.

Slide 3: Endorsements and Key Focus Areas

Endorsements Stress
: Key endorsements from the Federal Minister for Education, Special Secretary, and Executive Director-IBCC jointly highlight:

(i) Positive Wash-Back
: Encourage teaching and learning practices that promote deeper understanding and critical thinking.

(ii) Higher-Order Thinking
: Emphasise the assessment of analytical, problem-solving, and evaluative skills.

(iii) Creation of Secure, Central Item Banks
: Develop centralised repositories of high-quality, pre-vetted examination questions to ensure consistency and security.

Slide 4: Executive Summary – Quality Anchor Phases

Three Quality Anchor Phases
: The framework anchors quality through three critical phases:

Pre-Examination
: Covers planning, item development, and secure material preparation.

During Examination
: Focuses on secure conduct, proctoring, and fair administration.

Post-Examination
: Encompasses efficient marking, detailed analysis, and transparent results compilation.

Each phase has minimum standards, digital provisions, and feedback loops for continuous improvement.

Slide 5: Executive Summary – Key Policy Changes

Passing Threshold Climbs
: The minimum passing threshold has been raised from $33\%$ to $40\%$ to ensure a higher standard of competency.

GPA/CGPA (Grade Point Average/Cumulative Grade Point Average) are now prominently printed on certificates for a holistic view of student performance.

Large-Scale Wash-Back is Engineered
: The MAF is designed to induce a significant shift in teaching methodologies.

This encourages pedagogy to move away from simple content recall and towards developing students' problem-solving abilities and critical thinking skills.

Slide 6: Executive Summary – Digital Mandates

Mandated for all Boards
: All Examination Boards are mandated to adopt core digital tools and practices:

Item Banks
: Secure digital repositories of examination questions.

On-Screen Marking (OSM)
: Digital evaluation of answer scripts for efficiency and accuracy.

Psychometrics
: Statistical analysis of test data using Classical Test Theory (CTT) & Item Response Theory (IRT) to ensure test quality.

Public Dashboards
: Interactive platforms for transparent data dissemination and policy insights.

Slide 7: Chapter 1 – Introduction Rationale

Rationale
: The MAF was conceived to address significant challenges:

Resolve disparity among various Examination Boards.

End the prevalent focus on rote learning.

Meet quality targets outlined in SDG-4 (United Nations Sustainable Development Goal 4 on Quality Education).

Establish uniform national benchmarks for student assessment.

Slide 8: Chapter 1 – Introduction Significance

Significance
: Its implementation is expected to profoundly impact the educational landscape by:

Reducing regional inequity in assessment standards.

Significantly elevating validity, fairness, and reliability of examinations.

Enabling data-driven policy decisions.

Fostering the development of essential 21st-century skills (e.g., critical thinking, communication, collaboration).

Slide 9: Chapter 1 – Introduction Objectives

Objectives
: The primary objectives include:

Establishing national assessment standards.

Promoting evidence-based decisions in education policy.

Fostering synergy between formative and summative assessments for continuous learning.

Ensuring continuous professional development for teachers to align instructional practices with the new assessment paradigm.

Slide 10: Chapter 2 – Standards for Assessment: 2.1 Concept of Assessment

2.1 Concept of Assessment
:

Assessment is defined as a continuous, systematic process involving several interconnected stages: defining, selecting, collecting, analysing, and interpreting evidence of learning.

This systematic approach ensures ongoing monitoring and improvement of educational outcomes.

Slide 11: Chapter 2 – Standards for Assessment: 2.2 Wash-Back

2.2 Wash-Back
:

Refers to the impact of assessment on teaching and learning.

Positive effects
: When assessments encourage desirable instructional practices (e.g., teaching for deep understanding and critical thinking).

Negative effects
: When assessments lead to undesirable practices (e.g., teaching to the test, excessive rote memorisation).

Alignment with curriculum ensures deep learning
: The MAF promotes strong alignment to ensure assessments drive higher-order cognitive processes.

Example
: If tests explicitly emphasise evaluation skills, teachers will naturally integrate and model evaluation tasks in their classroom instruction.

Slide 12: Chapter 2 – Standards for Assessment: 2.3 Purposes (Part 1)

2.3 Purposes
: Assessments serve multiple purposes throughout the learning cycle:

Diagnostic
: Administered before instruction to identify students' prior knowledge, strengths, and weaknesses to inform initial teaching plans.

Formative
: Ongoing assessments during instruction to monitor student learning and provide immediate feedback for adjustment of teaching and learning strategies.

Summative
: Conducted at the end of a unit or course to evaluate overall learning attainment against established standards (e.g., board exams).

Slide 13: Chapter 2 – Standards for Assessment: 2.3 Purposes (Part 2)

Interim
: Periodic assessments bridging formative and summative, used to monitor progress towards learning goals over a longer period.

Utilising multiple assessment instruments (e.g., essays, practical demonstrations, presentations, short-response questions, alongside multiple-choice questions) significantly enhances the validity of the overall assessment, providing a more comprehensive picture of student capabilities.

Slide 14: Chapter 2 – Standards for Assessment: 2.4 Types – Formative

2.4 Types
:

Formative (ongoing, feedback-rich)
: These assessments are integrated throughout the instructional process, providing real-time feedback to students and teachers. Their primary goal is to guide and improve learning, not just to evaluate it.

Examples
: Quizzes, classroom discussions, assignments, and peer reviews.

Slide 15: Chapter 2 – Standards for Assessment: 2.4 Types – Bloom-Based Framework

Bloom-based formative framework (cognitive, psychomotor, affective)
: Formative assessments should align with Bloom's Taxonomy, covering various domains of learning:

Cognitive domain
: Focuses on intellectual skills and knowledge (e.g., remembering, understanding, applying, analyzing, evaluating, creating).

Psychomotor domain
: Relates to physical skills and coordination (e.g., performing a lab experiment, drawing a diagram).

Affective domain
: Involves emotions, attitudes, values, and interests (e.g., demonstrating teamwork, showing empathy).

Slide 16: Chapter 2 – Standards for Assessment: 2.4 Types – Technology Tools

Technology-integrated tools
: Modern formative assessment leverages digital platforms for efficiency and engagement.

Examples
: Google Forms for quick quizzes, Kahoot! for interactive reviews, Flipgrid for video discussions, Jamboard for collaborative brainstorming, and Seesaw for digital portfolios and parent communication.

Slide 17: Chapter 2 – Standards for Assessment: 2.4 Types – Summative

Summative (board exams)
: These are high-stakes assessments conducted at the end of a significant learning period (e.g., annual or matriculation exams).

Purpose
: To benchmark student attainment against predefined Student Learning Outcomes (SLOs) and provide a final evaluation of learning.

Slide 18: Chapter 2 – Standards for Assessment: 2.5 Minimum Quality Standards – Core Principles

2.5 Minimum Quality Standards
: A “good test” adheres to several core principles to be effective and fair:

Reliable
: Consistently produces similar results under similar conditions.

Valid
: Measures what it is intended to measure.

Objective
: Free from personal bias of the scorer; scoring criteria are clear and consistent.

Comprehensive
: Covers the entire scope of the curriculum or learning objectives being assessed.

Practical
: Feasible to administer, score, and interpret within available resources (time, cost, personnel).

Slide 19: Chapter 2 – Standards for Assessment: 2.5 Minimum Quality Standards – Explicit Standards (Part 1)

Explicit Standards (Part 1)
: Eleven explicit standards are outlined to ensure assessment quality, covering various aspects:

Alignment
: Ensuring assessment items are directly linked to Student Learning Outcomes (SLOs) and curriculum content.

Bloom Coverage
: Requiring assessments to include items across different cognitive levels of Bloom's Taxonomy.

ToS Usage
: Mandating the use of a Table of Specification (ToS) to ensure balanced coverage of content and cognitive levels.

Validity (correlation coefficient $r \ge 0.3$)
: A statistical measure indicating a reasonable correlation between test scores and an external criterion.

Slide 20: Chapter 2 – Standards for Assessment: 2.5 Minimum Quality Standards – Explicit Standards (Part 2)

Explicit Standards (Part 2)
:

Reliability (Cronbach's $\alpha \ge 0.7$)
: A statistical measure of internal consistency.

Fairness
: Ensuring assessments are free from bias towards any group and provide equitable opportunities for all students.

Digitisation
: Requiring the integration of digital tools and processes in assessment.

GPA Rules
: Standardizing the calculation and reporting of Grade Point Average (GPA).

Stakeholder Feedback
: Incorporating feedback from students, teachers, parents, and examination boards for continuous improvement.
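
Worked sketch (not part of the MAF text): the two numeric thresholds in this list, validity $r \ge 0.3$ and reliability $\alpha \ge 0.7$, can be checked in a few lines of Python. The function names and the idea of passing a student-by-item score matrix are assumptions made for illustration; the formulas are the standard Pearson correlation and Cronbach's alpha.

```python
from math import sqrt
from statistics import mean, pvariance


def cronbach_alpha(score_matrix):
    """Cronbach's alpha for a list of per-student rows (one score per item)."""
    k = len(score_matrix[0])  # number of items on the test
    # Variance of each item across students, then variance of total scores.
    item_vars = [pvariance([row[j] for row in score_matrix]) for j in range(k)]
    total_var = pvariance([sum(row) for row in score_matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between test scores x and an external criterion y."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def meets_minimums(alpha, r):
    """Apply the thresholds quoted in the notes: alpha >= 0.7 and r >= 0.3."""
    return alpha >= 0.7 and r >= 0.3
```

In practice a Board would feed real item-level marks into `cronbach_alpha` and pair test totals with some external measure (for example, a later exam) for `pearson_r`; which external criterion to use is not specified in these notes.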

Slide 21: Chapter 2 – Standards for Assessment: 2.6 Digitisation Standards (Pre-Examination)

2.6 Digitisation Standards
: Standards 11, 12, and 13 outline comprehensive digitisation requirements across the examination lifecycle.

Standard 11 (Pre-Examination)
: Focuses on digitalizing item creation and management, including:

Online curriculum-SLO mapping.

Digital Table of Specification (ToS) generation.

Establishment of secure, cloud-based Item Banks.

Slide 22: Chapter 2 – Standards for Assessment: 2.6 Digitisation Standards (During Examination)

Standard 12 (During Examination)
: Mandates digital conduct during exams, covering:

Real-time proctoring through CCTV and AI.

E-attendance systems.

Biometric identification of candidates.

Secure e-paper delivery mechanisms.

Slide 23: Chapter 2 – Standards for Assessment: 2.6 Digitisation Standards (Post-Examination)

Standard 13 (Post-Examination)
: Requires automated post-exam processes, including:

Automated scoring for objective items.

Digital dashboards for real-time performance monitoring.

Electronic result processing and e-certificates.

Slide 24: Chapter 3 – Curriculum Alignment: Importance and Framework Targets

Curriculum alignment is foundational to effective assessment. The National Curriculum Framework targets the development of well-rounded individuals:

“successful learners, confident individuals, responsible citizens, effective contributors.”

Slide 25: Chapter 3 – Curriculum Alignment: Hierarchy and Mapping

This alignment is achieved through a structured hierarchy:

Standards
: Broad statements of what students should know and be able to do.

Benchmarks
: More specific learning goals within each standard.

Student Learning Outcomes (SLOs)
: Observable and measurable statements of what a student is expected to know or be able to do at the end of a learning period.

Crucially, each SLO is explicitly mapped to specific assessment items via a detailed Table of Specification (ToS). This ensures that every test question directly measures a defined learning outcome.
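
Worked sketch (illustrative only): one way to picture the SLO-to-item mapping is a small table where every item row carries its SLO code, cognitive level, and marks. The SLO codes, item labels, and function names below are hypothetical; the MAF's actual ToS layout is not reproduced in these notes.

```python
# Hypothetical SLO codes and items, for illustration only.
SLOS = {"SLO-1", "SLO-2", "SLO-3"}

TOS = [
    {"item": "Q1", "slo": "SLO-1", "level": "Knowledge", "marks": 1},
    {"item": "Q2", "slo": "SLO-2", "level": "Understanding", "marks": 3},
    {"item": "Q3", "slo": "SLO-2", "level": "Application", "marks": 5},
]


def unmapped_items(tos, slos):
    """Items whose SLO code is missing or not in the curriculum list."""
    return [row["item"] for row in tos if row.get("slo") not in slos]


def uncovered_slos(tos, slos):
    """SLOs that no item on the paper assesses."""
    return sorted(slos - {row.get("slo") for row in tos})
```

With the sample data, `unmapped_items(TOS, SLOS)` is empty (every question measures a defined outcome), while `uncovered_slos(TOS, SLOS)` reports the one SLO the paper does not yet assess.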

Slide 26: Chapter 4 – Assessment Development Process: 1. Item Writing Principles

Item Writing Principles
: To ensure high-quality assessment items, writers must adhere to strict principles:

Clarity
: Questions must be unambiguous and easily understood by all students.

One Focus
: Each item should assess a single concept or skill.

Bias-Free
: Questions should be culturally sensitive, fair, and free from any biases.

Bloom-Balanced
: Items should cover a range of cognitive levels according to Bloom's Taxonomy.

Slide 27: Chapter 4 – Assessment Development Process: 2. Review Pipeline

Review Pipeline
: A rigorous review process ensures the quality and validity of all assessment items:

Draft
: Initial creation of items by trained writers.

Peer Review
: Items reviewed by fellow item writers for initial clarity.

Expert Review
: Subject matter experts review items for content accuracy, SLO alignment, and cognitive level.

Pilot
: A sample administered to a small group to gather preliminary data.

Statistical Analysis
: Data analyzed using psychometric techniques to evaluate item difficulty, discrimination, and distractors.

Approval
: Final items approved for inclusion in the secure Item Bank.

Slide 28: Chapter 4 – Assessment Development Process: 3. Moderation Rules & 4. Illustrative Items

Moderation Rules
: To maintain consistency and quality across tests:

Item Difficulty ( $0.3\le p \le 0.7$ )
: The proportion of students who answer an item correctly should ideally fall within this range.

Discrimination ( $D > 0.2$ )
: A measure of how well an item differentiates between high-scoring and low-scoring students.

Illustrative Bad vs. Improved Items
: The framework provides clear examples of poorly constructed items contrasted with improved versions that emphasize context-rich scenarios and require higher-order questioning (e.g., analysis, evaluation, problem-solving).
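
Worked sketch (illustrative only): the two moderation rules can be computed from pilot data as follows. Difficulty $p$ is the proportion of correct answers. For discrimination, the notes do not say how $D$ is computed, so this sketch assumes the common upper-lower group method with 27% of candidates in each group; all function names are hypothetical.

```python
def item_difficulty(responses):
    """p = proportion of candidates answering the item correctly (0/1 list)."""
    return sum(responses) / len(responses)


def discrimination_index(responses, totals, fraction=0.27):
    """D = p(upper group) - p(lower group), groups formed by total test score.

    The 27% group size is an assumed convention, not taken from the MAF.
    """
    n = len(responses)
    group = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: totals[i])  # lowest total first
    lower, upper = order[:group], order[-group:]
    p_upper = sum(responses[i] for i in upper) / group
    p_lower = sum(responses[i] for i in lower) / group
    return p_upper - p_lower


def within_moderation_rules(p, d):
    """Apply the ranges quoted in the notes: 0.3 <= p <= 0.7 and D > 0.2."""
    return 0.3 <= p <= 0.7 and d > 0.2
```

An item that everyone answers correctly has $p = 1$ and $D = 0$, so it fails both rules: it is too easy and tells markers nothing about who has mastered the outcome.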

Slide 29: Chapter 5 – Conduct of Examination: Professional Conduct and Environment

Effective execution of examinations is paramount to ensure fairness and integrity:

Professional Training (10 hr annually)
: All supervisory staff (invigilators, superintendents) undergo mandatory annual training covering exam protocols, ethical guidelines, and crisis management. A detailed booklet of duties is provided.

Conducive Environment
: Examination halls must be ergonomic, well-lit, quiet, and free from any external interference or distractions to ensure optimal student performance.

Slide 30: Chapter 5 – Conduct of Examination: Transparency and Cheating Counter-measures

Transparency
: Measures to ensure fairness and prevent malpractice include:

Random appointment of supervisory staff.

Installation of CCTV cameras and e-surveillance systems.

Secure logistics via tamper-proof sealed packets for exam papers and trusted couriers.

Cheating Counter-Measures
: A robust system to combat academic dishonesty includes:

Biometric identification for candidates.

Signal jammers to prevent electronic communication.

AI proctoring solutions for real-time monitoring.

Statistical flagging of suspicious answer patterns.

Strict zero-tolerance policy against any form of cheating.

Slide 31: Chapter 6 – Coding, Marking & Results Compilation: Dual-Coding and Multi-Layer Marking

This chapter details the secure and efficient processing of examination scripts:

Dual-Coding
: To ensure anonymity and impartiality during marking, original roll-numbers are replaced with fictitious, unique numbers. Scripts are often processed in bundles of 250 for logistical efficiency and security.

Multi-Layer Marking
: A phased marking process is implemented to ensure accuracy and consistency:

Head Examiner
: Oversees a team of markers.

Checker
: Re-marks a sample of scripts to ensure accuracy.

Subject Coordinator
: Provides overall quality assurance for a specific subject.

Super-Checker
: Performs a final review of marked scripts.
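
Worked sketch (illustrative only): the dual-coding idea, replacing roll numbers with unique fictitious numbers and grouping scripts into bundles of 250, might look like this in code. The 8-digit code length, the function names, and the use of Python's `secrets` module are assumptions; the notes do not describe how Boards actually generate the codes.

```python
import secrets


def assign_fictitious_numbers(roll_numbers, digits=8):
    """Map each roll number to a unique random code.

    The mapping itself is the secret key: markers see only the codes.
    """
    used = set()
    mapping = {}
    for roll in roll_numbers:
        code = secrets.randbelow(10 ** digits)
        while code in used:  # redraw on the rare collision
            code = secrets.randbelow(10 ** digits)
        used.add(code)
        mapping[roll] = f"{code:0{digits}d}"
    return mapping


def make_bundles(codes, size=250):
    """Split a list of coded scripts into bundles of at most `size`."""
    return [codes[i:i + size] for i in range(0, len(codes), size)]
```

After marking, results are joined back to candidates by reversing the mapping, which is why it has to be stored separately from the scripts.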

Slide 32: Chapter 6 – Coding, Marking & Results Compilation: On-Screen Marking and Final Compilation

On-Screen Marking (OSM) Benefits
: Digital marking offers significant advantages:

Auto-Totalling
: Automatic calculation of marks, eliminating arithmetic errors.

Remote Access
: Markers can evaluate scripts from various locations.

Parallel Moderation
: Multiple markers can simultaneously mark the same script or portions.

Increased Reliability
: Digital tools reduce human error and facilitate standardized application of rubrics.

Final Compilation
: A sophisticated software system is used to merge marks from all subjects, incorporate decisions from Unfair Means (UFM) cases, and process absentee data to generate comprehensive and accurate results.

Slide 33: Chapter 7 – Post-Exam Analysis: Purposes and Data Collection

7.1 Purposes
: Post-exam analysis serves multiple critical purposes:

Improve Learning
: Identifies areas where students struggled, informing targeted interventions.

Inform Instruction
: Provides teachers with data on the effectiveness of their teaching strategies.

Enhance Assessment Quality
: Insights into item performance (difficulty, discrimination) lead to refinement of future items.

Communicate to Stakeholders
: Provides transparent data to students, parents, schools, and policymakers.

7.2 Data Collection
: Robust analysis relies on comprehensive data collection, including:

Raw scores, item responses, demographic/context variables.

Rigorous integrity checks are performed on all collected data to ensure accuracy and reliability.

Slide 34: Chapter 7 – Post-Exam Analysis: 7.3 Item Analysis Theories

7.3 Item Analysis Theories
: Two primary psychometric theories are employed for detailed item analysis:

Classical Test Theory (CTT)
: A traditional approach that treats each observed score as a true score plus measurement error. It provides item-level statistics such as difficulty (p-value) and discrimination (D-index).

Item Response Theory (IRT)
: A more advanced, modern approach that models the relationship between a person's ability and their probability of answering an item correctly. It provides item parameters that are independent of the specific sample of test-takers.

Parameters
: Key parameters derived from item analysis include:

Difficulty
: How easy or hard an item is for the test-takers.

Discrimination
: How well an item differentiates between high-ability and low-ability test-takers.

Guessing
: In IRT, an additional parameter that estimates the probability of low-ability test-takers guessing the correct answer.
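
Worked sketch (illustrative only): the notes do not name a specific IRT model. The three-parameter logistic (3PL) model is a standard one that uses exactly the three parameters listed here, so it is assumed below: $P(\theta) = c + \dfrac{1-c}{1+e^{-a(\theta-b)}}$, where $\theta$ is ability, $a$ discrimination, $b$ difficulty, and $c$ guessing.

```python
from math import exp


def prob_correct(theta, a, b, c=0.0):
    """3PL probability that a candidate of ability theta answers correctly.

    a: discrimination, b: difficulty, c: guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))
```

When ability equals difficulty ($\theta = b$) the probability sits halfway between the guessing floor and 1, and a very low-ability candidate's probability approaches $c$ rather than 0, which is what the guessing parameter is meant to capture.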

Slide 35: Chapter 7 – Post-Exam Analysis: 7.4 Exam-Results Dashboard

7.4 Exam-Results Dashboard
: A central, interactive exam-results dashboard is mandated, featuring:

Interactive charts
: Visual representations of performance trends (e.g., subject-wise average scores over time).

Subgroup filters
: Ability to break down data by various demographics (e.g., gender, district, school type) to identify performance gaps.

Heat-maps
: Visualizations showing areas of high and low performance, helping to pinpoint struggling schools or regions.

This dashboard supports the generation of data-driven policy briefs, provides direct school feedback for improvement, and facilitates detailed student-parent reports.

Slide 36: Chapter 7 – Post-Exam Analysis: 7.5 Statistics Toolkit (Descriptive)

7.5 Statistics Toolkit
: A comprehensive statistical toolkit is essential for in-depth analysis:

Descriptive statistics
: Summarize basic features of the data:

Mean
: Average score.

Median
: Middle score when data is ordered.

Mode
: Most frequently occurring score.

Standard Deviation ( $\sigma$ )
: A measure of the dispersion or spread of scores around the mean.
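
Worked sketch (illustrative only): all four descriptive statistics are available in Python's standard `statistics` module. The `describe` helper and the sample scores are made up for this example; `pstdev` is the population standard deviation, which assumes the scores cover the whole cohort rather than a sample.

```python
from statistics import mean, median, mode, pstdev


def describe(scores):
    """Summarise a list of scores with the four statistics named above."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "mode": mode(scores),
        "sd": pstdev(scores),  # population SD; use stdev() for a sample
    }


summary = describe([40, 55, 55, 60, 70, 80])
```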

Slide 37: Chapter 7 – Post-Exam Analysis: 7.5 Statistics Toolkit (Inferential)

Inferential statistics
: Allow for drawing conclusions about a larger population based on sample data:

t-test
: Compares the means of two groups.

ANOVA (Analysis of Variance)
: Compares the means of three or more groups.

Correlation
: Measures the strength and direction of a linear relationship between two variables.

Regression ( $y=\beta_0+\beta_1 x$ )
: Models the relationship between a dependent variable (y) and one or more independent variables (x), allowing for prediction.

Logistic models
: Used when the dependent variable is binary (e.g., pass/fail).
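
Worked sketch (illustrative only): for the simple regression line $y=\beta_0+\beta_1 x$, the least-squares estimates are $\beta_1 = \dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}$ and $\beta_0 = \bar{y} - \beta_1\bar{x}$. The function names and the idea of predicting a board score from a prior score are assumptions for the example.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = beta0 + beta1 * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    beta1 = sxy / sxx
    beta0 = my - beta1 * mx
    return beta0, beta1


def predict(beta0, beta1, x_new):
    """Predicted y for a new x, e.g. an expected score from a prior score."""
    return beta0 + beta1 * x_new
```

A logistic model replaces the straight line with a curve bounded between 0 and 1, which is why it suits pass/fail outcomes.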

Slide 38: Chapter 7 – Post-Exam Analysis: 7.5 Statistics Toolkit (Predictive)

Predictive modelling
: Advanced techniques to forecast academic performance or identify at-risk students who may need early intervention.

Slide 39: Chapter 7 – Post-Exam Analysis: 7.8 Continuous Improvement (Part 1)

7.8 Continuous Improvement
: The insights gained from post-exam analysis are crucial for a cycle of continuous improvement, leading to:

Curriculum adjustment
: Modifying curriculum content or emphasis based on areas of consistent student difficulty.

Targeted professional development (PD)
: Designing training programs for teachers to address specific instructional weaknesses or new assessment demands.

Slide 40: Chapter 7 – Post-Exam Analysis: 7.8 Continuous Improvement (Part 2)

Refined item banks
: Improving the quality and variety of questions in the item banks based on statistical performance.

Evidence-driven resource allocation
: Directing resources (e.g., funding, teaching materials) to schools or regions most in need based on data.

Slide 41: Chapter 8 – Key IBCC Policy Decisions: 01 Passing Mark

01 Passing mark raised to $40\%$
: A significant increase from the previous $33\%$ to ensure a higher baseline of competency and discourage rote learning.

Slide 42: Chapter 8 – Key IBCC Policy Decisions: 02 Grace Marks

02 Grace marks limited
: Grace marks are strictly limited to second-attempt examinations, with a maximum of 7 marks allowed in only one subject, to prevent undue inflation of first-attempt results.
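
Worked sketch (illustrative only): one possible reading of the grace-mark rule as code. Several points are assumptions not stated in the notes: marks are treated as percentages per subject, the $40\%$ pass mark is applied per subject, "only one subject" is read as "exactly one failed subject", and grace is granted only when it turns that fail into a pass.

```python
PASS_MARK = 40   # per-subject pass threshold (percent), from the notes
MAX_GRACE = 7    # maximum grace marks, from the notes


def grace_marks(subject_marks, attempt):
    """Return (subject, marks_added) if grace applies under the assumed
    reading of the rule, otherwise None."""
    if attempt != 2:                      # second attempt only
        return None
    failed = {s: m for s, m in subject_marks.items() if m < PASS_MARK}
    if len(failed) != 1:                  # grace in one subject only
        return None
    (subject, mark), = failed.items()
    shortfall = PASS_MARK - mark
    return (subject, shortfall) if shortfall <= MAX_GRACE else None
```

Under this reading, a second-attempt candidate with 35 in one subject and passes elsewhere would receive 5 grace marks, while the same marks on a first attempt would receive none.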

Slide 43: Chapter 8 – Key IBCC Policy Decisions: 03 SLO-aligned Syllabi & Teacher Training

03 SLO-aligned syllabi & teacher training mandated
: All syllabi must be revamped to clearly define Student Learning Outcomes.

Extensive teacher training programs are compulsory to ensure alignment between instruction and assessment.

Slide 44: Chapter 8 – Key IBCC Policy Decisions: 04 Textbooks Revised

04 Textbooks revised yearly to defeat guidebooks
: Textbooks will undergo annual revisions to minimize the effectiveness of guidebooks and promote original learning.

Slide 45: Chapter 8 – Key IBCC Policy Decisions: 05 Uniform Exam Standards

05 Uniform exam standards (scheme, cognitive weights)
: Ensures consistency across all boards in exam design, weighting of content areas, and cognitive levels.

Slide 46: Chapter 8 – Key IBCC Policy Decisions: 06 Shift to Higher-Order Thinking

06 Shift to higher-order
: Examination papers will now have a revised cognitive weighting:

Knowledge $30\%$, Understanding $50\%$, Application $20\%$.

This promotes critical thinking and problem-solving over mere recall.
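
Worked sketch (illustrative only): the 30/50/20 weighting translates directly into mark targets for a paper of any size, and a drafted paper can be checked against it. The function names and the `(level, marks)` representation of a paper are hypothetical.

```python
# Cognitive weighting from the notes, in percent.
WEIGHTS_PERCENT = {"Knowledge": 30, "Understanding": 50, "Application": 20}


def target_marks(total_marks):
    """Marks each cognitive level should carry on a paper of total_marks."""
    return {lvl: total_marks * pct / 100 for lvl, pct in WEIGHTS_PERCENT.items()}


def actual_percent(items):
    """Percentage of marks per level for a list of (level, marks) items."""
    total = sum(marks for _, marks in items)
    shares = {lvl: 0.0 for lvl in WEIGHTS_PERCENT}
    for level, marks in items:
        shares[level] += marks
    return {lvl: 100 * m / total for lvl, m in shares.items()}
```

For a 75-mark paper, `target_marks(75)` gives 22.5, 37.5, and 15 marks for Knowledge, Understanding, and Application respectively; how Boards round half marks is not covered in these notes.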

Slide 47: Chapter 8 – Key IBCC Policy Decisions: 07 Capacity Building

07 Capacity building for item writers, markers
: Ongoing training and development programs for personnel involved in creating questions and evaluating scripts to enhance their skills and ensure quality.

Slide 48: Chapter 8 – Key IBCC Policy Decisions: 08 Central & Provincial Item Banks

08 Central & provincial Item Banks
: Establishment of organised repositories of questions at both central and provincial levels.

At least $20\%$ of every paper is drawn from the IBCC central bank to ensure quality, consistency, and standardisation across examinations.

Slide 49: Chapter 8 – Key IBCC Policy Decisions: 09 Theory & Practical Split

09 Theory & practical split; separate passing
: Academic theory components and practical components will be assessed and passed separately for comprehensive evaluation of student competency in both theoretical knowledge and practical application.

Slide 50: Chapter 8 – Key IBCC Policy Decisions: 10 Practical Assessment Details

10 Practical includes MCQ-written + lab assessment by subject teacher
: Practical examinations will comprise both a written Multiple Choice Question (MCQ) component and a direct lab assessment conducted by the subject teacher.

Slide 51: Chapter 8 – Key IBCC Policy Decisions: 11 Research & Development Units

11 Research & Development units at each Board; compulsory annual studies
: Each Examination Board must establish dedicated R&D units and conduct compulsory annual studies to analyse assessment data, identify trends, and implement evidence-based improvements to their practices.

Slide 52: Chapter 8 – Key IBCC Policy Decisions: 12 New GPA/CGPA-based Grading

12 New GPA/CGPA-based grading (see below)
: Implementation of a new comprehensive grading system based on Grade Point Average (GPA) and Cumulative Grade Point Average (CGPA) for better representation of student performance.

Slide 53: Chapter 9 – Guidelines & SOPs: 9.1 Item Writers

9.1 Item Writers
: Guidelines for those developing assessment questions:

Alignment
: Ensuring every item directly assesses a specific Student Learning Outcome (SLO).

Bloom Balance
: Crafting questions that cover a range of cognitive levels according to Bloom's Taxonomy (e.g., remembering, understanding, applying, analyzing, evaluating, creating).

Context Relevance
: Designing items that are contextualised, engaging, and relevant to students' experiences to enhance comprehension and application.

Pilot Testing
: Participating in the pilot testing phase to gather preliminary data on item performance and student understanding, allowing for refinement before final use.

Slide 54: Chapter 9 – Guidelines & SOPs: 9.2 Reviewers

9.2 Reviewers
: Standards for evaluating assessment items:

Check SLO alignment
: Verifying that each item accurately measures the intended Student Learning Outcome.

Bias detection
: Identifying and eliminating any potential biases (e.g., cultural, gender, socioeconomic) that might disadvantage certain groups of students.

Clarity assessment
: Ensuring that questions are clear, unambiguous, and free from misleading language or confusing phrasing.

Cognitive distribution
: Confirming that the set of items collectively covers the required range of cognitive levels as per the Table of Specification.

Approve/reject
: Making informed decisions to approve or reject items based on their adherence to quality standards and guidelines.

Slide 55: Chapter 9 – Guidelines & SOPs: 9.3 Moderators

9.3 Moderators
: Roles in ensuring overall test quality:

Ensure test-wide balance
: Verifying that the assessment maintains an appropriate balance across content areas, question types, and cognitive levels as per the blueprint.

Fairness
: Reviewing the test as a whole to ensure it is equitable and provides all students with a fair opportunity to demonstrate their knowledge.

Timing
: Confirming that the test can be reasonably completed within the allocated time frame, preventing undue time pressure.

Security
: Overseeing the secure handling and storage of test materials before, during, and after the examination to prevent leaks or tampering.

Produce moderation report
: Documenting the moderation process, findings, and recommendations for future assessment development cycles.

Slide 56: Chapter 9 – Guidelines & SOPs: 9.4 Test Assembly

9.4 Test Assembly
: Procedures for constructing the final examination paper:

Logical sequencing
: Arranging questions in a logical order (e.g., easier to harder, or following curriculum flow) to facilitate student progress.

Uniform formatting
: Ensuring consistent formatting, font, spacing, and layout throughout the paper for readability and professionalism.

Time flags
: Including clear instructions regarding time limits for specific sections or question types to guide students and invigilators.

Final approval
: Obtaining final sign-off from relevant authorities before the test is printed or administered, confirming all checks have been completed.

Slide 57: Chapter 9 – Guidelines & SOPs: 9.5 Test Administration

9.5 Test Administration
: Protocols for conducting the examination:

Material control
: Strict management of all examination materials (question papers, answer scripts, attendance sheets) to prevent loss or compromise.

Trained invigilators
: Ensuring all invigilators are professionally trained in examination rules, ethical conduct, and emergency procedures.

Accessibility
: Making necessary accommodations to ensure the test environment is accessible to students with special needs, adhering to equity principles.

Emergency protocols
: Establishing clear procedures for handling unforeseen circumstances during the exam, such as power outages, medical emergencies, or security incidents.

Slide 58: Chapter 9 – Guidelines & SOPs: 9.6 Marking/Coding

9.6 Marking/Coding
: Standards for evaluating and processing answer scripts:

Rubric adherence
: Strict adherence to predefined marking rubrics for consistent and objective scoring across all examiners.

Double marking
: Implementing a system where at least two examiners independently mark a subset of scripts to ensure inter-rater reliability and identify discrepancies.

Recording accuracy
: Ensuring all marks are accurately recorded and tallied, minimising clerical errors through digital or verified manual processes.

Appeal mechanism
: Establishing a transparent and fair process for students to appeal their results or marking decisions, ensuring due process and accountability.

Slide 59: Grading System & GPA Formulae

New Five-Point Scale (A++ to U)
:

A++
$\ge95 = \text{GPA } 5.0$

C
$60–69 = \text{GPA } 3.0$

E
$40–49 = \text{GPA } 1.0$

U
$<40 = \text{GPA } 0$

Example (Grade 11)
:

$\text{GPA}=\frac{\sum GP}{\text{courses}}=\frac{20.5}{7}=2.92$

CGPA Example
:

Grade 12 GPA $4.92 \rightarrow \text{CGPA}=\frac{2.92+4.92}{2}=3.92$
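
Worked sketch (illustrative only): the two examples can be reproduced in a few lines. Note that $20.5/7 = 2.9285\ldots$, so the quoted $2.92$ corresponds to cutting the value off at two decimals rather than rounding (rounding would give $2.93$). Whether the MAF prescribes truncation is an assumption inferred from these figures; the function names are hypothetical.

```python
from math import floor


def gpa(total_grade_points, courses):
    """GPA = sum of grade points / number of courses."""
    return total_grade_points / courses


def cgpa(gpas):
    """CGPA = plain average of the yearly GPAs."""
    return sum(gpas) / len(gpas)


def truncate_2dp(value):
    """Cut a value off at two decimals (assumed from the 2.92 example)."""
    return floor(value * 100) / 100


grade11 = truncate_2dp(gpa(20.5, 7))   # Grade 11 example from the notes
combined = cgpa([grade11, 4.92])       # CGPA with the Grade 12 GPA of 4.92
```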

Slide 60: Digitisation Standards Recap

Pre-exam
: Online curriculum-SLO mapping, digital ToS, secure item bank.

During-exam
: E-paper delivery, CCTV, e-attendance, biometric identification.

Post-exam
: On-Screen Marking (OSM), automated stats, e-certificates, public dashboards.

Slide 61: Capacity Building & Training

Annual 10-hour programmes
: Mandatory professional development for all staff involved in assessment processes, ensuring up-to-date skills and knowledge.

Phase-wise training matrices
: Tailored training for pre-exam, during-exam, and post-exam phases, addressing specific needs at each stage.

IBCC to fund AI-enhanced QIB (Question Item Bank) development
: Investment in advanced technology to improve the quality, security, and efficiency of item bank generation.

Slide 62: Resources

IBCC portal
: Serves as a central online repository for educational resources.

Hosts subject-wise frameworks, Tables of Specification (ToS), SLO lists, model papers, and formative-summative guidelines: https://ibcc.edu.pk/MAF/

Slide 63: Figures & Tables Reference Guide

Fig-1
: Illustrates formative vs. summative assessment concepts.

Fig-4
: Depicts the Bloom pyramid (cognitive levels of learning).

Tables 1–14
: Cover details such as the assessment blueprint, validity standards, and GPA grid, with guidelines and examples for practical implementation.

Slide 64: Lead Contributors (selected)

Dr. Ghulam Ali Mallah
: Executive Director IBCC, champion of digitalisation initiatives and driving force behind MAF's technological integration.

Dr. Sajid Ali Yousuf Zai
: Expert in psychometrics and instrumental in the development of assessment dashboards and data analysis methodologies.

Dr.