Notes on Vocabulary Testing (8.1–8.2)

8.1 Introduction
  • Goal of testing vocabulary: assess subjects’ knowledge of lexical items.

  • Key tasks in preparing vocabulary tests:

    • Achievement testing: select vocabulary items from course materials.

    • Language proficiency testing: decide whether to test passive or active vocabulary, and whether items should reflect spoken or written language.

  • Level-based guidance for word selection:

    • Elementary level: include basic words (e.g., names of things found in classrooms and local community).

    • Intermediate level: include words essential for speaking and oral communication.

    • Advanced level: select words from the lexicon of the written language.

  • Coverage of word types:

    • Only content words (nouns, verbs, adjectives, adverbs) are tested in vocabulary tests.

    • Function words (articles, determiners, prepositions, conjunctions, pronouns, auxiliary verbs) are tested in structure tests (to be discussed in the next chapter).

  • Passive vs active vocabulary distinction:

    • Passive vocabulary: words subjects recognize in a stimulus but may not use in speaking/writing.

    • Active vocabulary: words subjects have full command of and use frequently in speech and writing.

  • After decisions on the above considerations, the test designer must consider:

    • Frequency, scope, and usability of the words to be included.

    • Consulting frequency lists (e.g., the General Service List of English Words, West, 1953) can provide useful insights.

  • Limitations of word lists (to justify designer judgment):

    • Often outdated.

    • Based mainly on written language.

    • Do not reflect how difficult a word is for speakers of the examinees’ native language.

  • Designer’s response to limitations:

    • Emphasize the test objectives and rely on the designer’s (or experienced colleagues’) judgments.

  • The second task for the test constructor: determine the form of the items.

    • Ideally, the tested word is underlined in a context and accompanied by four paraphrase meanings.

    • Paraphrase item structure: the four choices are usually paraphrases of the tested word in simple terms.

    • Example 1 (paraphrase-type item):

    • After discussing the matter for two hours, the committee adjourned without having reached any decision. (Tested word, underlined in the original: adjourned.)

    • a. finished b. continued c. took off d. broke off

    • This tests understanding of the word in context.

    • Disadvantages of Example 1-type items:

    • Limits testing to one word per item.

    • Sometimes lexical items do not lend themselves to four sensible paraphrases.

    • May allow testees to ignore the context and deduce the meaning from the options alone.

    • Variation to address these drawbacks: deletion (completion) type items.

    • Deletion (Example 2): the tested word is deleted from context; four choices complete the sentence.

    • Example 2 item:

    • After discussing the matter for two hours, the committee ……… without having reached any decision.

    • a. debated b. grouped c. collapsed d. adjourned

    • Advantage: economical. The testee must know the meanings of all four options, so one item tests four lexical items at once.

    • Other widely used format: standard vocabulary-type (definition with four choices).

    • Example 3 (definition-type):

    • "Adjourn" means………

    • a. cause to turn aside b. break into pieces c. come into use again d. break off for a time

    • Example 4 (definition-type continued):

    • "Break off for a time" means………

    • a. collapse b. adjourn c. deflect d. prostrate

    • Critique of the standard definition format: it is economical but has a negative backwash effect, since it encourages memorizing word lists rather than understanding words in context.

    • A better item type: a passage of connected discourse with content words deleted; testees fill the blanks with suitable words.

    • Example 5 (cloze-type passage):

    • You will read a passage in which there are blanks. Read the entire passage first. Then, fill in each blank with an appropriate word.

    • Example narrative (excerpt): Good drivers always obey the policeman. They do what he tells them. But sometimes a bad driver ……… the policeman. Then he takes out his notebook. He writes down the number of the ………… Later, a judge may make the bad driver pay some money.
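
The cloze-passage format described above (a connected passage with chosen content words deleted) can be sketched in a few lines of Python. This is only an illustration, not a procedure from the source: the function name `make_cloze`, the `BLANK` marker, and the idea that the designer supplies the target words by hand are all assumptions. The sample passage reuses only the sentences that appear complete in the excerpt above.

```python
import re

BLANK = "………"  # assumed blank marker, matching the notation used in these notes


def make_cloze(passage: str, targets: list[str]) -> tuple[str, list[str]]:
    """Replace the first occurrence of each target word with a blank.

    Returns the gapped passage and the answer key in order of appearance.
    The designer chooses `targets` by hand (hypothetical workflow).
    """
    # Find the first whole-word, case-insensitive occurrence of each target.
    hits = []
    for word in targets:
        match = re.search(rf"\b{re.escape(word)}\b", passage, flags=re.IGNORECASE)
        if match:
            hits.append(match)
    # Sort by position, then replace from the end so earlier offsets stay valid.
    hits.sort(key=lambda m: m.start())
    gapped = passage
    for match in reversed(hits):
        gapped = gapped[:match.start()] + BLANK + gapped[match.end():]
    key = [m.group(0) for m in hits]
    return gapped, key


passage = (
    "Good drivers always obey the policeman. They do what he tells them. "
    "Later, a judge may make the bad driver pay some money."
)
gapped, key = make_cloze(passage, ["obey", "judge", "money"])
print(gapped)
print(key)  # → ['obey', 'judge', 'money']
```

The sketch assumes the target words do not overlap one another in the passage; deciding which content words are worth deleting remains the designer's judgment.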

8.2 Guidelines for Constructing Vocabulary Items

1) Context clarity

  • The context should be clear enough to provide the testee with a clear meaning.

  • A short conversation-like context is preferred to a brief statement, especially for less proficient subjects.

  • Poor example for "borrow": Behnam borrowed John’s book. (too little context for testing a verb sense)

  • Better example for "borrow": Did Behnam buy another book? No, he ……… John’s.

    • Options: a. borrowed b. sold c. lent d. returned

2) Avoid adding unnecessary grammatical difficulty

  • Do not include grammatical structures or sources of difficulty beyond the lexical item.

  • Poor contextualization: the example built on "bereaved of his belongings" makes comprehension cumbersome for reasons unrelated to the tested word.

  • Better contextualization should avoid requiring extra knowledge beyond the word meaning.

3) Paraphrase-type item quality (Example 1)

  • If the item is paraphrase-type, choices should be easier than the tested word and share the same grammatical form as the underlined word.

  • Example of a defective item: The child was frightened of being left alone in the dark room. (Tested word: frightened.)

    • Options: a. annoyed b. ashamed c. terrified d. dismayed

    • All four choices share the grammatical form of the tested word, but the third choice (terrified) may be less frequent than "frightened", so a testee could miss the item for reasons other than not knowing the tested word.

    • Other defective item: only one option makes sense in context; answer relies on knowledge beyond the word’s meaning (structure bias).

    • A further defective item shows only one syntactically substitutable option; does not measure knowledge of the tested word.

    • Improvement: ensure multiple distractors test the word meaning, not only syntax.
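
The requirement in guideline 3 that choices be easier than the tested word can be checked roughly against a frequency list, treating a rarer word as a harder one. The sketch below is an assumption-laden illustration: `harder_than_target` is a hypothetical helper, and the words and rank values are invented placeholders, not figures from West (1953) or any other real list.

```python
def harder_than_target(target: str, choices: list[str], rank: dict[str, int]) -> list[str]:
    """Return the choices that are rarer than the tested word.

    `rank` maps each word to its frequency rank (1 = most frequent), so a
    larger rank means a less frequent, presumably harder, word.
    """
    target_rank = rank[target]
    return [choice for choice in choices if rank[choice] > target_rank]


# Placeholder words and ranks, invented purely for illustration.
rank = {
    "target": 1200,
    "choice_a": 800,
    "choice_b": 950,
    "choice_c": 2100,
    "choice_d": 600,
}
flagged = harder_than_target(
    "target", ["choice_a", "choice_b", "choice_c", "choice_d"], rank
)
print(flagged)  # → ['choice_c']
```

A flagged choice is only a prompt for review. As noted in 8.1, frequency lists have limitations, so the designer's judgment still decides whether the option stays.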

4) Completion-type (Example 2) item quality

  • If the item is completion-type, distractors and the tested word should be of the same level of difficulty and syntactically acceptable in the stem.

  • Poor examples illustrate distractors that are too easy or syntactically inappropriate, allowing elimination for reasons unrelated to knowledge of the tested word.

  • Additional poor example shows distractors that are not syntactically appropriate, or where the first two options can be easily eliminated.

5) Topic relatedness and distractor quality

  • Choices should be related to the same general topic or area to avoid outside knowledge biases.

  • Example: when the tested word relates to clothing or dresses, the other options should also be clothing-related words rather than unrelated ones.

  • Demonstration item: using a word (e.g., "hideous") in different contexts to test control over lexical items.

  • Poor and good contrasts illustrate how context controls the difficulty and lexical focus.

6) Length balance of options

  • Choices should be approximately the same length; no option should be markedly shorter or longer than others.

  • Example shows a distractor noticeably longer than the rest, which can cue test-takers to the correct answer or to eliminate options unfairly.

  • Recommendation: either ensure equal-length options or pair them by length.

  • Example demonstrates potential bias when a long option stands out; test constructors may intentionally balance lengths to avoid bias.
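
The length-balance guideline lends itself to a quick mechanical check. The sketch below is only one possible way to do it: the helper `length_outliers`, the 50% tolerance, and the "unbalanced" option set are assumptions made for illustration, while the "balanced" set reuses the options of Example 3.

```python
def length_outliers(options: dict[str, str], tolerance: float = 0.5) -> list[str]:
    """Return the labels of options whose character length differs from the
    mean option length by more than `tolerance` times that mean."""
    lengths = {label: len(text) for label, text in options.items()}
    mean = sum(lengths.values()) / len(lengths)
    return [label for label, n in lengths.items() if abs(n - mean) > tolerance * mean]


# Options of Example 3: all of similar length, so nothing is flagged.
balanced = {
    "a": "cause to turn aside",
    "b": "break into pieces",
    "c": "come into use again",
    "d": "break off for a time",
}
# Hypothetical option set in which one choice is far longer than the rest.
unbalanced = {
    "a": "pause",
    "b": "begin",
    "c": "break off for a time",
    "d": "leave",
}
print(length_outliers(balanced))    # → []
print(length_outliers(unbalanced))  # → ['c']
```

The tolerance is arbitrary; a designer who pairs options by length, as recommended above, would need a different check.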

7) Additional considerations from practice/examples

  • Test designers are often tempted to qualify the correct option so that it is unambiguously right, which tends to make it longer than the distractors; keeping the options balanced in form prevents this cue.

  • Balanced option length and structure help ensure the test measures lexical knowledge rather than test-taking strategies.

  • Exercises warn about over-reliance on memorization and encourage meaningful, contextual learning.

8) Activity prompts

  • Activity 1: Consider advantages and/or limitations of each item type discussed.

  • Activity 2: Evaluate proposed item types (e.g., vehicle-related questions, multi-word categories, color and dress-related prompts) and judge their appropriateness for proficiency testing.

  • Activity 3: Frequency ranking considerations using sources like the Teacher’s Word Book (historical reference) to decide inclusion of rare or specialized vocabulary.

  • Activity 4: Identify defects in provided items (A–G) and suggest modifications to improve validity and reliability.

  • Activity 5: From a provided list (official, achieve, collect, drowsy, include, possess, efficient, contain), write one paraphrase-type item and one completion-type item.

9) Practical and ethical implications

  • Testing should support learning, not just assessment; backwash effects should be minimized to prevent rote memorization.

  • Item design should promote understanding of word meaning in realistic contexts and help learners build usable vocabulary.

  • Consideration of learners’ native language and cognitive load is essential to avoid unnecessary difficulty or cultural bias.

  • Use of frequency data should be complemented by expert judgment and alignment with course objectives and real-world usage.

10) Examples and references in practice

  • Illustrative items (paraphrase, deletion, and cloze) demonstrate how context, syntax, and distractor quality influence the validity of a vocabulary test.

  • The document also references frequency resources like West’s General Service List (1953) and Thorndike’s Teacher’s Word Book (historical-frequency ranking) to inform item selection.


Quick reference: item types and their pros/cons
  • Paraphrase-type (Example 1): tests understanding of a word in context; pros: checks immediate meaning; cons: tests one word per item, sometimes paraphrases are hard to generate, context may be ignored by test-takers.

  • Deletion (Example 2): tests ability to infer word from context; pros: economical, tests multiple lexemes at once; cons: need careful distractors; potential for context-cueing.

  • Definition-type (Example 3/4): tests knowledge of a word from a definition; pros: economical; cons: backwash risk, encourages memorization; not ideal for deep understanding.

  • Cloze/passages (Example 5): tests ability to select appropriate word in a connected text; pros: context-rich; cons: more complex to create well-balanced items.


Notable examples (from the transcript)
  • Example 1 (paraphrase):

    • After discussing the matter for two hours, the committee adjourned without having reached any decision. (Tested word: adjourned.)

    • Options: a. finished b. continued c. took off d. broke off

  • Example 2 (deletion):

    • After discussing the matter for two hours, the committee ……… without having reached any decision.

    • Options: a. debated b. grouped c. collapsed d. adjourned

  • Example 3 (definition):

    • "Adjourn" means……….

    • Options: a. cause to turn aside b. break into pieces c. come into use again d. break off for a time

  • Example 4 (definition continuation):

    • "Break off for a time" means………..

    • Options: a. collapse b. adjourn c. deflect d. prostrate

  • Example 5 (cloze passage):

    • A reading passage with blanks to fill; instructions emphasize reading the entire passage first.

    • Sample prompt about a driver and a policeman, with blanks for contextually appropriate words.


Summary of key principles for vocabulary-item design
  • Purpose & Focus: Define test purpose (achievement vs. proficiency), and choose between passive/active and spoken/written vocabulary.

  • Word Selection: Use targeted word lists aligned with proficiency levels. Prioritize content words; test function words separately.

  • Item Formats: Select formats that assess meaningful word knowledge and promote learning over rote memorization.

  • Context & Difficulty: Ensure clear context, avoiding extraneous grammatical difficulty beyond the lexical item.

  • Distractor Quality:

    • Paraphrase: Choices must be syntactically parallel, easier than the tested word.

    • Completion: Distractors and the tested word should be of similar difficulty, syntactically acceptable, and contextually plausible.

    • General: Maintain topic coherence among choices; balance option length to prevent cues.

  • Ethical Design: Minimize backwash effects to encourage genuine vocabulary learning. Consider learners’ native language and cognitive load.

  • Expert Judgment: Complement frequency data with expert judgment and align with course objectives.

Summary of Vocabulary Testing Principles

Vocabulary testing assesses lexical knowledge, involving key decisions based on test purpose (achievement vs. proficiency), vocabulary type (passive vs. active), and target language form (spoken vs. written). Word selection varies by proficiency level (elementary, intermediate, advanced) and focuses on content words, while function words are tested separately.

Test designers must consider word frequency, scope, and usability. Frequency lists are a useful starting point, but the designer ultimately relies on expert judgment because the lists are often outdated, based mainly on written language, and insensitive to the examinees' native language.

Various item formats exist, each with pros and cons:

  • Paraphrase-type: Tests understanding in context; can test only one word, and distractors can be difficult to create.

  • Deletion (completion) type: Tests contextual inference; economical because one item tests several lexical items, but requires careful distractor design.

  • Definition-type: Tests direct knowledge of definitions; economical but can encourage rote memorization.

  • Cloze Passages: Tests word selection in connected text; provides rich context but is more complex to design.

Guidelines for Constructing Effective Items:

  1. Context Clarity: Ensure the context provides clear meaning without unnecessary grammatical difficulty.

  2. Distractor Quality: For paraphrase items, choices should be easier and syntactically parallel. For completion items, distractors and the answer should be of similar difficulty and syntactically acceptable. All choices should be topic-related and balanced in length to avoid cues.

  3. Ethical Design: Minimize negative "backwash" by promoting meaningful learning over memorization. Consider learners' native language and cognitive load.

  4. Expert Judgment: Supplement frequency data with expert insight to align items with test objectives.