English historical linguistics and the use of corpora

28 Terms

1

Old English Morphology

Synthetic, inflectional, and structurally rich. Features:

  • Multiple cases (nominative, accusative, genitive, dative, plus traces of an instrumental)

  • Agreement marking

  • Complex verb classes

  • Prefixes and suffixes

  • Grammatical gender

  • Numerous declensional patterns

Much grammatical information is expressed through inflectional endings rather than through word order.

2

Old English Syntax

  • Freer but still structured word order (made possible by the rich morphology)

  • Verb-second tendency in main clauses

  • Multiple negation

  • Pre-verbal negation (ne placed before the verb)

  • Flexible positioning of adjectives, possessors, pronouns

  • Frequent subordination with þe and þæt

3

Old English Semantics

Many words had broader meanings than their Present-Day English (PDE) equivalents. Examples:

  • yfel: harm, misfortune, wickedness

  • costnung: testing, trial, temptation

  • hlāf: bread, food, sustenance

  • heofonum: sky + divine realm

  • forgyfan: give fully, release → later “forgive”

  • willa: intention, plan → stronger than PDE “will”

  • gylt: crime, guilt, sin (legal + moral + spiritual)

4

Old English Sounds & Spelling

Spelling was largely transparent and phonologically faithful, though it varied with dialect and scribal habits.

  • Palatalization of /k/ and /g/ before front vowels (/k/ → /tʃ/ as in cild 'child'; /g/ → /j/)

  • Vowel length is phonemic (god /ɡod/ 'God' vs gōd /ɡoːd/ 'good')

  • Special letters: thorn <þ>, eth <ð>, ash <æ>

5

Corpus Linguistics – Key Concepts

Study of language through structured collections of texts (corpora). Features:

  • Electronic storage

  • Authentic texts

  • Types: monolingual, multilingual, learner, pedagogic, historic

  • Annotation adds linguistic info

6

Corpus Types

  • Monolingual: general/reference, specialized, learner, pedagogic, historic/diachronic

  • Multilingual: parallel, comparable

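  • Examples: BNC (British National Corpus), ANC (American National Corpus), COCA (Corpus of Contemporary American English), HC (Helsinki Corpus), PPCME/PPCEME (Penn Parsed Corpora of Middle and Early Modern English), ICLE (International Corpus of Learner English), GloWbE (Corpus of Global Web-based English)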

7

Representativeness

Extent to which corpus findings generalize to a language variety. Determined by:

  • Range of genres included

  • Selection of text chunks (samples) within each genre

The sample must capture the range of variability in the target language.
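One possible way to picture "range of genres" and "selection of text chunks" is proportional stratified sampling: allocate a fixed number of samples across genres according to target shares, then draw one contiguous chunk per sample at a random offset. This is a minimal sketch under stated assumptions. The genre names and shares are hypothetical, and the helper names (allocate, draw_chunk) are invented for illustration. Proportional allocation is only one design choice; the Brown Corpus, for instance, used 500 samples of roughly 2,000 words spread over 15 genre categories.

```python
import random

def allocate(proportions, total):
    """Split `total` samples across genres in proportion to `proportions`
    (shares should sum to 1). Largest-remainder rounding makes counts add up."""
    raw = {genre: share * total for genre, share in proportions.items()}
    counts = {genre: int(value) for genre, value in raw.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining samples to the genres with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda g: raw[g] - counts[g], reverse=True)
    for genre in by_remainder[:leftover]:
        counts[genre] += 1
    return counts

def draw_chunk(tokens, chunk_len, rng):
    """Return one contiguous chunk of `chunk_len` tokens from a random start."""
    if len(tokens) <= chunk_len:
        return list(tokens)
    start = rng.randrange(len(tokens) - chunk_len + 1)
    return tokens[start:start + chunk_len]

# Hypothetical target shares for a small written corpus (not real figures).
shares = {"press": 0.3, "fiction": 0.25, "academic": 0.25, "letters": 0.2}
print(allocate(shares, 500))

rng = random.Random(42)                    # fixed seed: reproducible draw
text = [f"w{i}" for i in range(10_000)]    # stand-in for one long tokenized text
sample = draw_chunk(text, 2000, rng)
print(len(sample), sample[0])
```

Drawing contiguous chunks rather than scattered sentences keeps discourse-level features intact; randomizing the start offset reduces the bias of always taking text openings.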

8

Corpus Annotation

Adds metalinguistic info to texts; types include:

  • POS tagging (word class)

  • Lemmatization (groups inflected forms under one base form, e.g., walks, walked, walking → walk)

  • Syntactic annotation (parsing/treebanking)

  • Semantic annotation (word meaning)

  • Pragmatic/discourse annotation

  • Phonetic/prosodic annotation

  • Error tagging (learner corpora)
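A minimal sketch of what POS tagging and lemmatization add to a text: each token carries its surface form plus a lemma and a tag, so a query can count the lemma father across father/fathers instead of each form separately, or pull out all nouns. The sentence, its annotations, and the Token record are hypothetical and assigned by hand; the tags merely follow Penn Treebank conventions for illustration. A real corpus would use its own tagset and typically an automatic tagger and lemmatizer.

```python
from collections import Counter
from typing import NamedTuple

class Token(NamedTuple):
    word: str   # surface form as it appears in the text
    lemma: str  # base form that groups inflected variants
    pos: str    # part-of-speech tag (Penn Treebank style, assigned by hand here)

# Hypothetical hand-annotated sentence: "The fathers walked and the father walks"
annotated = [
    Token("The", "the", "DT"),
    Token("fathers", "father", "NNS"),
    Token("walked", "walk", "VBD"),
    Token("and", "and", "CC"),
    Token("the", "the", "DT"),
    Token("father", "father", "NN"),
    Token("walks", "walk", "VBZ"),
]

form_counts = Counter(t.word.lower() for t in annotated)   # raw word forms
lemma_counts = Counter(t.lemma for t in annotated)         # lemmatized view
noun_hits = [t.word for t in annotated if t.pos.startswith("NN")]  # POS query

print(form_counts["father"], lemma_counts["father"])
print(noun_hits)
```

The comparison of form_counts and lemma_counts shows why lemmatization matters for inflection-rich data: without it, every inflected form is counted as a separate word.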

9

Why Use Corpora

  • Access quantitative data to support qualitative analysis

  • Generalizable insights beyond small samples

  • Observe real language use rather than relying on intuition alone

10

Corpus Applications

  • Historical linguistics: trace grammatical changes

  • Syntax: innovations in spoken language

  • Semantics: collocation & phraseology

  • Dictionary writing: evidence-based (e.g., COBUILD)

  • Sociolinguistics, discourse analysis, EAP (English for Academic Purposes), stylistics

11

Corpus Analysis Techniques

  • Frequency analysis: how often words appear

  • Keyword analysis: words that are unusually frequent (or infrequent) in one corpus compared with a reference corpus, judged by a statistical test

  • Concordance analysis: word meaning & grammar in context

  • Collocation analysis: typical word co-occurrences
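Frequency and keyword analysis can be sketched in a few lines. This minimal sketch assumes a crude regex tokenizer and uses log-likelihood (G²), one commonly used keyness statistic, to compare a target text against a reference text. The two sentences and the helper names (tokenize, per_million, log_likelihood, keywords) are invented for illustration; on texts this small the scores only demonstrate the arithmetic and say nothing about real usage.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lower-cased runs of word characters (Unicode-aware, so æ, þ, ð survive).
    return re.findall(r"\w+", text.lower())

def per_million(count, corpus_size):
    """Normalized frequency, so corpora of different sizes can be compared."""
    return count / corpus_size * 1_000_000

def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness score.
    a, b: frequency of the word in the target and reference corpus
    c, d: total number of tokens in the target and reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in target
    e2 = d * (a + b) / (c + d)  # expected frequency in reference
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero observed count contributes nothing
            ll += observed * math.log(observed / expected)
    return 2 * ll

def keywords(target, reference, top=5):
    """Rank words relatively more frequent in `target` by log-likelihood."""
    t_freq, r_freq = Counter(target), Counter(reference)
    c, d = len(target), len(reference)
    scores = {}
    for word, a in t_freq.items():
        b = r_freq[word]          # Counter returns 0 for unseen words
        if a / c > b / d:         # keep "positive" keywords only
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]

target = tokenize("The father loved the son, and the father spoke.")
reference = tokenize("The king ruled the land and the people obeyed the king.")
print(Counter(target).most_common(2))   # frequency analysis
print(keywords(target, reference))      # keyword analysis
```

With one degree of freedom, a log-likelihood score above about 3.84 corresponds to p < 0.05. Because very large corpora make even small differences significant, some keyword tools also report an effect-size measure alongside the score.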

12

Examples of Online Corpora

  • News on the Web (NOW): 23.5B words, web news 2010–present

  • iWeb: 14B words, 6 countries, 2017

  • GloWbE: 1.9B words, 20 countries, 2012–2013

  • Wikipedia Corpus: 1.9B words, 2014

  • Coronavirus Corpus: 1.5B words, 2020–2023

13

COCA (Corpus of Contemporary American English)

Features:

  • Frequency info for top 60,000 words

  • Collocates, clusters, KWIC (Keyword in Context)
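The three COCA-style views (KWIC lines, collocates, clusters) can be approximated on a toy text. This is a minimal sketch, not how COCA or any specific tool implements them: the function names are hypothetical, collocates are raw co-occurrence counts in a ±2-token window with no association measure (such as MI or t-score) applied, and the sample sentence is invented, loosely echoing the New Testament example quoted in these notes.

```python
import re
from collections import Counter

def tokenize(text):
    # Lower-cased runs of word characters (Unicode-aware).
    return re.findall(r"\w+", text.lower())

def kwic(tokens, node, width=3):
    """Keyword in Context: one line per hit, node word in brackets."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

def collocates(tokens, node, span=2):
    """Raw counts of words within +/- `span` tokens of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts

def clusters(tokens, n=2):
    """Count contiguous n-word sequences (clusters / n-grams)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = tokenize("In the name of the Father and of the Son. The Father loveth the Son.")
for line in kwic(tokens, "father"):
    print(line)
print(collocates(tokens, "father").most_common(3))
print(clusters(tokens, 2).most_common(3))
```

Right-aligning the left context lines up the node word in a single column, which is what makes KWIC output easy to scan for recurring patterns.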

  • Used for lexical analysis, semantic investigation, corpus-driven research

14

Corpus Linguistics

Study of actual language in use through corpora. Focuses on patterns of words in sequences rather than single words (Sinclair, 1990).

15

Core Research Questions in Corpus Linguistics

  • What patterns are associated with lexical or grammatical features?

  • How do these patterns differ across varieties and registers?

16

Corpus Linguistics – Methodology

Uses computer software to study large quantities of language data, shedding light on the relationship between meaning and structure (Tognini-Bonelli, 2001). Applicable to almost all areas of linguistic research.

17

Corpus Linguistics – Limits

Cannot provide:

  • Negative evidence (what is not possible in language)

  • Explanations for WHY patterns exist

  • A complete record of everything produced in a language at a given time

18

History of Corpus Linguistics – Early 20th Century

Paper-based corpora (“shoeboxes” of slips). Key figures: Jespersen (1909–49), Fries (1952).

19

Chomsky (1957) & Corpora

Chomsky's use of invented examples in Syntactic Structures, and the generative critique of corpora that followed, led many linguists to set corpus-based study aside for a time.

20

First “Modern” Corpora – 1960s

Technological development enabled electronic corpora. Key examples:

  • Brown Corpus (1961): 1 million words, written American English

  • LOB (Lancaster-Oslo/Bergen) Corpus (compiled 1970–78): 1 million words of written British English from 1961, matching the Brown design

21

Brown Family Corpora

Extensions for comparative purposes:

  • FROWN (1991, American)

  • FLOB (1991, British)

  • BE06 (21st-century British English)

22

COBUILD & John Sinclair

Sinclair used corpus analysis to study meaning through collocation. Projects:

  • Bank of English corpus → COBUILD dictionaries (1987, 1995)

  • COBUILD English Grammar (1990)

23

Corpus Linguistics – Future Directions

Integration with Digital Humanities:

  • Headtalk project (Nottingham): video corpus + gestures

  • Lancaster University: GIS + corpus → semantic mapping of place names

24

Corpus-Based vs. Corpus-Driven Linguistics

  • Corpus-based: corpus used to test/refine a theory or hypothesis

  • Corpus-driven: corpus itself generates hypotheses; embodies a theory of language

25

Corpus Tools

  • AntConc (downloadable software)

  • English Corpora (online access, billions of words)

  • MICASE 50 / Corpus Class Test

  • XML Helsinki Corpus Browser

26

Corpus Use in Practice – Spelling Example

Word evolution: OE Fæder → ME Fader → Modern English Father.
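Searching a historical corpus for one word means matching several spellings at once. This minimal sketch assumes a hand-written regular expression for fæder/fader/father (plus endings) instead of lemma annotation, which is not always available for historical texts. The pattern is illustrative and deliberately incomplete (it misses spellings such as ME fadir), the function name is hypothetical, and the fragments are the ones quoted in the Old English examples card, with period labels assigned for illustration.

```python
import re
from collections import Counter, defaultdict

# Illustrative, incomplete pattern for spellings of FATHER (an assumption):
# f + (æ | a) + (d | th) + er + any ending, e.g. fæder, fader, faders, father.
FATHER = re.compile(r"\bf(?:æ|a)(?:d|th)er\w*", re.IGNORECASE)

# Fragments quoted in these notes, tagged with a period label.
samples = [
    ("OE", "Fæderlice"),
    ("OE", "heofonlican Fæder"),
    ("ME", "fader"),
    ("ME", "holy faders"),
    ("EModE", "my Father"),
    ("EModE", "the Father loueth the Sonne"),
]

def variants_by_period(samples):
    """Tally each matched spelling (lower-cased) per period."""
    table = defaultdict(Counter)
    for period, text in samples:
        for match in FATHER.finditer(text):
            table[period][match.group().lower()] += 1
    return table

for period, forms in variants_by_period(samples).items():
    print(period, dict(forms))
```

A pattern like this over-collects as easily as it under-collects, so in practice each hit would still be checked in context before being counted.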

27

Corpus Use in Practice – Examples from Old English Onward

Texts from different periods show the semantic, syntactic, and morphological use of Fæder:

  • Old English, Æthelwold, Benedictine Rule: Fæderlice, heofonlican Fæder

  • Middle English, Aelred of Rievaulx, De Institutione Inclusarum: fader, holy faders

  • Early Modern English, New Testament: my Father, the Father loueth the Sonne

28

Corpus Analysis Considerations

Possible aspects to examine:

  • Spelling forms

  • Syntactic environment (OV, SV, V2?)

  • Semantic nuances (literal, metaphorical, kinship, religious)