Discourse across semiotic modes — Page-by-page notes (Bateman, 2009)

Page 1

Bateman (2009) defines three distinct semiotic modes for page-based and time-based artefacts, arguing that while they share common substrates (e.g., 2D pages or screens), they must be carefully distinguished.
The three semiotic modes are:
- $\text{text-flow}$ : supports a linear unfolding of logical text organization and includes motivation for basic text-formatting options.
- $\text{page-flow}$ : exploits the two-dimensional space of the page to express rhetorical relationships via spatial proximity and grouping.
- $\text{image-flow}$ : uses the space of the page or presentation in time to carry an unfolding conjunctively-related discourse.
Core idea: multimodal artefacts combine multiple modes, and analysis must respect the distinct contributions and interactions of each mode rather than treating everything as if it were language.
Multimodal linguistics faces analytic dangers. Linguistic imperialism can impose language-based analyses on non-linguistic modes; artefacts are not simple or purely linguistic, and superficial readings can obscure the richer interpretations available from other semiotic resources.
Halliday’s epistemological anchor (grammar as foundation):
- Halliday (1985): “a discourse analysis that is not based on grammar is not an analysis at all, but simply a running commentary on a text.”
The promise and risk of multimodal analysis: richer, more colorful analyses are possible, but researchers must develop robust empirical frameworks rather than rely on informal or armchair reasoning.
The move from single-modality (text-only) analysis to multimodal analysis raises questions about whether there is any natural-language-like syntax in multimodal artefacts, and if not, how to model their discourse structure on a spatial/visual basis.
One fundamental caution: the analysis cannot simply assume that the same kinds of meaning found in language will automatically appear in spatial/visual modalities; hypotheses must be testable and empirically grounded.
Two broad scientific issues emerge:
- How do spatial/visual/temporal modalities contribute to meaning-making, and what are their limits?
- How can we test hypotheses about multimodal discourse structure when no universally agreed-upon, language-like syntax exists for multimodality?
The central scientific challenge is to move beyond informal discourses of multimodal analysis toward frameworks that place analysis on sound empirical foundations and allow testing against empirically grounded data.

Page 2

The research challenge (Section 2):
- Our understanding of the meaning-making potential of visually-, spatially-, and temporally-based artefacts is weaker than we think; sophisticated theories often outpace what artefacts actually reveal.
- A widely cited claim by Halliday is reiterated: “a discourse analysis that is not based on grammar is not an analysis at all, but simply a running commentary on a text” (Halliday, 1985: xvii).
- There is concern about how to adapt linguistic tools to multimodal artefacts without losing empirical grounding.
Examples (Section 3):
- Even simple multimodal pages can miscommunicate their rhetorical organization when the spatial layout unintentionally signals relationships that are false or distorted.
- A need exists to separate visual structure from likely interpretations of that structure so that their interrelationships can be contrasted and compared.
- A common but problematic characterization (e.g., from Kress & Van Leeuwen, 1996) describes page layouts along left-right (given - new) and top-bottom (ideal - real) dimensions. Questions arise:
  - Where exactly on the vertical axis should a division into ideal vs. real be positioned (e.g., is the ideal the heading above the horizontal division, or do other elements belong to ideal/real)?
  - Are these axes universally meaningful, and what concrete consequences would follow from choosing one interpretation over another?
- In ideologically charged texts, the ideal/real distinction can seed discussion, but the foundations of such discussion remain contested and under-evaluated.
- These problems multiply for dynamic (time-based) artefacts.
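The question of where to place the ideal/real division can be made concrete with a small sketch. The element names, coordinates, and thresholds below are all hypothetical, and the mapping from position to label is only one possible reading of the given/new and ideal/real axes, not a procedure taken from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A page element with a normalised centre position (hypothetical data)."""
    name: str
    x: float  # 0.0 = left edge, 1.0 = right edge
    y: float  # 0.0 = top edge, 1.0 = bottom edge


def label(element: Element, x_split: float = 0.5, y_split: float = 0.5) -> tuple[str, str]:
    """Assign (given/new, ideal/real) labels from position and two assumed split points."""
    horizontal = "given" if element.x < x_split else "new"
    vertical = "ideal" if element.y < y_split else "real"
    return horizontal, vertical


page = [
    Element("headline", x=0.45, y=0.10),
    Element("photo", x=0.30, y=0.40),
    Element("body text", x=0.70, y=0.60),
]

# Moving the vertical division changes which elements count as "ideal".
for y_split in (0.25, 0.50):
    labels = {e.name: label(e, y_split=y_split) for e in page}
    print(y_split, labels)
```

Under these assumptions the photo is labelled "real" with the division at 0.25 and "ideal" with the division at 0.50, which illustrates why an unmotivated choice of division makes the resulting interpretation hard to test.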
Research method (Section 4):
- Proposes adopting corpus-based linguistic investigative methods for multimodal artefacts: use large, principled corpora to test hypotheses against actual data rather than relying on intuition.
- Key technique: annotation layers in corpora, enabling queries at higher abstraction levels and allowing empirical examination of patterns.
- Multimodal corpora are advancing for time-based artefacts (e.g., spoken language with video, gesture, proxemics). For time-based data, annotations often include intonation, syntax, distance, gesture, and interaction context (Norris 2002, 2004).
- For multimodal documents (visual/spatial, not time-based), annotation criteria are still emerging. Useful layers include:
  - Layers capturing the document’s form and expressive resources (presentational modalities)
  - Layers capturing the document’s functional organization (how structure serves communicative aims)
- An example analytical scheme is the GeM (Genre and Multimodality) project (Delin et al., 2002; Bateman, 2008).
- GeM provides a set of layers of description for multimodal documents (Table 1), argued to be the minimum set needed to do justice to page-based artefacts. This set informs the construction of multimodal document corpora aligned with contemporary corpus design standards.
- Empirical investigation encourages cross-disciplinary dialogue and cautions against theoretical silos.
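The idea of querying a corpus through annotation layers can be sketched as follows. This is a minimal illustration that assumes each layer annotates a shared set of unit identifiers independently; the unit ids, attribute names, and values are hypothetical and do not reproduce the GeM annotation format.

```python
# Hypothetical base units: the smallest annotated elements of one page.
base_units = {
    "u1": "Headline text",
    "u2": "Photograph of a bird",
    "u3": "Caption under the photograph",
}

# Each layer annotates the same unit ids independently of the others.
layers = {
    "layout": {
        "u1": {"region": "top"},
        "u2": {"region": "centre"},
        "u3": {"region": "centre"},
    },
    "rhetorical": {
        "u1": {"role": "nucleus"},
        "u2": {"role": "nucleus"},
        "u3": {"role": "satellite"},
    },
}


def query(layers, **conditions):
    """Return unit ids whose annotations match every layer=(key, value) condition."""
    hits = []
    for unit_id in base_units:
        if all(
            layers[layer].get(unit_id, {}).get(key) == value
            for layer, (key, value) in conditions.items()
        ):
            hits.append(unit_id)
    return hits


# Which units sit in the centre of the page and play a supporting rhetorical role?
print(query(layers, layout=("region", "centre"), rhetorical=("role", "satellite")))
```

Keeping the layers separate means a query can combine conditions from any of them without one layer's categories being built into another.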

Page 3

Recent research (Section 5):
- The GeM framework supports multiple independent layer descriptions; researchers can search for co-selection patterns across layers to see if theoretically independent choices are actually constrained in production/consumption, which would indicate underlying generative constraints.
- Two generic sources of such constraints are identified:
  - An extended notion of multimodal genre that includes historically situated production/consumption (Kress & Van Leeuwen 2001; Bateman, 2008: ch. 5).
  - The existence of distinct semiotic modes that combine particular expressive resources with specific communicative goals.
- The argument emphasizes that multimodal artefacts can be composed of many modes in various configurations, which is both a strength and a source of analytical complexity.
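Searching for co-selection patterns across layers can be sketched as a comparison between observed pair counts and the counts expected if the two layers varied independently. The observations below are invented for illustration, and the choice of a simple observed-versus-expected comparison is an assumption, not a method specified in the notes.

```python
from collections import Counter

# Hypothetical annotated units: (layout choice, rhetorical choice) per unit.
observations = [
    ("top", "nucleus"), ("top", "nucleus"), ("top", "satellite"),
    ("bottom", "satellite"), ("bottom", "satellite"), ("bottom", "nucleus"),
    ("top", "nucleus"), ("bottom", "satellite"),
]

pair_counts = Counter(observations)
layout_counts = Counter(layout for layout, _ in observations)
rhet_counts = Counter(rhet for _, rhet in observations)
total = len(observations)


def expected(layout, rhet):
    """Count expected if the two layers were chosen independently of each other."""
    return layout_counts[layout] * rhet_counts[rhet] / total


# Large gaps between observed and expected counts hint at a constraint.
for pair in sorted(pair_counts):
    print(pair, "observed:", pair_counts[pair], "expected:", round(expected(*pair), 2))
```

With real corpus data, a statistical test on the full table would be needed before treating any such gap as evidence of a generative constraint.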
Research proposal (Section 6):
- The GeM-based approach expands the analysis beyond traditional, single-mode accounts by incorporating layered descriptions and empirical testing.
- Important ongoing and future tasks include refining the concept of composite modes (modes that combine several semiotic resources) and identifying how modal composites (e.g., speech with gesture and visual cues) interact.
- Acknowledges that composite modes are common (e.g., spoken discourse combines grammar with intonation, gesture, and facial expression).
- Practical questions: to what extent can composite page-flow or composite multimodal modes be decomposed for analysis, or are they inherently integrated?
- The central research imperative is to conduct detailed empirical analyses on representative samples of multimodal documents, using the GeM annotation layers as a concrete, comparable basis for cross-case comparisons.
Practical relevance (Section 7):
- As technology makes it easy to incorporate visual elements, artefact designers at all levels increasingly deploy multimodal resources, raising the overall communicative load and the need for design that is understandable and effective.
- Poor design has tangible negative consequences: user misunderstandings, self-blame, financial losses caused by poorly designed forms, and a negative corporate image (Schriver 1997; Waller & Delin 2003).
- There is a rising emphasis on multimodal literacy in education to make multimodal meaning-making more visible and teachable (Kalantzis & Cope 2000; Kress 2003; Goodman & Graddol 1996; Unsworth 2001, 2007).
- The overall message: clearer, empirically grounded representations and methods for analyzing multimodal contributions are essential for both theory and practice.
Assignment (Illustrative tasks using GeM layers):
- Use two GeM annotation layers (Layout and Rhetorical) to examine the relationship between layout (spatial organization) and rhetorical structure (content organization).
- Compare two figures (left, 1994; right, 1996) and address:
  - What layout structures are employed?
  - Does publication year influence the interpretation or design choices?
  - What semiotic modes are present in the artefacts?

Page 4

Continuation of the GeM framework and the six-layer model (Table 1) for page-based artefacts:
- Layout structure: nature, appearance, and position of communicative elements on the page; hierarchical relationships among them.
- Navigation structure: support for the intended mode(s) of consumption; elements that direct or aid reader use of the document.
- Linguistic structure: linguistic details of verbal elements realizing the layout elements.
- Content structure: propositional content; the “field” of discourse (cf. Martin, 2001).
- Rhetorical structure: the relationships between content elements; how content is argued and organized rhetorically (main material vs. supporting material).
- Genre structure: the stages/phases defined for a given genre; how content delivery proceeds through stages.
- These layers are argued to form the minimum set needed to describe page-based artefacts adequately, and they provide the basis for constructing empirical multimodal corpora.
The three semiotic modes re-articulated:
- $\text{text-flow}$ : verbal text on a page; line of developing text provides a basic one-dimensional organization; may include diagrams, tables, footnotes, etc.; not meant to carry deep, independent spatial meaning.
- $\text{page-flow}$ : uses the page’s full two-dimensional space to express communicative purposes; can combine elements from other modes but adds spatially-signalled relations.
- $\text{image-flow}$ : organizes sequences of graphical elements; conveys meanings beyond individual images; relates to temporal sequencing and montage in time-based media.
Composite and co-present nature of modes:
- Modes are often active simultaneously and can be composite (e.g., spoken discourse integrates grammar, gesture, and intonation).
- Page-flow is inherently composite, relying on pre-existing elements (text blocks, diagrams, framing lines) but adding something more to the page.
- Film and time-based media likewise demand composite treatment.
The analytical challenge and the way forward:
- Multimodal artefacts combine multiple modes, which increases complexity in analysis.
- Many current analyses are idiosyncratic with little cross-disciplinary constraint.
- The immediate task is detailed empirical analyses using the GeM framework across representative document samples.
- This approach helps unify and constrain analyses across diverse modalities and genres.
Practical note on research scope:
- The field risks fragmentation if it does not converge on empirical, testable methods and cross-disciplinary dialogue.

Page 5

Research proposal (continued):
- Emphasis on moving beyond isolated artefacts toward larger, representative document corpora across genres, audiences, time periods, and functions (inform, persuade, instruct, etc.).
- Dimensional sampling to include factors such as audience, document type (glossy magazines, web pages, newspapers), historical period, content area, and communicative function.
- The aim is to assess whether different genres rely on different layout choices, enabling generalizable insights rather than case-specific observations.
- In corpus-based investigations, researchers can test claims about layout regions and information status (as per Kress & Van Leeuwen) and about text-graphics relations (Marsh & White 2003; Stöckl 2004; Martinec & Salway 2005; Kong 2006).
- The empirical program involves evaluating how well proposed classifications cover actual data, moving beyond aspirational claims to data-driven generalizations.
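Evaluating how well a proposed classification covers actual data can be sketched as a simple coverage ratio. The category names and annotations below are placeholders invented for this example; they do not correspond to any of the cited text-graphics schemes.

```python
# Placeholder category set standing in for a proposed classification scheme.
scheme = {"relation_a", "relation_b", "relation_c"}

# Hypothetical corpus annotations: one label per text-image pair.
# None marks a pair that the annotator could not place in the scheme.
annotations = [
    "relation_a", "relation_b", None, "relation_a",
    "relation_c", None, "relation_b", "relation_a",
]


def coverage(labels, categories):
    """Share of items that received a label belonging to the scheme."""
    covered = sum(1 for label in labels if label in categories)
    return covered / len(labels)


print(f"coverage: {coverage(annotations, scheme):.2f}")
```

A low ratio on a representative sample would suggest that the scheme needs further categories or sharper criteria before it can support general claims.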
Caution against premature over-generalization:
- The multimodal field risks becoming promissory if analysts claim universal applicability without robust empirical support.
- Müller (2007) is cited as noting the persistence of anti-visual bias within the field.
- Fragmentation remains a risk if research communities do not engage with each other.
Path forward: integrated, empirical, cross-modal analysis using GeM layers to unify methods and enable broader validation across communities.

Page 6

Practical relevance (continued):
- The practical significance of detailed, empirical multimodal analysis grows as multimedia becomes pervasive in everyday materials (manuals, websites, advertisements, educational resources).
- Clear design improves readability and reduces misunderstanding, with financial and reputational implications for organizations.
- Multimodal literacy is increasingly recognized as essential for education; explicit teaching of multimodal analysis can make meaning-making processes visible and teachable.
- The GeM framework supports practical critique and redesign of documents by aligning form (layout) with function (rhetorical aims).
Assignment (extended):
- Use the layout and rhetorical layers to analyze how layout choices influence rhetorical effect and meaning-making in two artefacts from different times or contexts.
- Compare how the two artefacts mobilize different semiotic resources and how that affects interpretation.
- Consider how changing technologies influence the coupling of layout and rhetoric, and what this implies for multimodal literacy and design practice.

Page 7

Chapter 4 transition: Schemes and tropes in visual communication – The case of object grouping in advertisements, by Alfons Maes and Joost Schilperoord (Tilburg University).
Focus: the scheme-trope distinction in the visual medium; examines how visual form can express tropes (metaphors) and schemes (rhymes) and how content and form interact in visual rhetoric.
Examples are drawn from print advertisements to illustrate how object grouping can highlight relationships between conceptual domains and contribute to rhetorical purposes.
The chapter invites exercises similar to those in Bateman’s chapter: analyze visual texts across different genres (textbooks, tourist guides, instruction leaflets, popular magazines) and across time periods (e.g., 1950s, 1970s, 1990s) to compare multimodal resources and rhetorical strategies.
Opening theoretical background (introductory):
- Since ancient Greek rhetoric, scholars have focused on language as the primary mode for rhetorical figuration (figures/schemes and tropes).
- Metaphors and schemes have been traditionally studied in language; this chapter extends the inquiry to visual figures and the interaction of content and form in visual rhetoric.
The chapter bridges visual communication studies with multimodal analysis and invites cross-methodological dialogue with Bateman’s framework.

Page 8

The book’s bibliographic and publication information (front matter) is shown, including the title "Discourse, of Course: An Overview of Research in Discourse Studies" and the editors and publisher details (John Benjamins, 2009).
This page serves as a bibliographic placeholder and is not a content-rich source for the Bateman chapter itself beyond context for citation.

Page 9

Table of contents for the book: sections and chapter titles.
The book is structured into three main parts:
- I. Discourse in communication
  - Examples include “Doing discourse analysis with possible worlds” (Andrea Rocci), “Discourses ‘off course’?” (Anna Duszak).
- II. Discourse and other communication modes
  - Includes Bateman’s “Discourse across semiotic modes” (Chapter 3).
- III. Discourse types
  - Includes chapters on text types, academic/professional genres, and visual communication (e.g., Maes & Schilperoord on visual schemes).
This page summarizes the book’s organization and signals the distribution of topics across chapters and sections for broader study planning.

Page 6 (Supplemental notes)

The GeM annotation layers (Layout, Navigation, Linguistic, Content, Rhetorical, Genre) provide a structured schema for multimodal document analysis.
The three semiotic modes support a modular approach to multimodal texts:
- $\text{text-flow}$ (verbal, linear text)
- $\text{page-flow}$ (spatial layout and relations on a page)
- $\text{image-flow}$ (sequencing of images and their conjunctive relations in time or space)
The framework emphasizes empirical grounding, cross-disciplinary synthesis, and careful treatment of the relationship between form and meaning across modes.
Practical implications include a stronger basis for design critique, teaching, and professional practice in information design, document design, and visual communication.

Summary of key concepts (LaTeX-ready definitions)

Three semiotic modes:
- $\text{text-flow}$ : linear verbal organization on a page.
- $\text{page-flow}$ : two-dimensional spatial organization expressing relations via proximity and grouping.
- $\text{image-flow}$ : sequence/arrangement of images conveying conjunctive relations in time or space.
GeM model layers for page-based artefacts:
- $\text{Layout}$, $\text{Navigation}$, $\text{Linguistic}$, $\text{Content}$, $\text{Rhetorical}$, $\text{Genre}$.
Spatial axes for analysis (Kress & Van Leeuwen-inspired): left-right $(\text{given}, \text{new})$; top-bottom $(\text{ideal}, \text{real})$.
Key empirical method: multimodal corpora with multi-layer annotations; use of large corpora to test hypotheses about layout, mode interactions, and content relations.
Practical implications: design quality, multimodal literacy, and cross-disciplinary collaboration.