Presented by: Amandalynne Paullada
Department of Linguistics, UW
Date: 4 December 2024
Topic: Computational Linguistics
Guest Lecture with material from E. Bender, N. Tachikawa Shapiro
Definition (ACL): Computational linguistics is the scientific study of language from a computational perspective.
Concerned with creating computational models of linguistic phenomena.
Utilizing computers for linguistic data analysis and hypothesis testing.
Natural Language Processing (NLP):
Algorithms and models enabling computers to manipulate human language.
Corpus: A large, structured collection of written or spoken language data.
Often requires both computational and manual curation.
Example: Corpus del Español, which consists of:
100 million words of Spanish texts from the 1200s to the 1900s.
2 billion words from web pages (the Web/Dialects corpus).
Additional resources available from the Linguistic Data Consortium.
Computational analysis of linguistic data involves:
Measuring how frequently syntactic structures occur.
Complexity of syntax: Algorithms are applied to produce parse trees from text (a minimal sketch follows this list).
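To make parsing concrete, here is a minimal sketch using NLTK's chart parser (assuming NLTK is installed); the toy grammar and sentence are invented for illustration.

    # Minimal parsing sketch with NLTK; grammar and sentence are toy examples.
    import nltk

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the' | 'a'
    N -> 'linguist' | 'corpus'
    V -> 'parses' | 'builds'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("the linguist parses a corpus".split()):
        tree.pretty_print()  # draws the parse tree as ASCII art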
Computational Sociolinguistics:
Analyze linguistic variation across time.
Study sociolinguistic variation in digital contexts.
Foundation of word vectors (embeddings):
Hypothesis: Words used in similar contexts tend to have similar meanings.
Rather than representing meanings symbolically, this approach models word contexts mathematically.
Similar words are represented as nearby points in a high-dimensional vector space (see the sketch after this list).
Quotation: “You shall know a word by the company it keeps” – Firth (1957).
Reference: Figure 1 in “Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change” (Hamilton et al., 2016).
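A minimal sketch of this idea with toy 3-dimensional vectors (values invented for illustration; real embeddings have hundreds of dimensions), using cosine similarity via NumPy:

    # Toy demonstration: similar words = nearby vectors, measured by cosine similarity.
    import numpy as np

    vectors = {
        "cat":     np.array([0.9, 0.8, 0.1]),
        "dog":     np.array([0.85, 0.75, 0.2]),
        "economy": np.array([0.1, 0.2, 0.9]),
    }

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(vectors["cat"], vectors["dog"]))      # high: similar contexts
    print(cosine(vectors["cat"], vectors["economy"]))  # low: different contexts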
Investigation of sociolinguistic variation in digital communication across demographics:
Differences in text messaging styles between generations (Boomers vs. Gen Z).
Effects of gender on emoji usage.
Variation in language use across personal contexts (romantic vs. platonic texting).
New vocabulary acquisition sources, including TikTok, friends, and media.
Roles of NLP:
Enable extraction of patterns and information from language data (Natural Language Understanding).
Generate coherent human language strings (Natural Language Generation).
Applications:
Search engines, Machine translation, Content moderation, Chatbots, Spelling/grammar checkers, Predictive text, etc.
Automatic Speech Recognition (ASR): Techniques for converting streams of spoken language into words and phrases.
Speech Synthesis: Computational models that approximate human speech sounds based on contextual information.
A spoken dialogue system chains these components (see the sketch after this list):
Automatic Speech Recognition: Capture and segment audio.
Natural Language Understanding: Extract meaningful components from the transcription.
Natural Language Generation: Formulate a response.
Speech Synthesis: Produce audio with correct pronunciation and intonation.
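A schematic sketch of that pipeline; every function below is a hypothetical toy stub standing in for what would be a substantial statistical model in a real system.

    # Schematic spoken-dialogue pipeline; all four stages are toy stubs.
    def speech_to_text(audio):
        # ASR: segment the audio stream and transcribe it
        return "what time is it"            # toy stand-in for a real transcription

    def understand(transcription):
        # NLU: map the transcription to a structured intent
        return {"intent": "ask_time"}

    def generate_response(meaning):
        # NLG: turn the intent into a response string
        return "It is noon." if meaning["intent"] == "ask_time" else "Sorry?"

    def synthesize(text):
        # Speech synthesis: render text as audio with pronunciation and intonation
        return f"<audio: {text}>"           # toy stand-in for a waveform

    print(synthesize(generate_response(understand(speech_to_text(b"raw audio")))))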
Many NLP applications employ machine learning, particularly supervised learning:
Example: Classifying spam vs. normal (“ham”) emails.
Models are trained on labeled examples (a minimal sketch follows).
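A minimal sketch of the spam example, assuming scikit-learn is available; the six training emails are invented toy data (real systems train on thousands of labeled examples).

    # Supervised text classification: learn from labeled emails, predict on new ones.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    emails = [
        "win a free prize now", "cheap pills click here", "claim your free money",
        "meeting moved to friday", "draft of the paper attached", "lunch tomorrow?",
    ]
    labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

    vectorizer = CountVectorizer()              # bag-of-words features
    X = vectorizer.fit_transform(emails)
    model = MultinomialNB().fit(X, labels)      # learns word/label statistics

    test = vectorizer.transform(["claim your free prize"])
    print(model.predict(test))                  # likely ['spam'] on this toy data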
Content moderation on online platforms involves enforcing posting policies, sometimes through algorithms.
Automated approaches may use NLP models that account for context rather than applying blanket keyword restrictions.
Supervised learning, as in the spam example, supports this kind of contextual classification.
Algorithms facilitate translation across various human languages in multiple formats (speech/text).
Example: Google Translate and Google Lens.
Mistranslations occasionally circulate in digital media, illustrating that machine translation remains imperfect.
Extension of NLP models to biomedical discourse, aiding in:
Searching peer-reviewed research for specific topics (e.g. COVID-19).
Analyzing social determinants of health in unstructured clinical data.
Core Concept (language modeling): Collect word-sequence statistics from text corpora to estimate the probability of word sequences.
Example: Given a word, which words tend to follow it, and how often?
Practical applications for predictive modeling in speech recognition.
Establishes likelihoods for word sequences in ambiguous contexts, e.g. distinguishing homophone-laden strings like “recognize speech” vs. “wreck a nice beach.”
Importance of context in next-word predictions highlights limitations of simple models.
Predictive text on smartphones is an everyday example of this kind of model (a minimal sketch follows).
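A minimal sketch of the core idea as a bigram model: count which words follow which in a toy corpus, then predict the most frequent continuation. Real models use far larger corpora and smoothing for unseen sequences.

    # Bigram language model: next-word prediction from corpus counts.
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the cat ate the fish".split()

    bigrams = defaultdict(Counter)
    for w1, w2 in zip(corpus, corpus[1:]):
        bigrams[w1][w2] += 1

    def predict_next(word):
        # most frequent word observed after `word` in the corpus
        return bigrams[word].most_common(1)[0][0]

    print(predict_next("the"))   # 'cat' (follows 'the' twice in the toy corpus)
    print(dict(bigrams["the"]))  # all observed continuations with counts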
Large language models (LLMs, e.g. ChatGPT) are trained on next-word prediction over vast datasets using very large neural network architectures.
Reinforcement Learning from Human Feedback (RLHF) is used to steer models toward preferred output behavior.
Whether LLMs acquire language in human-like ways remains an active area of study.
Magnitude of Models:
Data Size: Billions of tokens gathered from varied online sources.
Model Size: Comprised of billions of parameters with considerable computational demands and environmental impact.
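For a rough, illustrative sense of scale (assumed example size, not a specific model): the weights alone of a 7-billion-parameter model stored in 16-bit floats occupy about 14 GB, before any activations, optimizer state, or serving overhead.

    # Back-of-the-envelope memory estimate; numbers are illustrative assumptions.
    params = 7e9                # 7 billion parameters (assumed example size)
    bytes_per_param = 2         # 16-bit (fp16) storage
    print(params * bytes_per_param / 1e9, "GB")   # -> 14.0 GB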
Digital representation challenges faced by underrepresented languages in online spaces.
Issues include script availability in digital systems, socio-political factors affecting language vitality, and limited use online.
Recommended Sources:
Language Files, Chapter 16.
Speech & Language Processing by Jurafsky & Martin (free online).
Natural Language Processing with Python (Bird, Klein & Loper, free to read online).
Notable Associations:
Association for Computational Linguistics (ACL)
Society for Computation in Linguistics (SCiL)
Conference on Computational Natural Language Learning (CoNLL)
Empirical Methods in Natural Language Processing (EMNLP)
Programs and Research:
UW NLP seminars.
Master’s program in Computational Linguistics.
Courses: LING 471, LING 472, CSE 447.
Questions are encouraged for further clarity and understanding.