LING 1010 Language and Mind: Artificial Intelligence and Human Language
Chatbot: A computer program simulating human conversation via text responses to user input.
Historical examples: ELIZA (1960s).
Recent examples: ChatGPT, Bard.
Recent chatbots produce novel, grammatical, and relevant sentences.
Unlike traditional grammars (symbolic rules), these chatbots are based on Large Language Models (LLMs).
Cognitive Science and AI Historical Context
Cognitive Science: Interdisciplinary field scientifically investigating information processing in the human brain (perception, reasoning, language, memory). Linguistics is the branch focusing on language.
Classical Computational Theory of Mind:
Human mind is like a digital computer, performing rule-governed computations on symbolic representations.
Influenced by logic and early computer science.
Connectionism:
Human mind is a product of the human brain.
Models inspired by neural wiring; information is distributed in neural networks without explicit symbols or rules.
Influenced by neuroscience.
Artificial Intelligence (AI): An engineering project to build intelligent computers/machines.
Intelligence typically involves reasoning, problem-solving, language, decision-making.
Natural Language Processing (NLP) is the AI branch dealing with language.
AI aims for efficient solutions, not necessarily mimicking human processes.
Large Language Models: Mechanics and Scale
Language Model: Any NLP program that predicts the next word given a sequence of preceding words.
Modern chatbots use LLMs with significant computational power.
LLMs (e.g., GPT-4, LLaMA, PaLM 2) run on Artificial Neural Networks (ANNs).
ANNs are not symbolically programmed but trained on vast datasets, adjusting numerical weights (parameters) for better outputs.
Scale of LLMs (e.g., GPT-4):
1.76 trillion parameters.
Trained on 13 trillion word tokens.
ANN has 120 layers of hidden nodes.
Training required 10,000+ advanced semiconductor chips.
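The core idea of a language model, predicting the next word from the preceding words, can be sketched with a toy count-based bigram model. This is a deliberately simplified stand-in for the huge neural networks described above; the sample corpus is invented for illustration, and "training" here is just counting rather than adjusting billions of weights.

```python
from collections import Counter, defaultdict

# "Training": count how often each word follows each other word.
# Real LLMs instead adjust trillions of numerical weights, but the
# goal is the same: estimate P(next word | preceding words).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word after `word` in the corpus."""
    return follow_counts[word].most_common(1)[0][0]

def prob_next(prev, nxt):
    """Estimated probability P(nxt | prev) from the counts."""
    total = sum(follow_counts[prev].values())
    return follow_counts[prev][nxt] / total

print(predict_next("on"))      # "on" is always followed by "the" here
print(prob_next("sat", "on"))  # "sat" is always followed by "on" -> 1.0
```

Chaining `predict_next` from a seed word already generates fluent-looking local word sequences; the gap between this sketch and an LLM is one of scale and architecture, not of the underlying prediction task.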
LLMs and Human Language Understanding
LLMs are often described as "black boxes": experts cannot interpret their inner workings and lack techniques for understanding what they know or how they reason.
Benchmarks such as BLiMP test LLMs' linguistic abilities, often using surprisal (how unexpected a word is given its context); LLMs score close to humans on many such tests.
It is unclear whether LLMs follow symbolic grammatical rules or ascribe any meaning to the text they generate. Some argue that no genuine language understanding occurs.
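Surprisal, the measure mentioned above, is just the negative log probability of a word given its context: predictable words have low surprisal, unexpected words high surprisal. A minimal sketch, using made-up model probabilities (not real benchmark scores):

```python
import math

def surprisal(p):
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(p)

# Hypothetical probabilities a model might assign to the word
# after "the cats ...":
p_grammatical = 0.40    # "... are" (verb agrees in number)
p_ungrammatical = 0.01  # "... is" (agreement violation)

print(surprisal(p_grammatical))    # ~1.32 bits
print(surprisal(p_ungrammatical))  # ~6.64 bits
```

A benchmark in the style of BLiMP counts a trial as correct when the model assigns the ungrammatical variant higher surprisal (lower probability) than the grammatical one.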
LLMs and Human Language Acquisition
LLMs do not learn like humans; they are trained on far more data than any human encounters in a lifetime (children acquire language by about age 6 from only tens of millions of word tokens).
LLMs generally fail to reach human-level accuracy when restricted to human-scale data.
Human advantages in language acquisition include sensorimotor stimuli, inter-agent interaction, environmental interaction, and prosody, which LLMs typically lack.
LLMs serve as powerful tools for studying human language acquisition; research efforts such as the BabyLM Challenge train models on realistic, child-scale datasets.
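The data gap described above can be made concrete with back-of-the-envelope arithmetic. The 13 trillion figure is the training-set size cited earlier for GPT-4; 30 million is one illustrative value within the "tens of millions" of word tokens a child hears by age 6.

```python
llm_tokens = 13_000_000_000_000   # word tokens cited for GPT-4's training set
child_tokens = 30_000_000         # illustrative: tens of millions by age 6

ratio = llm_tokens / child_tokens
print(f"The LLM saw roughly {ratio:,.0f}x more words than a six-year-old.")
# On these assumptions, that is over 400,000 times as much linguistic input.
```

This ratio is why results like those of the BabyLM Challenge matter: matching human performance on human-scale data, not just on internet-scale data, is the relevant comparison for acquisition.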
Concerns and Limitations of LLMs
Debate: Are LLMs creative, or are they "stochastic parrots" that haphazardly stitch together linguistic forms without reference to meaning?
Risk of LLMs parroting biases and prejudices present in their vast training data.
"No reliable techniques for steering the behavior of LLMs" raises concerns about controlling output.