Artificial Intelligence and NLP Core Concepts Exam Review
I. Artificial Intelligence (AI) Fundamentals
Artificial Intelligence (AI): Creating systems that exhibit human-like intelligence (learning, reasoning, language).
(Related): Augment cognitive abilities.
Applied AI: Practical use of AI.
(Examples): Voice assistants, recommender systems, self-driving cars.
Generative AI: Creates new content (text, images) from learned patterns.
(Mechanism): Learns data distribution for novel samples.
Narrow AI (ANI): (Weak AI) Excels at a single, specific task.
(Examples): Spam filters, translation, image recognition.
Expert Systems: Rule-based AI mimicking human experts.
(Components): Knowledge Base, Inference Engine, User Interface.
(Limitations): Hard to update, lacks common sense.
Data Ethics: Responsible data handling (fairness, privacy).
Text as Data: Language quantified for analysis (tokens).
Distant Reading: Computational analysis of large text collections.
Supervised Learning: AI learns from labeled examples.
Unsupervised Learning: AI finds patterns in unlabeled data.
Reinforcement Learning: Agent learns by environment interaction, maximizing rewards.
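A minimal supervised-learning sketch (toy spam-filter data, assuming scikit-learn is available; the feature and labels are illustrative, not from the source):

```python
from sklearn.tree import DecisionTreeClassifier

# Labeled examples: feature = message length in characters,
# label = 1 (spam) or 0 (not spam). Toy data for illustration only.
X = [[320], [15], [280], [8], [250], [12]]
y = [1, 0, 1, 0, 1, 0]

model = DecisionTreeClassifier().fit(X, y)   # learn from labeled examples
print(model.predict([[300]]))                # predict the label of a new example
```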
II. Reflection and Application (AI)
AI & Numerical Language: AI operates on digital (numerical) information; tokenization and frequency counts convert language into numbers, fulfilling this principle.
Data Ethics & Accountability: Humans must interpret topic-model themes. Risks: misinterpretation, overstating statistical findings, bias, lack of transparency.
Descriptive → Predictive Analysis: Can create an 'illusion of understanding' but can also reveal new, abstract insights into vast texts.
III. Natural Language Processing (NLP) Core Concepts
Tokenization: Breaking text into units (words, subwords).
(Example): "Hello, world!" \to [\text{Hello}, \text{,}, \text{world}, \text{!}]
Frequency Analysis: Counting word/token occurrences.
(Use): Identify common terms.
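A minimal frequency-analysis sketch with Python's collections.Counter:

```python
from collections import Counter

tokens = ["the", "cat", "sat", "on", "the", "mat"]
freq = Counter(tokens)        # token -> number of occurrences
print(freq.most_common(2))    # [('the', 2), ('cat', 1)]
```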
KWIC (Key Word in Context): Word shown with immediate context.
(Use): Understand word usage/senses.
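A minimal KWIC sketch; the window size and example sentence are arbitrary choices for illustration:

```python
def kwic(tokens, keyword, window=3):
    # Yield (left context, keyword, right context) for each occurrence.
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            yield left, tok, right

tokens = "the strong tea was served after the strong coffee arrived".split()
for left, kw, right in kwic(tokens, "strong"):
    print(f"{left:>25} | {kw} | {right}")
```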
N-Grams: Sequences of N tokens appearing together.
(Example): The 2-gram (bigram) of "artificial intelligence" is ("artificial", "intelligence").
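A minimal n-gram sketch that slides a window of length n over a token list:

```python
def ngrams(tokens, n):
    # Every contiguous run of n tokens, in order.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams(["natural", "language", "processing"], 2))
# [('natural', 'language'), ('language', 'processing')]
```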
Collocations: Words co-occurring more than by chance (e.g., "strong tea").
(Use): Reveals conventional pairings.
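A minimal collocation sketch scoring adjacent pairs by pointwise mutual information (PMI), one common collocation measure; libraries such as NLTK offer more robust collocation finders:

```python
import math
from collections import Counter

def pmi(tokens):
    # PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ).
    # High PMI means the pair co-occurs more often than chance predicts.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

tokens = "she drank strong tea and he drank strong tea at noon".split()
scores = pmi(tokens)
print(sorted(scores, key=scores.get, reverse=True)[:2])
```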
Keyness: Statistical measure for distinguishing terms between texts/corpora.
(Use): Identify characteristic terms.
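A minimal keyness sketch using the log-likelihood (G2) statistic, one common keyness measure; the two toy corpora below are illustrative only:

```python
import math
from collections import Counter

def keyness(target, reference):
    # G2 compares a word's observed frequencies in two corpora with the
    # frequencies expected if the word were evenly distributed.
    t_counts, r_counts = Counter(target), Counter(reference)
    n_t, n_r = len(target), len(reference)
    scores = {}
    for word, a in t_counts.items():
        b = r_counts.get(word, 0)
        e_t = n_t * (a + b) / (n_t + n_r)   # expected count in target
        e_r = n_r * (a + b) / (n_t + n_r)   # expected count in reference
        g2 = 2 * (a * math.log(a / e_t) + (b * math.log(b / e_r) if b else 0))
        scores[word] = g2
    return scores

climate = "carbon emission warming carbon policy emission".split()
ai = "neural network learning model neural data".split()
top = sorted(keyness(climate, ai).items(), key=lambda kv: kv[1], reverse=True)
print(top[:3])   # terms most characteristic of the climate corpus
```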
Topic Modeling: Unsupervised method finding abstract themes (word clusters) in documents.
(Use): High-level understanding of main subjects.
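A minimal topic-modeling sketch with Latent Dirichlet Allocation, assuming scikit-learn is available; the four toy documents and the choice of two topics are illustrative assumptions:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "carbon emissions drive global warming",
    "renewable energy reduces carbon emissions",
    "neural networks power machine learning",
    "deep learning uses neural networks",
]

# Convert documents into a document-term count matrix.
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)

# Fit an LDA model; the number of topics is a human choice.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

# Show the top words per topic; labeling the topics is a human task.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_words = [terms[j] for j in topic.argsort()[::-1][:4]]
    print(f"Topic {i}: {top_words}")
```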
IV. Reflection & Application (NLP)
Single Text Analysis: Tokenization, Frequency Analysis, KWIC, N-Grams.
Large Corpus Analysis: All methods, especially Keyness, Topic Modeling.
Most Human Interpretation: KWIC (judging nuances of usage), Topic Modeling (labeling and interpreting word clusters).
V. Applied Example: Climate Change and AI Corpora
Frequency:
Climate Change: "carbon," "emission," "global," "warming."
AI: "machine," "learning," "neural," "network."
Keyness:
Climate Change: "Paris Agreement," "renewable energy."
AI: "deep learning," "natural language processing."
Topic Modeling:
Climate Change: "international policy," "ecosystem impacts," "energy solutions."
AI: "ML advancements," "ethical considerations," "healthcare applications."