Artificial Intelligence and NLP Core Concepts Exam Review
I. Artificial Intelligence (AI) Fundamentals
Artificial Intelligence (AI): Creating systems that exhibit human-like intelligence (learning, reasoning, language).
(Related): Augment cognitive abilities.
Applied AI: Practical use of AI.
(Examples): Voice assistants, recommender systems, self-driving cars.
Generative AI: Creates new content (text, images) from learned patterns.
(Mechanism): Learns data distribution for novel samples.
Narrow AI (ANI): (Weak AI) Excels at a single, specific task.
(Examples): Spam filters, translation, image recognition.
Expert Systems: Rule-based AI mimicking human experts.
(Components): Knowledge Base, Inference Engine, User Interface.
(Limitations): Hard to update, lacks common sense.
Data Ethics: Responsible data handling (fairness, privacy).
Text as Data: Language quantified for analysis (tokens).
Distant Reading: Computational analysis of large text collections.
Supervised Learning: AI learns from labeled examples.
Unsupervised Learning: AI finds patterns in unlabeled data.
Reinforcement Learning: Agent learns by environment interaction, maximizing rewards.
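A minimal supervised-learning sketch (toy spam-filter data, assuming scikit-learn is available; the feature and labels are illustrative, not from the source):

```python
from sklearn.tree import DecisionTreeClassifier

# Labeled examples: feature = message length in characters,
# label = 1 (spam) or 0 (not spam). Toy data for illustration only.
X = [[320], [15], [280], [8], [250], [12]]
y = [1, 0, 1, 0, 1, 0]

model = DecisionTreeClassifier().fit(X, y)   # learn from labeled examples
print(model.predict([[300]]))                # predict the label of a new example
```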
II. Reflection and Application (AI)
AI & Numerical Language: AI operates on digital (numerical) information; tokenization and frequency counts convert language into numbers, fulfilling this principle.
Data Ethics & Accountability: Humans must interpret topic-model themes. Risks: misinterpretation, overstating statistical findings, bias, lack of transparency.
Descriptive → Predictive Analysis: Can create an 'illusion of understanding' but can also reveal new, abstract insights into vast texts.
III. Natural Language Processing (NLP) Core Concepts
Tokenization: Breaking text into units (words, subwords).
(Example): "Hello, world!" \to [\text{Hello}, \text{,}, \text{world}, \text{!}]
Frequency Analysis: Counting word/token occurrences.
(Use): Identify common terms.
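A minimal frequency-analysis sketch with Python's collections.Counter:

```python
from collections import Counter

tokens = ["the", "cat", "sat", "on", "the", "mat"]
freq = Counter(tokens)        # token -> number of occurrences
print(freq.most_common(2))    # [('the', 2), ('cat', 1)]
```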
KWIC (Key Word in Context): Word shown with immediate context.
(Use): Understand word usage/senses.
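A minimal KWIC sketch; the window size and example sentence are arbitrary choices for illustration:

```python
def kwic(tokens, keyword, window=3):
    # Yield (left context, keyword, right context) for each occurrence.
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            yield left, tok, right

tokens = "the strong tea was served after the strong coffee arrived".split()
for left, kw, right in kwic(tokens, "strong"):
    print(f"{left:>25} | {kw} | {right}")
```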
N-Grams: Sequences of N tokens appearing together.
(Example): The 2-gram (bigram) of "artificial intelligence" is ("artificial", "intelligence").
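A minimal n-gram sketch that slides a window of length n over a token list:

```python
def ngrams(tokens, n):
    # Every contiguous run of n tokens, in order.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams(["natural", "language", "processing"], 2))
# [('natural', 'language'), ('language', 'processing')]
```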
Collocations: Words co-occurring more than by chance (e.g., "strong tea").
(Use): Reveals conventional pairings.
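A minimal collocation sketch scoring adjacent pairs by pointwise mutual information (PMI), one common collocation measure; libraries such as NLTK offer more robust collocation finders:

```python
import math
from collections import Counter

def pmi(tokens):
    # PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ).
    # High PMI means the pair co-occurs more often than chance predicts.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

tokens = "she drank strong tea and he drank strong tea at noon".split()
scores = pmi(tokens)
print(sorted(scores, key=scores.get, reverse=True)[:2])
```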
Keyness: Statistical measure for distinguishing terms between texts/corpora.
(Use): Identify characteristic terms.
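A minimal keyness sketch using the log-likelihood (G2) statistic, one common keyness measure; the two toy corpora below are illustrative only:

```python
import math
from collections import Counter

def keyness(target, reference):
    # G2 compares a word's observed frequencies in two corpora with the
    # frequencies expected if the word were evenly distributed.
    t_counts, r_counts = Counter(target), Counter(reference)
    n_t, n_r = len(target), len(reference)
    scores = {}
    for word, a in t_counts.items():
        b = r_counts.get(word, 0)
        e_t = n_t * (a + b) / (n_t + n_r)   # expected count in target
        e_r = n_r * (a + b) / (n_t + n_r)   # expected count in reference
        g2 = 2 * (a * math.log(a / e_t) + (b * math.log(b / e_r) if b else 0))
        scores[word] = g2
    return scores

climate = "carbon emission warming carbon policy emission".split()
ai = "neural network learning model neural data".split()
top = sorted(keyness(climate, ai).items(), key=lambda kv: kv[1], reverse=True)
print(top[:3])   # terms most characteristic of the climate corpus
```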
Topic Modeling: Unsupervised method finding abstract themes (word clusters) in documents.
(Use): High-level understanding of main subjects.
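A minimal topic-modeling sketch with Latent Dirichlet Allocation, assuming scikit-learn is available; the four toy documents and the choice of two topics are illustrative assumptions:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "carbon emissions drive global warming",
    "renewable energy reduces carbon emissions",
    "neural networks power machine learning",
    "deep learning uses neural networks",
]

# Convert documents into a document-term count matrix.
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)

# Fit an LDA model; the number of topics is a human choice.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

# Show the top words per topic; labeling the topics is a human task.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_words = [terms[j] for j in topic.argsort()[::-1][:4]]
    print(f"Topic {i}: {top_words}")
```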
IV. Reflection & Application (NLP)
Single Text Analysis: Tokenization, Frequency Analysis, KWIC, N-Grams.
Large Corpus Analysis: All methods, especially Keyness, Topic Modeling.
Most Human Interpretation: KWIC (judging nuances of usage), Topic Modeling (labeling and interpreting word clusters).
V. Applied Example: Climate Change and AI Corpora
Frequency:
Climate Change: "carbon," "emission," "global," "warming."
AI: "machine," "learning," "neural," "network."
Keyness:
Climate Change: "Paris Agreement," "renewable energy."
AI: "deep learning," "natural language processing."
Topic Modeling:
Climate Change: "international policy," "ecosystem impacts," "energy solutions."
AI: "ML advancements," "ethical considerations," "healthcare applications."