
Artificial Intelligence and NLP Core Concepts Exam Review

I. Artificial Intelligence (AI) Fundamentals

  • Artificial Intelligence (AI): Building systems that exhibit human-like intelligence (learning, reasoning, language).

    • (Related goal): Augmenting human cognitive abilities.

  • Applied AI: Practical use of AI.

    • (Examples): Voice assistants, recommender systems, self-driving cars.

  • Generative AI: Creates new content (text, images) from learned patterns.

    • (Mechanism): Learns data distribution for novel samples.

  • Narrow AI (ANI): Also called Weak AI; excels at one specific task.

    • (Examples): Spam filters, translation, image recognition.

  • Expert Systems: Rule-based AI mimicking human experts.

    • (Components): Knowledge Base, Inference Engine, User Interface.

    • (Limitations): Hard to update, lacks common sense.

  • Data Ethics: Responsible data handling (fairness, privacy).

  • Text as Data: Treating language as quantifiable units (e.g., tokens) for computational analysis.

  • Distant Reading: Computational analysis of large text collections.

  • Supervised Learning: AI learns from labeled examples (contrast with unsupervised learning in the sketch after this list).

  • Unsupervised Learning: AI finds patterns in unlabeled data.

  • Reinforcement Learning: Agent learns by environment interaction, maximizing rewards.
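
  A minimal Python sketch contrasting supervised and unsupervised learning with scikit-learn
  (assumed installed); the feature vectors and labels are toy values invented for illustration:

    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    # Toy features, invented for illustration (e.g., two spam-related scores).
    X = [[0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1]]

    # Supervised: features paired with labels (1 = spam, 0 = not spam).
    y = [1, 0, 1, 0]
    clf = LogisticRegression().fit(X, y)
    print(clf.predict([[0.15, 0.85]]))  # predicts a label for an unseen example

    # Unsupervised: same features, no labels; the model groups similar points.
    print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))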

II. Reflection and Application (AI)

  • AI & Numerical Language: AI systems operate on digital (numerical) information; tokenization and frequency counts convert language into numbers, satisfying this requirement.

  • Data Ethics & Accountability: Humans must interpret topic-model themes. Risks: misinterpretation, overstating statistical findings, bias, lack of transparency.

  • Descriptive → Predictive Analysis: Can create an 'illusion of understanding' but can also reveal new, abstract insights into vast text collections.

III. Natural Language Processing (NLP) Core Concepts

  • Tokenization: Breaking text into units (words, subwords).

    • (Example): "Hello, world!" \to [\text{Hello}, \text{,}, \text{world}, \text{!}]

  • Frequency Analysis: Counting word/token occurrences.

    • (Use): Identify common terms.
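
  A minimal sketch with the standard-library Counter; the token list is a toy example:

    from collections import Counter

    tokens = ["the", "cat", "sat", "on", "the", "mat"]
    freq = Counter(tokens)          # maps token -> count
    print(freq.most_common(2))      # [('the', 2), ('cat', 1)]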

  • KWIC (Key Word in Context): Word shown with immediate context.

    • (Use): Understand word usage/senses.
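
  A minimal KWIC sketch; the kwic() helper and its window parameter are illustrative names,
  not a standard API:

    def kwic(tokens, keyword, window=3):
        # Yield every occurrence of `keyword` with up to `window` tokens of context.
        for i, tok in enumerate(tokens):
            if tok.lower() == keyword.lower():
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                yield f"{left} [{tok}] {right}"

    tokens = "the strong tea was far too strong for me".split()
    for line in kwic(tokens, "strong"):
        print(line)
    # the [strong] tea was far
    # was far too [strong] for me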

  • N-Grams: Contiguous sequences of N tokens.

    • (Example): The 2-gram (bigram) of "artificial intelligence" is ("artificial", "intelligence").
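
  A minimal sketch; the ngrams() helper is an illustrative name (NLTK and other toolkits
  ship equivalents):

    def ngrams(tokens, n):
        # Slide a window of length n across the token sequence.
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    print(ngrams(["artificial", "intelligence", "exam"], 2))
    # [('artificial', 'intelligence'), ('intelligence', 'exam')]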

  • Collocations: Words co-occurring more than by chance (e.g., "strong tea").

    • (Use): Reveals conventional pairings.
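
  One common collocation score is pointwise mutual information (PMI); a minimal sketch
  with toy counts invented for illustration:

    import math

    # Toy counts, invented for illustration: a 1,000-token corpus.
    N = 1000             # total tokens in the corpus
    count_strong = 20    # occurrences of "strong"
    count_tea = 15       # occurrences of "tea"
    count_pair = 10      # occurrences of the bigram "strong tea"

    # PMI compares the observed bigram probability with what independence predicts.
    pmi = math.log2((count_pair / N) / ((count_strong / N) * (count_tea / N)))
    print(f"PMI('strong', 'tea') = {pmi:.2f}")  # ~5.06, well above 0 => collocation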

  • Keyness: Statistical measure for distinguishing terms between texts/corpora.

    • (Use): Identify characteristic terms.
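
  One widely used keyness statistic is Dunning's log-likelihood (G2); a minimal sketch
  where the helper name and the toy frequencies are invented for illustration:

    import math

    def log_likelihood(a, b, size_a, size_b):
        # Dunning's G2 keyness statistic for one term across two corpora;
        # a, b are the term's frequencies in corpus A and corpus B.
        expected_a = size_a * (a + b) / (size_a + size_b)
        expected_b = size_b * (a + b) / (size_a + size_b)
        g2 = 0.0
        if a:
            g2 += 2 * a * math.log(a / expected_a)
        if b:
            g2 += 2 * b * math.log(b / expected_b)
        return g2

    # Toy figures: "carbon" in a climate corpus vs. an AI corpus of equal size.
    print(round(log_likelihood(120, 5, 50_000, 50_000), 1))  # 131.3 -> highly "key"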

  • Topic Modeling: Unsupervised method finding abstract themes (word clusters) in documents.

    • (Use): High-level understanding of main subjects.
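
  A minimal sketch using scikit-learn's LatentDirichletAllocation on four invented
  mini-documents (real corpora are far larger, so the output here is illustrative only):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "carbon emission warming climate carbon",
        "renewable energy policy emission climate",
        "neural network learning machine learning",
        "deep learning neural network model",
    ]
    vec = CountVectorizer()
    X = vec.fit_transform(docs)                      # document-term count matrix
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

    # Top words per topic; a human still has to label what each cluster "means".
    vocab = vec.get_feature_names_out()
    for i, weights in enumerate(lda.components_):
        top = [vocab[j] for j in weights.argsort()[-3:][::-1]]
        print(f"topic {i}: {top}")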

IV. Reflection & Application (NLP)

  • Single Text Analysis: Tokenization, Frequency Analysis, KWIC, N-Grams.

  • Large Corpus Analysis: All methods, especially Keyness and Topic Modeling.

  • Most Human Interpretation: KWIC (judging nuanced word senses), Topic Modeling (labeling and interpreting word clusters).

V. Applied Example: Climate Change and AI Corpora

  • Frequency:

    • Climate Change: "carbon," "emission," "global," "warming."

    • AI: "machine," "learning," "neural," "network."

  • Keyness:

    • Climate Change: "Paris Agreement," "renewable energy."

    • AI: "deep learning," "natural language processing."

  • Topic Modeling:

    • Climate Change: "international policy," "ecosystem impacts," "energy solutions."

    • AI: "ML advancements," "ethical considerations," "healthcare applications."