To conserve resources, don't use a GPU runtime in Colab when you are not actively using the GPU.
Start with a tiny dataset when building code, potentially using the CPU initially.
The HuggingFace transformers library can be complex, so refer to the HuggingFace tutorials for worked examples; the API documentation lacks them.
Restart the notebook after installing packages and place installation code at the top.
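To get clearer error messages, disable GPUs by putting the following code at the start of the notebook and restarting:
import os
# An empty value hides all CUDA devices; set it before any library initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = ""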
Alternative GPU options if Colab time runs out:
Personal machine's GPU.
Kaggle.
Avoid paying for Colab Pro if possible.
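As a rough illustration of the "start with a tiny dataset" tip above, the sketch below draws a small, reproducible sample from a larger list of examples. The names take_subset, full_dataset, and tiny_dataset are hypothetical, and the data is a stand-in; the same idea applies to whatever dataset object you actually use.

```python
import random

def take_subset(examples, n=100, seed=0):
    """Return a reproducible random sample of at most n examples."""
    rng = random.Random(seed)  # fixed seed so every run sees the same subset
    if len(examples) <= n:
        return list(examples)
    return rng.sample(examples, n)

# Hypothetical stand-in for a real dataset of (text, label) pairs.
full_dataset = [(f"sentence {i}", i % 2) for i in range(10_000)]

tiny_dataset = take_subset(full_dataset, n=100)
print(len(tiny_dataset))
```

A subset this small usually runs quickly on the CPU, so the GPU runtime is only needed once the code works end to end.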
Part-of-Speech Tagging & Parsing
Introduction to Natural Language Processing (NLP).
Sequential labeling: Parts-of-Speech (POS), Named Entity Recognition.
Classification techniques: Hidden Markov Model (HMM), Viterbi Algorithm.
Tree Structures: Constituency and Dependency Syntax Trees.
Parsing techniques: Transition-Based Parsing.
Understanding Text
Previously covered:
Bag-of-words representations (e.g., {john, drank, some, wine, on, the, new, sofa, it, was, red}) with weighting schemes like TF-IDF.
N-gram language models:
P("John drank some wine on the new sofa. It was red.")
P("red" | "John drank some wine on the new sofa. It was") ≈ P("red" | "It was")
Markov assumption: the next word depends only on the previous few words (here, the previous two).
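The approximation above can be sketched with a small count-based trigram model. This is a minimal illustration, assuming maximum-likelihood estimates with no smoothing; the toy corpus and the function names train_ngram_counts and prob are made up for this example.

```python
from collections import Counter

def train_ngram_counts(sentences, n=3):
    """Count n-grams and their (n-1)-word contexts in tokenized sentences."""
    ngram_counts, context_counts = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] * (n - 1) + tokens  # pad so the first words have a context
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1
    return ngram_counts, context_counts

def prob(word, history, ngram_counts, context_counts, n=3):
    """Estimate P(word | history), keeping only the last n-1 words (assumes n >= 2)."""
    context = tuple(history[-(n - 1):])  # the Markov assumption: drop older words
    if context_counts[context] == 0:
        return 0.0  # unseen context; a real model would smooth or back off
    return ngram_counts[context + (word,)] / context_counts[context]

# Toy corpus, invented for illustration only.
corpus = [
    "it was red".split(),
    "it was new".split(),
    "the wine was red".split(),
]
ngrams, contexts = train_ngram_counts(corpus, n=3)

long_history = "john drank some wine on the new sofa . it was".split()
short_history = "it was".split()
p_long = prob("red", long_history, ngrams, contexts)
p_short = prob("red", short_history, ngrams, contexts)
print(p_long, p_short)
```

Both calls return the same value because the model only ever looks at the last two words of the history, which is exactly what the approximation says.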
Limitations of previous approaches: they capture word counts and local word order, but lack true understanding of the text (e.g., whether "it" refers to the wine or the sofa).
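One way to see this limitation for bag-of-words is that word order is thrown away entirely. The sketch below uses a simple whitespace tokenizer (an assumption made for brevity) and a hypothetical helper bag_of_words; the second sentence is a scrambled version of the first, invented for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens (order is discarded)."""
    return Counter(text.lower().split())

a = bag_of_words("John drank some wine on the new sofa")
b = bag_of_words("the new sofa drank some wine on John")

# Same words, very different meaning, identical representation.
print(a == b)
```

The two sentences get the same representation even though only one of them makes sense, which is why structure-aware analyses such as tagging and parsing are needed.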