Vocabulary Flashcards – Sentiment Analysis of U.S. Import Tariffs Thesis

A set of vocabulary flashcards covering essential terms and techniques mentioned in the lecture notes on comparative sentiment-analysis research using Naïve Bayes and Logistic Regression.

28 Terms

Sentiment Analysis

The process of automatically identifying and categorising opinions in text as positive, negative or neutral.

Logistic Regression

A supervised learning algorithm that models the probability of a categorical outcome using the logistic (sigmoid) function.
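
As a minimal sketch of the idea above, the logistic (sigmoid) function turns a weighted sum of features into a probability. The weights, bias and feature values below are hypothetical illustrative numbers, not fitted parameters.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical, hand-picked parameters for two features (not learned from data).
weights = [1.5, -2.0]
bias = 0.25

p = predict_proba(weights, bias, [1.0, 0.5])
label = "positive" if p >= 0.5 else "negative"
```

Thresholding the probability at 0.5 is a common default rather than a requirement; a different cut-off trades precision against recall.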

Naïve Bayes

A probabilistic classifier that assumes feature independence and applies Bayes’ theorem to predict class membership.
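
The sketch below shows one way the independence assumption can be applied to word counts: a small multinomial Naïve Bayes with add-one (Laplace) smoothing. The function names and the four toy documents are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Collect the counts the classifier needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior for the class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Independence assumption: add one log-likelihood term per token,
            # with add-one smoothing so unseen words do not zero the product.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data (invented for illustration).
docs = [
    (["tariffs", "hurt", "consumers"], "negative"),
    (["prices", "rise", "hurt"], "negative"),
    (["tariffs", "protect", "jobs"], "positive"),
    (["good", "for", "jobs"], "positive"),
]
model = train_nb(docs)
prediction = predict_nb(model, ["tariffs", "hurt"])
```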

VADER (Valence Aware Dictionary and sEntiment Reasoner)

A lexicon- and rule-based tool tailored for social-media text that assigns polarity scores and compound sentiment values.
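
A compound value is usually turned into a class label by thresholding. The sketch below assumes the commonly cited cut-offs of ±0.05; the helper name `label_from_compound` is hypothetical and is not part of the VADER library.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a sentiment label.

    The 0.05 threshold is an assumed, commonly used default; adjust it
    if the labelling scheme of a given study differs.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

labels = [label_from_compound(c) for c in (0.6, -0.3, 0.0)]
```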

SMOTE (Synthetic Minority Over-sampling Technique)

A resampling method that balances imbalanced datasets by creating synthetic minority-class examples, interpolating between existing minority samples and their nearest neighbours.
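
The core step of SMOTE can be sketched as placing a new point at a random position on the line segment between a minority sample and one of its minority-class neighbours. The sketch below shows only that interpolation step, with the neighbour given by hand; a full implementation would also search for the k nearest neighbours. The function name and the two points are illustrative assumptions.

```python
import random

def smote_sample(point, neighbor, rng):
    """Create one synthetic sample between a minority point and a neighbour."""
    gap = rng.random()  # random position in [0, 1) along the segment
    return [p + gap * (n - p) for p, n in zip(point, neighbor)]

rng = random.Random(0)            # seeded so the sketch is reproducible
minority_point = [1.0, 2.0]       # hypothetical minority-class sample
nearest_neighbor = [3.0, 6.0]     # hypothetical neighbour from the same class

synthetic = smote_sample(minority_point, nearest_neighbor, rng)
```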

TF-IDF (Term Frequency–Inverse Document Frequency)

A weighting scheme that reflects how important a word is to a document relative to a corpus.
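
Several TF-IDF variants exist. The sketch below assumes the plain textbook form, with term frequency as count divided by document length and inverse document frequency as log(N / df); libraries such as scikit-learn use smoothed variants, so their numbers will differ. The toy corpus is invented for illustration.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """corpus: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

corpus = [
    ["tariffs", "raise", "prices"],
    ["tariffs", "protect", "jobs"],
    ["prices", "keep", "rising"],
]
weights = tf_idf(corpus)
```

In the first document, "raise" appears in only one document and therefore receives a higher weight than "tariffs", which appears in two. A term that occurs in every document receives a weight of zero under this variant.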

Tokenization

The preprocessing step that splits raw text into smaller units such as words or sub-words called tokens.

Stemming

Heuristically stripping affixes to reduce words to a common stem so that word variants are treated as one term (e.g., ‘running’ → ‘run’); the resulting stem is not always a dictionary word.

Stopword Removal

Eliminating very common words (e.g., ‘and’, ‘the’) that carry little semantic value in text analysis.
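
Tokenization and stopword removal are often chained in a preprocessing step. The following is a minimal sketch using a regular expression and a deliberately tiny stopword list; both are assumptions for illustration, and real projects typically rely on a library-provided tokenizer and stopword list. Stemming is left out here.

```python
import re

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"and", "the", "a", "an", "of", "to", "is", "are", "on"}

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]

tokens = tokenize("The tariffs are raising the price of imports.")
filtered = remove_stopwords(tokens)
```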

CRISP-DM

A six-phase, industry-standard methodology for data-mining projects: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation and Deployment.

Machine Learning

A branch of artificial intelligence that enables systems to learn patterns from data and improve performance over time.

Supervised Learning

A machine-learning paradigm where models are trained using labelled input–output pairs.

GridSearchCV

A scikit-learn class that exhaustively evaluates every combination in a specified hyperparameter grid, scoring each one with cross-validation.
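
To illustrate what "exhaustive" means here, the sketch below enumerates every combination in a small grid using only the standard library. It does not call scikit-learn and is not its implementation; the grid values and the fold count are illustrative assumptions.

```python
from itertools import product

# Hypothetical grid for a Logistic Regression model (values chosen for illustration).
param_grid = {"C": [0.1, 1.0, 10.0], "penalty": ["l1", "l2"]}

names = list(param_grid)
combinations = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

n_folds = 5  # assumed number of cross-validation folds
n_fits = len(combinations) * n_folds  # each combination is fitted once per fold
```

With three values of `C`, two penalties and five folds, the search covers 6 combinations and 30 model fits, which is why grid size grows quickly as parameters are added.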

Precision

The proportion of true positive predictions among all positive predictions made by a model.

Recall

The proportion of actual positive instances that the model correctly identifies as positive.

F1-Score

The harmonic mean of precision and recall; balances both metrics into a single measure.
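
Precision, recall and F1 can all be computed from three counts: true positives, false positives and false negatives. The sketch below uses invented counts and returns 0.0 when a denominator is zero, which is one common convention rather than the only one.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall and F1 from raw counts for one class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
```

Here precision is 40/50 = 0.8, recall is 40/60 ≈ 0.667, and F1 is 80/110 ≈ 0.727, which sits closer to the lower of the two values, as expected of a harmonic mean.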

Confusion Matrix

A table showing correct and incorrect predictions broken down by each class, used to evaluate classification models.
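
A confusion matrix can be built by counting (true label, predicted label) pairs. In the sketch below, rows are true classes and columns are predicted classes; that orientation and the toy labels are assumptions for illustration.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows follow the true class, columns the predicted class, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

labels = ["negative", "neutral", "positive"]
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "neutral", "negative", "neutral"]

matrix = confusion_matrix(y_true, y_pred, labels)
```

The diagonal holds the correct predictions; every off-diagonal cell shows which class was mistaken for which.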

WordCloud

A visual representation where word size indicates frequency, revealing prominent terms in text data.

Decision Boundary

The surface that separates different class regions in the feature space according to a classifier.

McNemar's Test

A non-parametric statistical test for paired nominal data used to compare two classifiers on the same samples.
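
When comparing two classifiers, the test uses only the samples on which they disagree. The sketch below assumes the chi-square form with continuity correction and one degree of freedom; for a small number of disagreements an exact binomial version is usually preferred. The per-sample correctness lists are invented for illustration.

```python
import math

def mcnemar_test(correct_a, correct_b):
    """correct_a / correct_b: one boolean per sample, True if that classifier was right.

    Returns (statistic, p_value) for the continuity-corrected chi-square form.
    """
    # b: A right and B wrong; c: B right and A wrong (the discordant pairs).
    b = sum(1 for a, bb in zip(correct_a, correct_b) if a and not bb)
    c = sum(1 for a, bb in zip(correct_a, correct_b) if bb and not a)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical outcome: 15 samples only A gets right, 5 only B, 30 both.
correct_a = [True] * 15 + [False] * 5 + [True] * 30
correct_b = [False] * 15 + [True] * 5 + [True] * 30

stat, p_value = mcnemar_test(correct_a, correct_b)
```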

X API (formerly Twitter API)

An interface that allows programmatic access to X/Twitter data for retrieving, posting or analysing tweets.

Google Colab

A cloud-based Jupyter Notebook environment providing free CPU, GPU and TPU resources for Python code execution.

TruncatedSVD

A dimensionality-reduction technique that projects high-dimensional sparse data (e.g., TF-IDF) into lower dimensions.

One-vs-Rest Strategy

A multi-class classification approach that trains one binary classifier per class against all other classes.
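
The strategy can be sketched in two steps: recode the labels into one binary target per class, then, at prediction time, pick the class whose binary classifier gives the highest score. The helper names and the scores below are hypothetical; the binary classifiers themselves are not trained here.

```python
def one_vs_rest_targets(labels, classes):
    """Build one binary target vector per class: 1 for that class, 0 for the rest."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

def predict_ovr(scores):
    """scores: {class: score from that class's binary classifier}. Pick the highest."""
    return max(scores, key=scores.get)

classes = ["negative", "neutral", "positive"]
labels = ["positive", "negative", "neutral", "positive"]
targets = one_vs_rest_targets(labels, classes)

# Hypothetical scores produced by three binary classifiers for one new sample.
predicted = predict_ovr({"negative": 0.2, "neutral": 0.35, "positive": 0.6})
```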

Vectorizer

A tool (e.g., CountVectorizer, TfidfVectorizer) that converts text into numerical feature vectors for machine-learning models.

Lexicon-Based Analysis

Sentiment detection that relies on predefined dictionaries of words annotated with polarity scores.

Class Imbalance

A situation where some classes have far fewer samples than others, often degrading model performance.

Hyperparameter Tuning

The process of optimising external configuration settings (e.g., C value in Logistic Regression) to improve model performance.
