Vocabulary Flashcards – Sentiment Analysis of U.S. Import Tariffs Thesis

Description and Tags

A set of vocabulary flashcards covering essential terms and techniques mentioned in the lecture notes on comparative sentiment-analysis research using Naïve Bayes and Logistic Regression.

28 Terms

1
Sentiment Analysis

The process of automatically identifying and categorising opinions in text as positive, negative or neutral.

2
Logistic Regression

A supervised learning algorithm that models the probability of a categorical outcome using the logistic (sigmoid) function.
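
Example (a minimal sketch, not taken from the lecture notes): the sigmoid turns a weighted sum of features into a probability. The weights and feature values below are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """P(y = 1 | x) for a binary logistic model: sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical weights for two features; the values are illustrative only.
p = predict_proba(weights=[0.8, -0.4], bias=0.0, features=[1.0, 2.0])
```

Here the weighted sum is zero, so the model sits exactly on the fence between the two classes (p = 0.5).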

3
Naïve Bayes

A probabilistic classifier that applies Bayes’ theorem under the ‘naïve’ assumption that features are conditionally independent given the class.
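
Example (a rough sketch, assuming a multinomial word-count model with add-one smoothing; the training sentences are invented): each word contributes its own factor to the class score, which is where the independence assumption shows up.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns class counts, per-class word counts, vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(tokens, class_counts, word_counts, vocab):
    """Pick the class maximising log P(c) + sum of log P(w | c), with add-one smoothing."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for w in tokens:
            # Independence assumption: each word contributes its own factor.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration.
train = [
    (["tariff", "hurts", "farmers"], "negative"),
    (["tariff", "protects", "jobs"], "positive"),
    (["prices", "rise", "hurts"], "negative"),
]
model = train_nb(train)
label = predict_nb(["hurts", "prices"], *model)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.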

4
VADER (Valence Aware Dictionary and sEntiment Reasoner)

A lexicon- and rule-based tool tailored for social-media text that assigns polarity scores and compound sentiment values.

5
SMOTE (Synthetic Minority Over-sampling Technique)

A resampling method that balances imbalanced datasets by creating synthetic minority-class examples, each interpolated between a minority sample and one of its nearest minority-class neighbours.
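
Example (a simplified sketch of the interpolation idea only; the helper name `smote_sample` and the data points are hypothetical): a synthetic point is placed at a random position on the line between a minority sample and one of its nearest minority neighbours. In practice a library implementation such as `imblearn.over_sampling.SMOTE` would normally be used instead.

```python
import random

def smote_sample(x, minority, k=2, rng=random):
    """Create one synthetic point between x and one of its k nearest minority neighbours."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Nearest neighbours of x among the other minority-class points.
    neighbours = sorted((p for p in minority if p != x), key=lambda p: sq_dist(x, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)]

# Four made-up minority-class points in two dimensions.
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 3.0], [8.0, 8.0]]
synthetic = smote_sample(minority[0], minority, k=2, rng=random.Random(0))
```

The new point always lies on a segment between two real minority samples, so it stays inside the region the minority class already occupies.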

6
TF-IDF (Term Frequency–Inverse Document Frequency)

A weighting scheme that reflects how important a word is to a document relative to a corpus.
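
Example (a minimal sketch of the textbook formula, tf × log(N / df), on an invented three-document corpus): a word that appears in every document gets weight zero, while a word unique to one document is weighted up. Library implementations may differ; scikit-learn's `TfidfVectorizer`, for instance, applies idf smoothing and L2 normalisation by default.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """corpus: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(corpus)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

# Invented corpus; "tariff" occurs in every document.
corpus = [
    ["tariff", "raises", "prices"],
    ["tariff", "protects", "jobs"],
    ["tariff", "debate", "continues"],
]
scores = tf_idf(corpus)
```

In this corpus `scores[0]["tariff"]` is 0.0 because the term does not distinguish any document from the others.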

7
Tokenization

The preprocessing step that splits raw text into smaller units such as words or sub-words called tokens.
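
Example (one simple word-level approach among many, assuming English text and a regular-expression rule chosen for illustration):

```python
import re

def tokenize(text):
    """Lowercase the text and return runs of letters, digits or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("Tariffs aren't new: the U.S. raised them in 2018!")
```

Note that this rule splits "U.S." into "u" and "s"; tokenizers built for social-media text usually handle abbreviations, hashtags and mentions with dedicated rules.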

8
Stemming

Heuristically stripping suffixes to reduce word variants to a common stem (e.g., ‘running’ → ‘run’); the stem need not be a dictionary word.

9
Stopword Removal

Eliminating very common words (e.g., ‘and’, ‘the’) that carry little semantic value in text analysis.
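
Example (a minimal sketch with a tiny hand-picked stopword list; real lists are much longer):

```python
# Tiny illustrative list, not a standard stopword set.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "is", "are", "to"}

def remove_stopwords(tokens):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

filtered = remove_stopwords(["the", "tariff", "on", "steel", "is", "rising"])
```

For sentiment work it is worth checking the list before using it: many standard lists include negations such as "not", and dropping those can change the polarity of a sentence.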

10
CRISP-DM

A six-phase, industry-standard methodology for data-mining projects: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation and Deployment.

11
Machine Learning

A branch of artificial intelligence that enables systems to learn patterns from data and improve performance over time.

12
Supervised Learning

A machine-learning paradigm where models are trained using labelled input–output pairs.

13
GridSearchCV

A scikit-learn class that exhaustively evaluates every combination in a specified hyperparameter grid, scoring each with cross-validation.
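
Example (a plain-Python sketch of the exhaustive-enumeration idea, not scikit-learn's implementation): every combination in the grid is scored and the best one is kept. The function `toy_score` is a hypothetical stand-in for a mean cross-validated score; a real search would train and validate a model for each combination.

```python
import math
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, -math.inf
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # stand-in for a mean cross-validated score
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical scoring function; a real one would train and validate a model.
def toy_score(params):
    return -abs(math.log10(params["C"])) + (0.1 if params["penalty"] == "l2" else 0.0)

grid = {"C": [0.1, 1.0, 10.0], "penalty": ["l1", "l2"]}
best_params, best_score = grid_search(grid, toy_score)
```

The cost grows multiplicatively with each added hyperparameter: this grid has 3 × 2 = 6 combinations, and each would be multiplied again by the number of cross-validation folds.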

14
Precision

The proportion of true positive predictions among all positive predictions made by a model.

15
Recall

The proportion of actual positive instances that the model correctly predicts as positive.

16
F1-Score

The harmonic mean of precision and recall; balances both metrics into a single measure.
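
Example (a minimal sketch computing precision, recall and F1 for one class from made-up counts of true positives, false positives and false negatives):

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for a single class; 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up counts: 8 true positives, 2 false positives, 8 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

With these counts precision is 0.8 and recall is 0.5; the F1-score (about 0.615) falls below their arithmetic mean of 0.65 because the harmonic mean is pulled towards the weaker metric.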

17
Confusion Matrix

A table showing correct and incorrect predictions broken down by each class, used to evaluate classification models.
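
Example (a minimal sketch on invented labels, using the convention that rows are actual classes and columns are predicted classes):

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]

# Invented labels for six samples.
labels = ["negative", "neutral", "positive"]
y_true = ["negative", "negative", "neutral", "positive", "positive", "positive"]
y_pred = ["negative", "neutral", "neutral", "positive", "negative", "positive"]
matrix = confusion_matrix(y_true, y_pred, labels)
```

The diagonal holds the correct predictions; off-diagonal cells show which classes are being confused with which.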

18
WordCloud

A visual representation where word size indicates frequency, revealing prominent terms in text data.

19
Decision Boundary

The surface that separates different class regions in the feature space according to a classifier.

20
McNemar's Test

A non-parametric statistical test for paired nominal data used to compare two classifiers on the same samples.
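
Example (a minimal sketch of the chi-squared form with continuity correction; the predictions are invented): only the samples on which the two classifiers disagree in correctness enter the statistic.

```python
def mcnemar_statistic(y_true, pred_a, pred_b):
    """Chi-squared statistic with continuity correction from the discordant pairs."""
    # b: samples only classifier A gets right; c: samples only classifier B gets right.
    b = sum(a == t and m != t for t, a, m in zip(y_true, pred_a, pred_b))
    c = sum(a != t and m == t for t, a, m in zip(y_true, pred_a, pred_b))
    if b + c == 0:
        return 0.0, b, c
    return (abs(b - c) - 1) ** 2 / (b + c), b, c

# Invented predictions from two classifiers on the same ten samples.
y_true = [1] * 10
pred_a = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
pred_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
stat, b, c = mcnemar_statistic(y_true, pred_a, pred_b)
```

The statistic is compared with a chi-squared distribution with one degree of freedom (critical value 3.841 at the 0.05 level). When the number of discordant pairs is small, as in this toy example, an exact binomial version of the test is usually preferred.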

21
X API (formerly Twitter API)

An interface that allows programmatic access to X/Twitter data for retrieving, posting or analysing tweets.

22
Google Colab

A cloud-based Jupyter Notebook environment for running Python code, with a free tier that offers usage-limited access to CPU, GPU and TPU resources.

23
TruncatedSVD

A dimensionality-reduction technique that projects high-dimensional sparse data (e.g., TF-IDF matrices) into fewer dimensions via truncated singular value decomposition, without centring the data first.

24
One-vs-Rest Strategy

A multi-class classification approach that trains one binary classifier per class against all other classes.

25
Vectorizer

A tool (e.g., CountVectorizer, TfidfVectorizer) that converts text into numerical feature vectors for machine-learning models.

26
Lexicon-Based Analysis

Sentiment detection that relies on predefined dictionaries of words annotated with polarity scores.
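
Example (a toy sketch with a hypothetical four-word lexicon; this is not VADER, which also applies rules for negation, intensifiers, punctuation and capitalisation):

```python
# Hypothetical polarity scores; real lexicons contain thousands of rated entries.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}

def lexicon_label(tokens):
    """Sum the polarity of known words and map the total to a label."""
    score = sum(LEXICON.get(token, 0.0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = lexicon_label(["tariffs", "are", "terrible", "for", "trade"])
```

No training data is needed, which is why lexicon tools are often used to produce initial labels that a supervised model is then trained on.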

27
Class Imbalance

A situation where some classes have far fewer samples than others, often degrading model performance.

28
Hyperparameter Tuning

The process of optimising settings that are fixed before training rather than learned from the data (e.g., the C value in Logistic Regression) to improve model performance.