Spam Filtering / Text Classification

23 Terms

language identification

the process of determining the language of a given text

supervised learning

a type of machine learning where a model is trained using labeled data

training/testing data

training data is used to fit a machine learning model; testing data is held out and used to evaluate the model's performance on examples it has not seen
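
A minimal sketch of separating labeled examples into a training set and a held-out test set. The function name `train_test_split`, the 20% test fraction, and the fixed seed are assumptions made for illustration, not part of the card.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples and hold out a fraction of them for testing."""
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test items become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(range(10))
print(len(train), len(test))
```

The model is fit only on `train`; `test` is touched once, at the end, to estimate how well the model generalizes.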

document classification

the task of assigning a document to one or more predefined categories

binary classification

a classification task with two possible outcomes

multi-class classification

a classification task with more than two possible categories

Bayes’ Rule

a mathematical formula used to update probabilities based on new evidence
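
As a worked illustration, the rule can be written P(A|B) = P(B|A) P(A) / P(B). The sketch below applies it to the probability that a message is spam given that it contains a particular word. The function name `posterior_spam` and all of the numbers are hypothetical, chosen only to show the arithmetic.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam):
    """Bayes' rule: P(spam | word) = P(word | spam) * P(spam) / P(word)."""
    p_ham = 1.0 - p_spam
    # Total probability of seeing the word in any message (spam or not).
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word

# Hypothetical numbers: the word appears in 40% of spam and 1% of
# non-spam messages, and 20% of all mail is spam.
print(posterior_spam(0.40, 0.01, 0.20))
```

With these made-up inputs the prior of 0.20 is updated to a posterior of about 0.91, since 0.08 / 0.088 is roughly 0.909.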

Naive Bayes

a probabilistic classifier based on Bayes’ Theorem with an assumption of independence among features
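
The independence assumption can be sketched as a small word-count classifier: each word contributes its own log probability, and the contributions are simply summed. This is a simplified illustration, assuming whitespace tokenization and add-one (Laplace) smoothing; the function names and the toy messages are invented for the example.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs):
    """Collect per-label document counts and word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = {}  # label -> Counter of word frequencies
    vocab = set()
    for text, label in labeled_docs:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # Log prior for the label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Independence assumption: add each word's log likelihood.
            # Add-one smoothing avoids a zero probability for unseen words.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration.
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train_naive_bayes(training)
print(predict(model, "free money"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.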

logistic regression

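a statistical model that estimates the probability of a class by applying the logistic (sigmoid) function to a weighted sum of the features; commonly used for binary classification problems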
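
A minimal sketch of how a logistic regression model turns features into a probability at prediction time. The three features, the weights, and the bias are hypothetical values picked by hand; in practice they would be learned from training data.

```python
import math

def sigmoid(z):
    """Map any real number to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical features: [count of "free", number of links, sender in contacts (0/1)]
weights = [1.2, 0.8, -2.5]
bias = -1.0

p = predict_proba([2, 3, 0], weights, bias)
label = "spam" if p >= 0.5 else "not spam"
print(p, label)
```

The 0.5 cutoff is a common default rather than a requirement; a spam filter might raise it to reduce the chance of blocking legitimate mail.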

false positives

incorrectly identifying a non-relevant instance as relevant

false negatives

failing to identify a relevant instance
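
Both error types can be counted by comparing predictions against the true labels. In this sketch "spam" is assumed to be the relevant (positive) class, and the label lists are made up for illustration.

```python
def count_errors(y_true, y_pred, positive="spam"):
    """Return (false_positives, false_negatives) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    # False positive: predicted positive, but the true label is something else.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negative: truly positive, but predicted as something else.
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    return fp, fn

y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham"]
fp, fn = count_errors(y_true, y_pred)
print(fp, fn)  # 1 1
```

In spam filtering a false positive (a legitimate message sent to the spam folder) is usually the more costly of the two errors.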

character n-gram

a sequence of N consecutive characters used in text analysis
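
Extracting character n-grams amounts to sliding a window of width N across the text, as in this short sketch (the function name `char_ngrams` is chosen for illustration).

```python
def char_ngrams(text, n):
    """Return every run of n consecutive characters in text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("spam", 2))  # ['sp', 'pa', 'am']
```

Counts of these n-grams are a common feature for language identification, because frequent character sequences differ from one language to another.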

spam

unwanted or unsolicited messages, typically emails

spam filter

a system used to detect and block spam messages

blacklist

a list of entities that are blocked from accessing a system or service

whitelist

a list of approved entities that are allowed access to a system or service
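
A blacklist and a whitelist can be combined in a simple sender check, sketched below. The addresses are invented, and letting the whitelist take precedence over the blacklist is one possible design choice, not a rule stated by the cards.

```python
def check_sender(sender, whitelist, blacklist):
    """Return 'allow', 'block', or 'unknown' for a sender address."""
    sender = sender.lower()
    # Assumption: an approved sender is allowed even if also blacklisted.
    if sender in whitelist:
        return "allow"
    if sender in blacklist:
        return "block"
    # Senders on neither list need another method, such as content analysis.
    return "unknown"

whitelist = {"boss@example.com"}
blacklist = {"offers@spam.example"}

print(check_sender("Boss@example.com", whitelist, blacklist))
print(check_sender("offers@spam.example", whitelist, blacklist))
print(check_sender("stranger@example.org", whitelist, blacklist))
```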

rule-based filtering

a spam detection approach using manually crafted rules
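
A rule-based filter might look like the sketch below: each rule is a hand-written pattern, and a message is flagged when enough rules match. The three rules and the threshold of two matches are hypothetical examples, not rules taken from any real filter.

```python
import re

# Hand-written rules: (compiled pattern, human-readable description).
RULES = [
    (re.compile(r"\bfree money\b", re.IGNORECASE), "mentions free money"),
    (re.compile(r"!{3,}"), "three or more exclamation marks in a row"),
    (re.compile(r"\bclick here\b", re.IGNORECASE), "click-here call to action"),
]

def matched_rules(message):
    """Return the descriptions of every rule that the message triggers."""
    return [description for pattern, description in RULES if pattern.search(message)]

def is_spam(message, threshold=2):
    """Flag the message when at least `threshold` rules match."""
    return len(matched_rules(message)) >= threshold

print(is_spam("FREE MONEY!!! Click here"))
print(is_spam("See you at lunch"))
```

Rules like these are easy to read and audit, but they must be updated by hand whenever spammers change their wording.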

spam probability

the likelihood that a given message is spam

statistical filtering

a spam detection method based on statistical analysis of message content

hand-crafted features

features manually designed by experts for machine learning models

kitchen sink features

an approach that includes many features without filtering for relevance

sparse features

features that have many zero or missing values

dense features

features that have mostly nonzero values and provide rich information
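
To contrast the two kinds of features, the sketch below builds both for the same message: word counts over a fixed vocabulary (mostly zeros, so sparse) and a few summary statistics (all nonzero for any non-empty message with a capital letter, so dense). The vocabulary, the message, and the choice of summary statistics are assumptions made for illustration.

```python
def sparse_word_counts(text, vocabulary):
    """One count per vocabulary term; most entries are zero for a short message."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

def dense_summary_features(text):
    """A few summary statistics of the message text."""
    n_chars = len(text)
    n_words = len(text.split())
    n_upper = sum(1 for c in text if c.isupper())
    # Character count, word count, and fraction of uppercase characters.
    return [n_chars, n_words, n_upper / max(n_chars, 1)]

vocabulary = ["claim", "free", "lunch", "meeting", "money", "now", "prize", "win"]
message = "Free money NOW"

sparse = sparse_word_counts(message, vocabulary)
dense = dense_summary_features(message)
print(sparse)  # [0, 1, 0, 0, 1, 1, 0, 0]
print(dense)
```

With a realistic vocabulary of tens of thousands of words, the sparse vector is almost entirely zeros, which is why such features are usually stored as index-value pairs rather than full lists.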