language identification
the process of determining the language of a given text
supervised learning
a type of machine learning where a model is trained using labeled data
training/testing data
data used to train a machine learning model and evaluate its performance
document classification
the task of assigning a document to one or more predefined categories
binary classification
a classification task with two possible outcomes
multi-class classification
a classification task with more than two possible categories
Bayes' Rule
a mathematical formula used to update probabilities based on new evidence
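As an illustration of this card, the rule can be written for a hypothesis H (for example, "this message is spam") and evidence E (for example, the words in the message). The symbols H and E are labels chosen here and do not come from the card set.

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
```

Here P(H) is the probability before seeing the evidence and P(H | E) is the updated probability after seeing it.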
Naive Bayes
a probabilistic classifier based on Bayes’ Theorem with an assumption of independence among features
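A minimal sketch of a Naive Bayes text classifier may help make the independence assumption concrete: each token contributes its own log-probability, independently of the others. The function names `train` and `predict`, the toy messages, the "spam"/"ham" labels, and the use of add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(docs):
    """Count documents per label and tokens per label.

    docs is a list of (tokens, label) pairs.
    """
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict(tokens, model):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # Log prior: fraction of training documents with this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Independence assumption: per-token likelihoods are multiplied
            # (added in log space). Add-one smoothing is an assumption here.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data.
docs = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting at noon".split(), "ham"),
    ("lunch at noon today".split(), "ham"),
]
model = train(docs)
print(predict("free money".split(), model))    # spam
print(predict("noon meeting".split(), model))  # ham
```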
logistic regression
a statistical model that estimates the probability of a class, most often used for binary classification problems
false positives
incorrectly identifying a non-relevant instance as relevant
false negatives
failing to identify a relevant instance
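The two error types on the cards above can be counted by comparing true labels with predictions. This is a small sketch; the function name `count_errors` and the example labels are hypothetical, with True standing for "relevant".

```python
def count_errors(y_true, y_pred):
    """Return (false_positives, false_negatives) for boolean labels."""
    # False positive: predicted relevant, but actually not relevant.
    fp = sum(1 for t, p in zip(y_true, y_pred) if p and not t)
    # False negative: actually relevant, but predicted not relevant.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return fp, fn

y_true = [True, False, True, False, False]
y_pred = [True, True, False, False, True]
print(count_errors(y_true, y_pred))  # (2, 1)
```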
character n-gram
a sequence of N consecutive characters used in text analysis
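As a short sketch of the definition above, character n-grams can be extracted by sliding a window of width n across the text. The function name `char_ngrams` is an illustrative choice.

```python
def char_ngrams(text, n):
    """Return every run of n consecutive characters in text, in order."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A text shorter than n yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("spam", 2))  # ['sp', 'pa', 'am']
```

Counts of such n-grams are a common choice of feature for language identification, since different languages favor different character sequences.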
spam
unwanted or unsolicited messages, typically emails
spam filter
a system used to detect and block spam messages
blacklist
a list of entities that are blocked from accessing a system or service
whitelist
a list of approved entities that are allowed access to a system or service
rule-based filtering
a spam detection approach using manually crafted rules
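A rough sketch of how a blacklist, a whitelist, and a hand-written rule might be combined is shown below. The function name `filter_message`, the example addresses, the "free money" rule, and the choice to let the whitelist take precedence are all assumptions for illustration, not a description of any particular filter.

```python
def filter_message(sender, subject, blacklist, whitelist):
    """Return "allow" or "block" for one message."""
    # Assumption: approved senders are always allowed.
    if sender in whitelist:
        return "allow"
    # Blocked senders are rejected outright.
    if sender in blacklist:
        return "block"
    # Hypothetical manually crafted rule on the subject line.
    if "free money" in subject.lower():
        return "block"
    return "allow"

blacklist = {"spammer@example.com"}
whitelist = {"friend@example.com"}
print(filter_message("friend@example.com", "FREE MONEY inside", blacklist, whitelist))  # allow
print(filter_message("spammer@example.com", "Hello", blacklist, whitelist))             # block
print(filter_message("stranger@example.com", "Free money now", blacklist, whitelist))   # block
```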
spam probability
the likelihood that a given message is spam
statistical filtering
a spam detection method based on statistical analysis of message content
hand-crafted features
features manually designed by experts for machine learning models
kitchen sink features
an approach that includes many features without filtering for relevance
sparse features
features that have many zero or missing values
dense features
features that have mostly nonzero values and provide rich information
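The contrast between the last two cards can be sketched with one message represented two ways: word counts over a vocabulary (mostly zeros, so sparse) and a few summary statistics (all nonzero, so dense). The vocabulary, the message, and the helper `zero_fraction` are hypothetical choices for illustration.

```python
def zero_fraction(vector):
    """Return the fraction of entries in vector that are zero."""
    return sum(1 for v in vector if v == 0) / len(vector)

vocab = ["free", "money", "now", "meeting", "noon", "lunch", "offer", "today"]
message = "FREE money now"
tokens = message.lower().split()

# Sparse: one count per vocabulary word; most words do not occur.
bag_of_words = [tokens.count(w) for w in vocab]

# Dense: character length, token count, and share of uppercase characters.
summary = [
    len(message),
    len(tokens),
    sum(c.isupper() for c in message) / len(message),
]

print(bag_of_words)                 # [1, 1, 1, 0, 0, 0, 0, 0]
print(zero_fraction(bag_of_words))  # 0.625
print(zero_fraction(summary))       # 0.0
```

With a realistic vocabulary of many thousands of words, the share of zeros in the word-count vector would be far higher than in this toy case.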