Flashcards covering key vocabulary and concepts related to Information Retrieval, including retrieval models, relevance, ranked retrieval, Jaccard coefficient, bag of words model, term frequency, document frequency, and tf-idf weighting.
Retrieval Models
A mathematical framework for defining the search process. It explains its assumptions, provides the basis of ranking algorithms, and can embody an implicit theory of relevance.
Relevance
A complex concept that has been studied for some time, with many factors to consider, and people often disagree when making relevance judgments.
Ranked retrieval models
A retrieval model in which the system returns an ordering over the (top) documents in the collection for a query.
Free text queries
A query that is just one or more words in a human language, rather than an expression built from operators.
Scoring
Assigning a score to each document to measure how well document and query match.
Jaccard coefficient
A commonly used measure of overlap of two sets A and B, calculated as |A ∩ B| / |A ∪ B|.
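The Jaccard definition above can be sketched in a few lines of Python. The function name `jaccard`, the sample strings, and the choice to return 0.0 for two empty sets are assumptions made for illustration only.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    if not union:
        # Assumption: the overlap of two empty sets is taken to be 0.0.
        return 0.0
    return len(a & b) / len(union)


# Treat the query and each document as a set of whitespace-separated terms.
query = set("ides of march".split())
doc1 = set("caesar died in march".split())
doc2 = set("the long march".split())

print(jaccard(query, doc1))  # 1 shared term out of 6 distinct terms
print(jaccard(query, doc2))  # 1 shared term out of 5 distinct terms
```

Because the inputs are sets, repeated terms count only once, so this measure ignores how often a term occurs.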
Bag of words model
Vector representation that doesn’t consider the ordering of words in a document.
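A minimal sketch of the bag of words idea, assuming lowercasing and whitespace tokenization (both are simplifications chosen here, not part of the definition). The helper name `bag_of_words` is hypothetical.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map each term to its count, discarding word order."""
    # Assumption: lowercase the text and split on whitespace.
    return Counter(text.lower().split())


a = bag_of_words("John is quicker than Mary")
b = bag_of_words("Mary is quicker than John")

# Same words in a different order give the same representation.
print(a == b)
print(a["john"])
```

The two sentences mean different things, yet they get identical representations, which is exactly the information this model gives up.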
Term frequency (tf)
The number of times that term t occurs in document d.
Log-frequency weighting
The weight of term t in document d is $w_{t,d} = 1 + \log_{10} \mathrm{tf}_{t,d}$ if $\mathrm{tf}_{t,d} > 0$, and 0 otherwise.
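The log-frequency weight can be sketched as a small function. The name `log_tf_weight` is an assumption for illustration; the input is a raw term count.

```python
import math


def log_tf_weight(tf: int) -> float:
    """Return 1 + log10(tf) for tf > 0, and 0 otherwise."""
    if tf <= 0:
        return 0.0
    return 1 + math.log10(tf)


# The weight grows much more slowly than the raw count.
for tf in (0, 1, 2, 10, 1000):
    print(tf, round(log_tf_weight(tf), 2))
```

A term that occurs 1000 times gets a weight of about 4 rather than 1000, so relevance does not rise in proportion to the raw count.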
Document frequency (df)
The number of documents that contain t.
Inverse document frequency (idf)
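A measure of the informativeness of t, commonly defined as $\mathrm{idf}_t = \log_{10}(N / \mathrm{df}_t)$, where N is the number of documents in the collection. Rare terms get a high idf; the document frequency itself is the inverse measure of informativeness.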
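A short sketch of idf, assuming the common base-10 definition log10(N / df), where N is the number of documents in the collection. The function name `idf` and the collection size of one million are illustrative assumptions.

```python
import math


def idf(df: int, n_docs: int) -> float:
    """Inverse document frequency, assuming idf = log10(N / df)."""
    if df <= 0:
        # Assumption: a term that appears in no document has no defined idf.
        raise ValueError("df must be at least 1")
    return math.log10(n_docs / df)


N = 1_000_000  # hypothetical collection size
for df in (1, 100, 10_000, 1_000_000):
    print(df, round(idf(df, N), 2))
```

A term found in every document gets an idf of 0, while a term found in a single document gets the highest value.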
Collection frequency
The number of occurrences of t in the collection, counting multiple occurrences.
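To see how document frequency and collection frequency differ, here is a minimal sketch over a toy collection. The three documents and both function names are made up for illustration, and tokenization is plain whitespace splitting.

```python
# Hypothetical toy collection, one string per document.
docs = ["to be or not to be", "to do is to be", "do be do"]


def document_frequency(term: str, docs: list[str]) -> int:
    """Number of documents that contain the term at least once."""
    return sum(1 for d in docs if term in d.split())


def collection_frequency(term: str, docs: list[str]) -> int:
    """Total number of occurrences of the term across all documents."""
    return sum(d.split().count(term) for d in docs)


for term in ("to", "do", "not"):
    print(term, document_frequency(term, docs), collection_frequency(term, docs))
```

Here "to" occurs four times but in only two documents, so its collection frequency exceeds its document frequency.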
tf-idf weighting
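The tf-idf weight of a term is the product of its tf weight and its idf weight: $w_{t,d} = (1 + \log_{10} \mathrm{tf}_{t,d}) \times \log_{10}(N / \mathrm{df}_t)$ when the log-frequency tf weight is used.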
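A hedged sketch of tf-idf, assuming the log-frequency tf weight and idf = log10(N / df). Other variants exist, and the function name `tf_idf` and the numbers below are illustrative assumptions.

```python
import math


def tf_idf(tf: int, df: int, n_docs: int) -> float:
    """tf-idf weight, assuming (1 + log10 tf) * log10(N / df)."""
    if tf <= 0 or df <= 0:
        # A term absent from the document (or the collection) gets weight 0.
        return 0.0
    return (1 + math.log10(tf)) * math.log10(n_docs / df)


N = 1_000_000  # hypothetical collection size
print(round(tf_idf(tf=10, df=1_000, n_docs=N), 2))      # frequent in doc, rare in collection
print(round(tf_idf(tf=10, df=1_000_000, n_docs=N), 2))  # appears in every document
```

The weight rises with the number of occurrences in the document and with the rarity of the term in the collection.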