Vocabulary flashcards related to optimizing query evaluation techniques.
Distributed Query Evaluation
A process that speeds up query processing by sending queries to a director machine, which then distributes them to multiple index servers for processing.
Document Distribution
A distributed query evaluation approach where each index server acts as a search engine for a small fraction of the total collection of documents.
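A minimal sketch of document distribution, assuming each index server can be modeled as a dict from document ID to an already-computed score for one query. The names `top_k` and `director_merge` are hypothetical, chosen for illustration only. Each server returns its own top k, and the director merges those short lists.

```python
import heapq

def top_k(scores, k):
    """Return the k best (score, doc_id) pairs held by one index server."""
    return heapq.nlargest(k, ((s, d) for d, s in scores.items()))

def director_merge(shard_scores, k):
    """Collect each server's local top k and keep the global top k."""
    candidates = []
    for scores in shard_scores:
        candidates.extend(top_k(scores, k))
    return heapq.nlargest(k, candidates)

# Two index servers, each responsible for its own fraction of the collection.
shards = [{1: 0.2, 2: 0.9}, {3: 0.5, 4: 0.7}]
merged = director_merge(shards, 2)
```

Because every server searches only its own documents, the director never needs more than k results from any one of them.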
Term Distribution
A distributed query evaluation approach where a single index is built for the entire cluster of machines, and each inverted list within that index is assigned to one index server.
Query Caching
A technique to improve efficiency by caching popular query results and common inverted lists; caching inverted lists can help even with unique queries.
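One way to picture result caching is a small least-recently-used cache in front of the query evaluator. The class name `QueryCache`, the LRU eviction policy, and the `evaluate` callback are assumptions made for this sketch; the card does not prescribe a particular policy.

```python
from collections import OrderedDict

class QueryCache:
    """Hypothetical LRU cache mapping a query string to its result list."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, query, evaluate):
        if query in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(query)
            self.hits += 1
            return self.entries[query]
        # Cache miss: run the (expensive) evaluation and remember the result.
        self.misses += 1
        result = evaluate(query)
        self.entries[query] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return result

cache = QueryCache(capacity=2)
fake_engine = lambda q: [q.upper()]  # stand-in for real query evaluation
cache.get("tropical fish", fake_engine)
cache.get("tropical fish", fake_engine)
cache.get("aquarium", fake_engine)
```

The same structure could hold inverted lists keyed by term instead of results keyed by query.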
Hapax Legomena
Words that occur only once in a corpus.
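As a small illustration, hapax legomena can be found by counting tokens and keeping those with a count of one. The function name and the whitespace tokenization are assumptions for this sketch.

```python
from collections import Counter

def hapax_legomena(tokens):
    """Return the words that occur exactly once, in alphabetical order."""
    counts = Counter(tokens)
    return sorted(word for word, count in counts.items() if count == 1)

tokens = "the cat saw the dog and the bird".split()
hapaxes = hapax_legomena(tokens)
```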
Document-at-a-time
A query processing approach that calculates complete scores for documents by processing all term lists, one document at a time.
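A minimal sketch of document-at-a-time processing, assuming a hypothetical in-memory index that maps each term to a list of (document ID, score contribution) pairs sorted by document ID, and assuming a document's score is simply the sum of its contributions. All lists advance together, so each document is fully scored before the next one is touched.

```python
import heapq

def document_at_a_time(query_terms, index, k):
    lists = [index.get(t, []) for t in query_terms]
    positions = [0] * len(lists)
    top = []  # min-heap of (score, doc_id), holding at most k entries
    while True:
        # The next document is the smallest ID under any list pointer.
        heads = [lst[p][0] for lst, p in zip(lists, positions) if p < len(lst)]
        if not heads:
            break
        doc = min(heads)
        score = 0.0
        for i, lst in enumerate(lists):
            p = positions[i]
            if p < len(lst) and lst[p][0] == doc:
                score += lst[p][1]      # add this term's contribution
                positions[i] = p + 1    # advance past the scored document
        heapq.heappush(top, (score, doc))
        if len(top) > k:
            heapq.heappop(top)          # drop the current lowest score
    return sorted(top, reverse=True)

index = {
    "tropical": [(1, 1.0), (2, 2.0), (4, 1.0)],
    "fish": [(2, 1.0), (3, 3.0)],
}
daat_results = document_at_a_time(["tropical", "fish"], index, 2)
```

Only the k-entry heap is kept in memory, which is the usual argument for this strategy.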
Term-at-a-time
A query processing approach that accumulates scores for documents by processing term lists one at a time.
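A minimal sketch of term-at-a-time processing under the same kind of assumptions: a hypothetical index mapping each term to (document ID, score contribution) pairs, with the final score being the sum of contributions. Each list is read completely before the next one starts, and partial scores live in an accumulator table.

```python
from collections import defaultdict

def term_at_a_time(query_terms, index, k):
    accumulators = defaultdict(float)  # doc_id -> partial score so far
    for term in query_terms:
        for doc, weight in index.get(term, []):
            accumulators[doc] += weight
    # Rank by descending score, breaking ties by ascending document ID.
    ranked = sorted(accumulators.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]

index = {
    "tropical": [(1, 1.0), (2, 2.0), (4, 1.0)],
    "fish": [(2, 1.0), (3, 3.0)],
}
taat_results = term_at_a_time(["tropical", "fish"], index, 2)
```

The accumulator table can grow to one entry per matching document, which is the main memory cost of this approach.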
getCurrentDocument()
A pseudocode function that returns the document number of the current posting of the inverted list.
skipForwardToDocument(d)
A pseudocode function that moves forward in the inverted list until getCurrentDocument() >= d.
movePastDocument(d)
A pseudocode function that moves forward in the inverted list until getCurrentDocument() > d.
moveToNextDocument()
A pseudocode function that moves to the next document in the list; equivalent to movePastDocument(getCurrentDocument()).
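The four list-pointer functions can be sketched as methods on a small class. This is an illustrative model only: the class name `InvertedListPointer`, the plain sorted list of document numbers, and returning `None` at the end of the list are assumptions. Skipping forward stops at the first posting whose document number is at least d, and moving past stops at the first posting whose document number exceeds d.

```python
class InvertedListPointer:
    """Hypothetical cursor over a sorted list of document numbers."""

    def __init__(self, doc_ids):
        self.doc_ids = doc_ids
        self.pos = 0

    def is_finished(self):
        return self.pos >= len(self.doc_ids)

    def get_current_document(self):
        # None signals that the pointer has run off the end of the list.
        return None if self.is_finished() else self.doc_ids[self.pos]

    def skip_forward_to_document(self, d):
        # Stop at the first posting with document number >= d.
        while not self.is_finished() and self.doc_ids[self.pos] < d:
            self.pos += 1

    def move_past_document(self, d):
        # Stop at the first posting with document number > d.
        while not self.is_finished() and self.doc_ids[self.pos] <= d:
            self.pos += 1

    def move_to_next_document(self):
        if not self.is_finished():
            self.move_past_document(self.doc_ids[self.pos])

ptr = InvertedListPointer([2, 5, 9, 14])
ptr.skip_forward_to_document(6)
after_skip = ptr.get_current_document()
ptr.move_past_document(9)
after_move_past = ptr.get_current_document()
ptr.move_to_next_document()
after_next = ptr.get_current_document()
```

A real index would use skip pointers rather than a linear scan, but the stopping conditions are the same.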
getNextAccumulator(d)
A pseudocode function that returns the first document number d' >= d that already has an accumulator.
removeAccumulatorsBetween(a, b)
A pseudocode function that removes all accumulators for document numbers between a and b.
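The two accumulator functions can be modeled with a sorted list of document numbers plus a score dict. The class name `AccumulatorTable` is hypothetical, and treating "between a and b" as exclusive of both endpoints (a < d < b) is an assumption of this sketch.

```python
import bisect

class AccumulatorTable:
    """Hypothetical accumulator table keyed by document number."""

    def __init__(self):
        self.docs = []    # sorted document numbers that have an accumulator
        self.scores = {}  # document number -> partial score

    def add(self, d, score):
        if d not in self.scores:
            bisect.insort(self.docs, d)
            self.scores[d] = 0.0
        self.scores[d] += score

    def get_next_accumulator(self, d):
        # First document number d' >= d that already has an accumulator.
        i = bisect.bisect_left(self.docs, d)
        return self.docs[i] if i < len(self.docs) else None

    def remove_accumulators_between(self, a, b):
        # Assumption: exclusive bounds, so accumulators for a and b survive.
        lo = bisect.bisect_right(self.docs, a)
        hi = bisect.bisect_left(self.docs, b)
        for d in self.docs[lo:hi]:
            del self.scores[d]
        del self.docs[lo:hi]

table = AccumulatorTable()
for doc in (3, 7, 12):
    table.add(doc, 1.0)
next_from_5 = table.get_next_accumulator(5)
table.remove_accumulators_between(3, 12)
```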
Conjunctive Processing
A type of query optimization in which every returned document must contain all query terms; it works best when one of the query terms is rare.
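A minimal sketch of why a rare term helps, assuming a hypothetical index that maps each term to a sorted list of document numbers. The shortest list drives the loop, and binary search stands in for skipping through the longer lists, so the work is roughly proportional to the rarest term's list length.

```python
from bisect import bisect_left

def conjunctive_match(query_terms, index):
    # Shortest (rarest) list first: it supplies the only candidates.
    lists = sorted((index.get(t, []) for t in query_terms), key=len)
    if not lists or not lists[0]:
        return []
    positions = [0] * len(lists)
    matches = []
    for doc in lists[0]:
        in_all = True
        for i in range(1, len(lists)):
            # Skip forward to the first posting >= doc in list i.
            positions[i] = bisect_left(lists[i], doc, positions[i])
            if positions[i] == len(lists[i]) or lists[i][positions[i]] != doc:
                in_all = False
                break
        if in_all:
            matches.append(doc)
    return matches

index = {"rare": [3, 8], "common": [1, 2, 3, 4, 5, 8, 9]}
both = conjunctive_match(["common", "rare"], index)
```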
Threshold Methods
Query processing optimization techniques that use the number of top-ranked documents needed (k) to estimate a threshold score (τʹ); documents that cannot score above τʹ are ignored.
MaxScore Method
Compares the maximum score a remaining document could have to the estimated threshold and ignores parts of inverted lists that will not generate document scores above the threshold.
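The idea can be sketched in a simplified form. Assumptions in this sketch: score contributions are non-negative and summed, the threshold estimate is the k-th best score seen so far, each list's maximum contribution is known, and a dict lookup stands in for skipping inside a list. The function name `maxscore_top_k` and the index layout are hypothetical. Once the combined maximum of the weakest lists cannot beat the threshold, those lists stop proposing candidate documents.

```python
import heapq

def maxscore_top_k(query_terms, index, k):
    # Order lists by their maximum possible contribution, weakest first.
    lists = sorted((index[t] for t in query_terms if index.get(t)),
                   key=lambda postings: max(s for _, s in postings))
    bounds = [max(s for _, s in postings) for postings in lists]
    lookup = [dict(postings) for postings in lists]
    pos = [0] * len(lists)
    top = []             # min-heap of (score, doc_id), at most k entries
    first_essential = 0  # lists before this index no longer propose candidates
    scored = 0           # number of documents actually scored
    while True:
        threshold = top[0][0] if len(top) == k else float("-inf")
        # A document found only in lists[0:first_essential] cannot beat the threshold.
        while (first_essential < len(lists)
               and sum(bounds[:first_essential + 1]) <= threshold):
            first_essential += 1
        heads = [lists[i][pos[i]][0]
                 for i in range(first_essential, len(lists))
                 if pos[i] < len(lists[i])]
        if not heads:
            break
        doc = min(heads)
        score = sum(table.get(doc, 0.0) for table in lookup)
        scored += 1
        for i in range(first_essential, len(lists)):
            if pos[i] < len(lists[i]) and lists[i][pos[i]][0] == doc:
                pos[i] += 1
        if len(top) < k:
            heapq.heappush(top, (score, doc))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, doc))
    return sorted(top, reverse=True), scored

index = {
    "tree": [(d, 1.0) for d in range(1, 7)],
    "eucalyptus": [(2, 5.0), (5, 4.0)],
}
results, scored = maxscore_top_k(["tree", "eucalyptus"], index, 2)
```

In this toy collection six documents contain "tree", yet only three documents are scored, because documents containing only "tree" cannot exceed the threshold once two documents have been ranked.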
Early Termination
An approach to query processing that improves performance but may sacrifice result quality.
List ordering
An approach that orders inverted lists by a quality metric (e.g., PageRank) or by partial score so that good documents are found early in the lists.