CS121 Quiz 4

Last updated 3:39 AM on 5/24/26

21 Terms

1

MapReduce

distributed programming framework used for indexing and analyzing large data sets

2

mapper

transforms a list of items into another list of items of the same length

3

reducer

transforms a list of items into a single item
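The mapper and reducer definitions can be illustrated with the classic word-count job. This is a minimal single-process sketch, not a distributed MapReduce implementation. The `shuffle` grouping step and all function names are illustrative assumptions that stand in for what a real framework does between the map and reduce phases.

```python
from collections import defaultdict
from functools import reduce

def mapper(words):
    # Map: emit one (word, 1) pair per input word, so the output list
    # has the same length as the input list.
    return [(word.lower(), 1) for word in words]

def shuffle(pairs):
    # Hypothetical stand-in for the framework's shuffle step:
    # group the mapper's values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(values):
    # Reduce: collapse a list of values into a single item (their sum).
    return reduce(lambda acc, v: acc + v, values, 0)

def word_count(documents):
    pairs = []
    for doc in documents:
        pairs.extend(mapper(doc.split()))
    return {key: reducer(vals) for key, vals in shuffle(pairs).items()}

print(word_count(["the cat sat", "the cat ran"]))
# {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

In a real cluster, many mappers run in parallel over different input splits. The framework then routes each key to one reducer.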

4

distributed processing

uses a large number of inexpensive servers; driven by the need to index and analyze big data

5

director server

distributes the query to multiple indexing machines

6

index server

processes only part of the query, running it against its own portion of the index

7

director machine

organizes the results and returns them to the user
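The director/index-server split is a scatter-gather pattern, sketched below in one process with threads. This is only an illustration under stated assumptions. The shard contents, the names `SHARDS`, `index_server` and `director`, and scoring by raw term counts are all hypothetical simplifications, and each shard is assumed to hold a disjoint set of documents.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each index server holds an inverted index
# (term -> {doc_id: term count}) for its own subset of documents only.
SHARDS = [
    {"cat": {"d1": 3, "d2": 1}, "dog": {"d2": 2}},
    {"cat": {"d3": 2}, "dog": {"d4": 5}},
]

def index_server(shard, query_terms):
    # Each index server scores the query against its own shard only.
    scores = {}
    for term in query_terms:
        for doc_id, count in shard.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0) + count
    return scores

def director(query, k=3):
    terms = query.lower().split()
    # Distribute the query to every index server in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda shard: index_server(shard, terms), SHARDS))
    # Merge the partial results (shards hold disjoint documents)
    # and return the top k to the user.
    merged = {}
    for partial in partials:
        merged.update(partial)
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)[:k]

print(director("cat dog"))  # [('d4', 5), ('d1', 3), ('d2', 3)]
```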

8

Jaccard coefficient

$\text{Jaccard}(A,B) = \dfrac{|A \cap B|}{|A \cup B|}$
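The Jaccard formula maps directly onto Python set operations. A small sketch follows. Treating two empty sets as identical (coefficient 1.0) is an assumption, because the formula is undefined when the union is empty.

```python
def jaccard(a, b):
    """Jaccard(A, B) = |A ∩ B| / |A ∪ B|, treating both inputs as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)

doc1 = "ides of march".split()
doc2 = "caesar died in march".split()
print(jaccard(doc1, doc2))  # 1 shared word out of 6 distinct words, about 0.167
```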

9

bag of words model

each document is stored as a vector of word occurrence counts, ignoring the order of the words.

10

term frequency

number of times term t occurs in document d
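A bag of words and the term frequency it records can be sketched with `collections.Counter`. This assumes lowercasing and whitespace tokenization, which are simplifications. The example shows that two sentences with the same words in a different order produce the same bag.

```python
from collections import Counter

def bag_of_words(text):
    # Word order is discarded; only per-term occurrence counts remain.
    return Counter(text.lower().split())

def term_frequency(term, doc_bag):
    # tf_{t,d}: number of times term t occurs in document d (0 if absent).
    return doc_bag[term]

d1 = bag_of_words("John is quicker than Mary")
d2 = bag_of_words("Mary is quicker than John")
print(d1 == d2)                    # True: same bag, different word order
print(term_frequency("mary", d1))  # 1
```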

11

score for the doc-query pair

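the sum of the tf-idf weights of the terms t that appear in both q and d: $\text{Score}(q,d) = \sum_{t \in q \cap d} \text{tf-idf}_{t,d}$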

12

inverse document frequency (IDF)

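$\text{idf}_t = \log(N/\text{df}_t)$, where $N$ is the number of documents in the collection

has no effect on the ranking for one-term queries, because every document's score is scaled by the same idf value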
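The claim about one-term queries can be checked numerically. This sketch assumes a base-10 logarithm, since the card does not fix a base. The query, tf values, N and df are hypothetical. Because every document's tf is multiplied by the same positive idf, the order does not change.

```python
import math

def idf(N, df):
    # idf_t = log10(N / df_t): the rarer the term, the larger its weight.
    # Assumption: base-10 logarithm.
    return math.log10(N / df)

def rank(scores):
    # Document ids sorted by descending score.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical one-term query: raw tf per document, with the term
# appearing in df = 10 of N = 1000 documents.
tf = {"d1": 2, "d2": 5, "d3": 1}
N, df = 1000, 10

by_tf = rank(tf)
by_tf_idf = rank({doc: count * idf(N, df) for doc, count in tf.items()})
print(by_tf, by_tf_idf)  # same order: idf is identical for every document
```

A term that occurs in every document has $\text{idf} = \log 1 = 0$. It therefore contributes nothing to any score.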

13

document frequency(df)

number of documents that contain term t

14

collection frequency

total number of occurrences of t in the collection, counting repeated occurrences within the same document
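Document frequency and collection frequency can diverge, and a toy collection shows how. The three documents below are hypothetical. "insurance" and "try" have the same collection frequency but different document frequencies, which is why idf is built from df rather than cf.

```python
# Hypothetical three-document collection.
docs = {
    "d1": "insurance insurance insurance try",
    "d2": "try try insurance",
    "d3": "try",
}

def document_frequency(term, docs):
    # df_t: number of documents containing t at least once.
    return sum(1 for text in docs.values() if term in text.split())

def collection_frequency(term, docs):
    # cf_t: total occurrences of t across the collection, duplicates included.
    return sum(text.split().count(term) for text in docs.values())

for term in ("insurance", "try"):
    print(term, document_frequency(term, docs), collection_frequency(term, docs))
# insurance 2 4
# try 3 4
```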

15

TF-IDF

$\text{tf-idf}_{t,d} = (1 + \log \text{tf}_{t,d}) \times \log(N/\text{df}_t)$ when $\text{tf}_{t,d} > 0$, and 0 otherwise
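The tf-idf weight and the doc-query score that sums it over shared terms can be worked through in a few lines. This sketch assumes base-10 logarithms. The collection size, df values and document counts are hypothetical numbers chosen so the arithmetic is easy to follow.

```python
import math

def tf_idf(tf, df, N):
    # w_{t,d} = (1 + log10 tf) * log10(N / df), and 0 when the term is absent.
    if tf == 0:
        return 0.0
    return (1 + math.log10(tf)) * math.log10(N / df)

def score(query_terms, doc_tf, df, N):
    # Score(q, d): sum of tf-idf weights over terms appearing in both q and d.
    return sum(
        tf_idf(doc_tf[t], df[t], N)
        for t in set(query_terms)
        if doc_tf.get(t, 0) > 0
    )

# Hypothetical statistics.
N = 1000
df = {"car": 10, "insurance": 100}
doc_tf = {"car": 10, "insurance": 1, "best": 3}

# car: (1 + 1) * 2 = 4; insurance: (1 + 0) * 1 = 1; "auto" is not in d.
print(score(["car", "insurance", "auto"], doc_tf, df, N))
```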

16

documents as vectors: the terms

axes of the space

17

documents as vectors: the documents

points in the space

18

cosine(query, document)

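$\cos(\vec{q},\vec{d}) = \dfrac{\vec{q}\cdot\vec{d}}{|\vec{q}|\,|\vec{d}|} = \dfrac{\sum_{i=1}^{|V|} q_i d_i}{\sqrt{\sum_{i=1}^{|V|} q_i^2}\,\sqrt{\sum_{i=1}^{|V|} d_i^2}}$, where $q_i$ and $d_i$ are the tf-idf weights of term $i$ in the query and document, and $|V|$ is the vocabulary size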
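The cosine formula translates directly into code. In this sketch, returning 0.0 when either vector is all zeros is an assumption, since the formula divides by zero there. The example shows that scaling a document vector does not change its cosine with the query.

```python
import math

def cosine(q, d):
    # cos(q, d) = (q . d) / (|q| |d|)
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_d = math.sqrt(sum(di * di for di in d))
    if norm_q == 0 or norm_d == 0:
        # Assumption: similarity with a zero vector is defined as 0.
        return 0.0
    return dot / (norm_q * norm_d)

print(cosine([1, 1, 0], [2, 2, 0]))  # about 1.0: same direction, different length
print(cosine([1, 0], [0, 1]))        # 0.0: orthogonal, no shared terms
```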

19

normalize vector by length

$\|\vec{x}\|_2 = \sqrt{\sum_i x_i^2}$; dividing each component of $\vec{x}$ by this length gives a unit vector
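Length normalization takes only a few lines. Raising an error for the zero vector in this sketch is a design choice rather than part of the definition, because the zero vector has no direction to preserve.

```python
import math

def l2_norm(x):
    # ||x||_2 = sqrt(sum of squared components)
    return math.sqrt(sum(xi * xi for xi in x))

def normalize(x):
    length = l2_norm(x)
    if length == 0:
        raise ValueError("cannot normalize the zero vector")
    return [xi / length for xi in x]

print(normalize([3.0, 4.0]))  # [0.6, 0.8]
```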

20

vector space ranking steps

  1. represent the query and each document as a weighted tf-idf vector

  2. compute the cosine similarity score between the query vector and each document vector

  3. rank the documents with respect to the query by score

  4. return the top k documents
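The four ranking steps can be combined into one small end-to-end sketch. This has several assumptions. Both the query and the documents use $(1 + \log_{10} \text{tf}) \times \log_{10}(N/\text{df})$ weights, tokenization is lowercase whitespace splitting, and query terms missing from the collection are dropped. The documents are hypothetical. It scores every document by brute force instead of using an inverted index, which is fine for illustration but not how a real engine would scale.

```python
import math
from collections import Counter

def tfidf_vector(tokens, vocab, df, N):
    # Weight each vocabulary term: (1 + log10 tf) * log10(N / df), or 0 if absent.
    counts = Counter(tokens)
    return [
        (1 + math.log10(counts[t])) * math.log10(N / df[t]) if counts[t] > 0 else 0.0
        for t in vocab
    ]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, docs, k=2):
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    vocab = sorted({t for toks in tokenized.values() for t in toks})
    N = len(docs)
    df = {t: sum(1 for toks in tokenized.values() if t in toks) for t in vocab}

    # Step 1: weighted tf-idf vectors (query terms outside the vocabulary are dropped).
    q_vec = tfidf_vector([t for t in query.lower().split() if t in df], vocab, df, N)
    # Step 2: cosine similarity between the query and each document.
    scores = {
        name: cosine(q_vec, tfidf_vector(toks, vocab, df, N))
        for name, toks in tokenized.items()
    }
    # Steps 3 and 4: rank by score and return the top k.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]

# Hypothetical collection.
docs = {
    "d1": "car insurance auto insurance",
    "d2": "best car deals",
    "d3": "cheap flights",
}
print(rank("car insurance", docs))
```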

21

cosine for length-normalized vectors

$\cos(\vec{q},\vec{d}) = \vec{q}\cdot\vec{d} = \sum_{i=1}^{|V|} q_i d_i$
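Once both vectors have unit length, the denominator of the cosine is 1, so cosine reduces to a plain dot product. The sketch below checks this on two arbitrary example vectors.

```python
import math

def normalize(x):
    # Divide each component by the vector's Euclidean length.
    length = math.sqrt(sum(xi * xi for xi in x))
    return [xi / length for xi in x]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q = normalize([1.0, 2.0, 0.0])
d = normalize([2.0, 1.0, 2.0])

# Full formula on the raw vectors: 4 / (sqrt(5) * 3), about 0.596.
print(dot(q, d))
```

Precomputing unit-length document vectors at indexing time means each query only needs dot products at search time.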