Computer Science Quiz 1-4: Key Terms & Definitions

43 Terms

1
What is information retrieval?

It is the field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.

2
Which of the following are examples of work within the Information Retrieval field?

web search engines

filtering for documents of interest

the design of a relational database for a library

classifying books into categories

automatically answering customer support questions

web search engines, filtering for documents of interest, classifying books into categories, automatically answering customer support questions

3
web search

gathering and finding information on the web

4
vertical search

gathering and finding information focused on a specific topic

5
desktop search

gathering and finding information on a single computer

6
peer-to-peer search

gathering and finding information on a network of independent nodes

7
enterprise search

gathering and finding information within a company's network

8
Find out what is going on today at UCI?

relational database vs search engine

search engine

9
Find all female students whose last name is Smith

relational database vs search engine

relational database

10
Find how to split words in Python

relational database vs search engine

search engine

11
What is the weather like in Bali?

relational database vs search engine

search engine

12
What were the temperature and humidity values registered in Crystal Cove between 9/1/2019 and 10/1/2019?

relational database vs search engine

relational database

13
What contributes to the relevance of a document with respect to a query in the context of a search engine?

prior queries made by the same user, the author of the document, the geographic location of the person who's querying, the popularity of the document, textual similarity, the geographic origin of the document

14
True or False? The right side of the architecture pertains to processes that are done well before any query is done.

false

15
How is advertisement integrated with web search?

The user's query goes both to the search engine and the ad engine; the search engine retrieves the most relevant results, the ad engine uses an auction system on the query words.

16
Cost Per Mille (CPM)

Cost of showing the ad 1,000 times (1,000 impressions)

17
Cost Per Click (CPC)

Cost charged each time a user clicks on the ad after it is shown to them

18
The following is the syntax of a URL:

A://B/C?D#E

A: scheme

B: authority

C: path

D: query

E: fragment
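As a quick check of the five components above, the following sketch splits a URL with Python's standard `urllib.parse.urlsplit`. The URL itself is a made-up example, and note that the standard library calls the authority component `netloc`.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only so that all five components are present.
url = "https://www.example.com/docs/page.html?q=token&lang=en#section-2"
parts = urlsplit(url)

print("A scheme:   ", parts.scheme)
print("B authority:", parts.netloc)   # urllib names the authority "netloc"
print("C path:     ", parts.path)
print("D query:    ", parts.query)
print("E fragment: ", parts.fragment)
```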

19
Which of these are Uniform Resource Identifiers (URIs)?

ISBN 0-486-2777-3

rmi://filter.uci.edu

"Pride and Prejudice"

http://www.ics.uci.edu/~lopes

ISBN 0-486-2777-3

rmi://filter.uci.edu

http://www.ics.uci.edu/~lopes

20
Besides web crawling, what other ways are there to obtain data from Web sites?

targeted downloads of specific URLs

Downloads from the Library of Congress

Data dumps provided by companies and organizations

Web APIs provided by certain web sites

targeted downloads of specific URLs

Data dumps provided by companies and organizations

Web APIs provided by certain web sites

21
Consider the following robots.txt file:

User-agent: *
Disallow: /foo
Disallow: /bar

User-agent: Googlebot
Disallow: /baz/a

According to this, the Googlebot is

Allowed to crawl /foo and /bar

Not allowed to crawl either /foo or /bar

Not allowed to crawl /baz/a

Allowed to crawl /baz/a

Allowed to crawl /foo and /bar

Not allowed to crawl /baz/a
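The answer follows from how user-agent groups are matched: a crawler that finds a group naming it specifically uses only that group and ignores the `*` group. A minimal sketch with Python's standard `urllib.robotparser` illustrates this; the host `example.com` and the second agent name `OtherBot` are assumptions made for the demonstration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /foo
Disallow: /bar

User-agent: Googlebot
Disallow: /baz/a
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, path):
    # example.com is a placeholder host; only the path matters for the rules.
    return parser.can_fetch(agent, "https://example.com" + path)

for path in ("/foo", "/bar", "/baz/a"):
    print(f"Googlebot {path}: {allowed('Googlebot', path)}")
    print(f"OtherBot  {path}: {allowed('OtherBot', path)}")
```

Any crawler without its own group (here the hypothetical `OtherBot`) falls back to the `*` rules, so it gets the opposite answers for these three paths.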

22
True or False?

"If something is on the Web, a Web crawler has the right to get it"

False

23
Hw1 (tokenization) pertains mostly to which of the following?

index

text acquisition

text transformation

text transformation

24
How large is the web, measured in number of hosts?

O(quadrillion)

O(million)

O(billion)

O(trillion)

O(billion)

25
Consider the sequence of characters "hello!!"

Is this a token?

Maybe; it depends on how you define a token.
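To see why the answer depends on the definition, here is a small sketch with two possible tokenizers. Both definitions are illustrative assumptions, not the ones required by the homework.

```python
import re

def whitespace_tokens(text):
    # Definition 1: a token is any maximal run of non-whitespace characters.
    return text.split()

def alphanumeric_tokens(text):
    # Definition 2: a token is a maximal run of ASCII letters or digits, lowercased.
    return re.findall(r"[a-z0-9]+", text.lower())

text = "hello!!"
print(whitespace_tokens(text))     # "hello!!" survives as one token
print(alphanumeric_tokens(text))   # only "hello" is a token; "!!" is dropped
```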

26
Should crawlers wait between requests to the same web site?

yes
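One way a crawler might enforce such a wait is to remember, per host, the earliest time the next request is allowed. The sketch below is an assumption about how this could be organized; the class name, method names, and the 0.5 second delay are all hypothetical.

```python
from urllib.parse import urlsplit

POLITENESS_DELAY = 0.5  # seconds; an assumed value, not one stated in the cards

class PolitenessTracker:
    """Tracks, per host, when the next request may be sent."""

    def __init__(self, delay=POLITENESS_DELAY):
        self.delay = delay
        self.next_allowed = {}  # host -> earliest timestamp for the next request

    def seconds_to_wait(self, url, now):
        host = urlsplit(url).netloc
        return max(0.0, self.next_allowed.get(host, 0.0) - now)

    def record_request(self, url, now):
        host = urlsplit(url).netloc
        self.next_allowed[host] = now + self.delay
```

A crawl loop would call `seconds_to_wait(url, time.monotonic())`, sleep for that long, fetch the page, and then call `record_request`. Requests to different hosts do not delay each other.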

27
All the crawler traps that exist on the web are deliberately created

False

28
What is bad about crawler traps?

they are hard to detect, they make web crawlers busy for no good reason, they prevent or delay crawlers from going to other sites

29
What is the frontier of a web crawler?

It's the set of URLs that have been seen but not yet crawled
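A minimal sketch of a frontier, assuming a simple first-in, first-out crawl order: a queue holds URLs that are seen but not yet crawled, and a set prevents the same URL from being queued twice. The class and method names are hypothetical.

```python
from collections import deque

class Frontier:
    """URLs that have been seen but not yet crawled (FIFO order assumed)."""

    def __init__(self, seeds=()):
        self.seen = set()      # every URL ever added, crawled or not
        self.queue = deque()   # the frontier itself
        for url in seeds:
            self.add(url)

    def add(self, url):
        # Ignore URLs that were already seen, so each one is crawled at most once.
        if url not in self.seen:
            self.seen.add(url)
            self.queue.append(url)

    def next_url(self):
        # Hand out the oldest pending URL, or None when the frontier is empty.
        return self.queue.popleft() if self.queue else None

    def __len__(self):
        return len(self.queue)
```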

30
What are HTTP status codes 2xx?

Page retrieved successfully

31
What are HTTP status codes 3xx?

Redirection
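A crawler typically branches on the status class before deciding what to do with a response. The sketch below covers only the two classes on these cards and lumps everything else together; the function name is hypothetical.

```python
from http import HTTPStatus

def status_class(code):
    # Only the 2xx and 3xx classes from the cards are distinguished here.
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    return "other"

for status in (HTTPStatus.OK, HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.NOT_FOUND):
    print(int(status), status.phrase, "->", status_class(status))
```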

32
A normal crawler fetches pages directly from the web servers. However, your crawler used a cache server to fetch pages. Why?

Because having more than one hundred crawlers fetching pages directly could overload the ICS network if the crawlers are not properly developed

33
Which of the following methods can you DIRECTLY use to detect pages or documents that are near duplicates?

Simhash

Fingerprint

Cyclic redundancy check

document slope curve

Blake3

Simhash

Fingerprint
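For intuition, here is a rough sketch of Simhash over word frequencies. The 64-bit width, the use of MD5 as the per-token hash, and the tokenizer are all assumptions made for illustration; the point is that similar documents end up with fingerprints that differ in few bits, which exact checksums such as CRC do not provide.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # assumed fingerprint width

def _hash64(token):
    # Take the first 8 bytes of MD5 as a 64-bit token hash (an arbitrary choice).
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def simhash(text):
    weights = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    totals = [0] * BITS
    for token, weight in weights.items():
        h = _hash64(token)
        for i in range(BITS):
            # Add the weight where the token's bit is 1, subtract it where it is 0.
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint

def similarity(a, b):
    # Fraction of fingerprint bits on which the two texts agree.
    differing = bin(simhash(a) ^ simhash(b)).count("1")
    return 1 - differing / BITS
```

Two pages would then be treated as near duplicates when `similarity` exceeds a chosen threshold; the threshold is a tuning decision and is not specified here.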

34
The deep web is a large part of the web that only has encrypted content, and thus it is not crawled nor indexed by normal search engines. T/F

False

35
Web crawlers can and should send hundreds of requests per second to a web site, because otherwise they will take a very long time to crawl. T/F

False

36
What is the main problem of using a term-document matrix for searching in large collections of documents?

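It is an inefficient use of memory: the matrix is sparse, because each document contains only a small fraction of all the terms, so most entries are zero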

37
Should crawlers hit the same web site as fast as possible as a strategy to crawl faster?

No

38
What is an inverted index?

It's a map with terms as keys and postings lists as values

39
What is the minimum information in a posting?

the document id

40
Consider the following sentences (each sentence is to be considered a different document):

S1: I tried searching for this error but got me nowhere.

S2: To be or not to be, this is the question.

S3: This seems to do the trick.

What are the postings for the term "to"?

S2, S3
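The same result can be reproduced with a small inverted index built from the three sentences. This is a minimal sketch: each posting holds only the document id, and the lowercase alphanumeric tokenizer is an assumption.

```python
import re
from collections import defaultdict

documents = {
    "S1": "I tried searching for this error but got me nowhere.",
    "S2": "To be or not to be, this is the question.",
    "S3": "This seems to do the trick.",
}

def build_index(docs):
    index = defaultdict(list)  # term -> postings list of document ids
    for doc_id, text in docs.items():
        # set() ensures a document appears at most once per term.
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            index[term].append(doc_id)
    return index

index = build_index(documents)
print(index["to"])  # ['S2', 'S3']
```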

41
Reading 1 MB sequentially from memory is faster than reading 1 MB sequentially from disk.

True

42
Reading 1 MB sequentially from memory is 2 times faster than reading 1 MB sequentially from disk.

False

43
In Boolean retrieval, a query that ANDs three terms results in having to intersect three lists of postings. Assume the three lists are of size n, m, q, respectively, each being very large.

If you keep the lists unsorted, what best approximates the complexity of a 3-way intersection algorithm?

O(nmq)
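To make the bound concrete, the sketch below contrasts one naive reading of the unsorted case, which compares every triple of postings, with a linear merge that becomes possible once the lists are sorted. Both functions are illustrative assumptions; they take document ids to be unique within each list, and the function names are hypothetical.

```python
def intersect_naive(a, b, c):
    # Unsorted lists: compare every (x, y, z) triple, about n * m * q comparisons.
    return [x for x in a for y in b for z in c if x == y == z]

def intersect_sorted(a, b, c):
    # Sorted lists: advance three pointers in one pass, about n + m + q steps.
    i = j = k = 0
    result = []
    while i < len(a) and j < len(b) and k < len(c):
        x, y, z = a[i], b[j], c[k]
        if x == y == z:
            result.append(x)
            i, j, k = i + 1, j + 1, k + 1
        else:
            # Any pointer below the current maximum cannot be part of a match.
            largest = max(x, y, z)
            if x < largest:
                i += 1
            if y < largest:
                j += 1
            if z < largest:
                k += 1
    return result

a, b, c = [7, 2, 9, 4], [4, 9, 1, 7], [9, 7, 3]
print(intersect_naive(a, b, c))
print(intersect_sorted(sorted(a), sorted(b), sorted(c)))
```

This is why postings lists are usually kept sorted by document id: the intersection cost drops from a product of the list sizes to their sum.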