Web Mining

112 Terms

1

What is the primary purpose of a data warehouse in data mining?

To organize and store historical data for analysis

2

Which technique is commonly used for classification tasks?

Decision trees

3

Which of the following best describes directed data mining?

Focusing on a specific target field or outcome

4

_______ is the process of segmenting a heterogeneous population into more homogeneous subgroups.

clustering

5

A ________ in data mining represents a set of rules or algorithms that connect inputs to a specific target outcome.

model

6

Data mining is the exploration and analysis of large quantities of data to discover meaningful ________ and rules.

patterns

7

Affinity grouping is useful for predicting future behavior.

false

8

One goal of data mining is to increase customer retention by predicting customer churn.

true

9

Which of the following is NOT a task of data mining?

Data warehousing

10

Clustering requires predefined classes to segment data.

false

11

Web mining can only be performed on textual data available on web pages.

false

12

Web structure mining focuses on discovering patterns and knowledge from the hyperlinks between web pages.

true

13

Web usage mining primarily deals with extracting information from hyperlinks within a web page.

false

14

Which of the following is not a type of Web mining?

Web Image Mining

15

Web content mining primarily involves:

Extracting useful information from web page contents

16

What is a key challenge in Web mining?

Quality control over Web data

17

Hyperlinks across different websites convey:

18

Web ________ mining extracts useful information from the main content of web pages.

content

19

Web usage mining involves analyzing ________ to discover user activity patterns.

20

Hyperlinks pointing to a page indicate its ________ because many people trust or reference it.

21

Data cleaning is necessary to ensure data accuracy, handle missing or incomplete information, and maintain consistency.

true

22

Noise in data refers to extreme values that deviate significantly from other data points.

false

23

Which of the following is NOT a step in the KDD process?

Data Storage

24

_______________ converts continuous data into discrete intervals or categories, making patterns easier to interpret.

Discretization
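
Worked example (a minimal sketch, not part of the original card): equal-width binning is one simple way to discretize a continuous attribute. The function name `equal_width_bins` and the sample ages are hypothetical, chosen only for illustration.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in 0..n_bins-1 (equal-width intervals)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into one bin.
        return [0 for _ in values]
    # The maximum value would compute to index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 25, 31, 40, 47, 58, 60]  # hypothetical sample data
bins = equal_width_bins(ages, 3)
print(bins)  # [0, 0, 0, 0, 1, 2, 2, 2]
```

Equal-frequency binning, which puts roughly the same number of values in each bin, is a common alternative when the data are skewed.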

25

Which technique is used to detect redundant attributes during data integration?

Correlation Analysis
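
Worked example (a sketch under assumptions, not part of the original card): for numeric attributes, correlation analysis is often done with the Pearson correlation coefficient. A coefficient near +1 or -1 suggests one attribute may be redundant. The attribute names and values below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Two attributes that carry the same information in different units.
celsius = [0, 10, 20, 30, 40]
fahrenheit = [32, 50, 68, 86, 104]
r = pearson(celsius, fahrenheit)
print(round(r, 6))  # 1.0, so one of the two attributes is redundant
```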

26

The data mining process encompasses all steps of the Knowledge Discovery in Databases (KDD) process.

False

27

During data cleaning, missing values can be handled by:

All of the above.

28

The KDD process aims to transform raw data into _______________.

knowledge

29

What is the purpose of normalization during data transformation?

To scale attributes to a common range.
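
Worked example (a minimal sketch, not part of the original card): min-max normalization is one common way to scale an attribute to a chosen range. The function name `min_max_scale` and the income values are hypothetical.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread to rescale.
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


incomes = [12000, 30000, 48000, 98000]  # hypothetical sample data
scaled = min_max_scale(incomes)
print([round(s, 3) for s in scaled])
```

Z-score normalization (subtract the mean, divide by the standard deviation) is another option when the minimum and maximum are unknown or dominated by outliers.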

30

Principal Component Analysis (PCA) is a technique used for _______________ reduction.

Dimensionality

31

Consists of a collection of interrelated data, known as a database

Database data

32

A repository of information collected from multiple sources

Data warehouse data

33

An entity is represented by a

data object

34

Typically describe data objects

attributes

35

Data objects stored in a database are called

data tuples

36

Which of the following terms are often used interchangeably? (Select all that apply)

variable, attribute, feature, dimension

37

Ordinal attributes are symbols or names of things

false

38

If an attribute is not discrete, it is said to be ________.

continuous

39

Measures of central tendency refer to ____, ____, and ____

mean, median, mode

40

The mean obtained after chopping off values at the high and low extremes

trimmed mean
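
Worked example (a minimal sketch, not part of the original card): the trimmed mean drops a fixed share of the sorted values at each end before averaging. The function name `trimmed_mean` and the salary figures are hypothetical.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted values at each extreme."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical salaries (in thousands); 110 is an extreme high value.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
print(trimmed_mean(salaries, 0.0))            # 58.0, the ordinary mean
print(round(trimmed_mean(salaries, 0.1), 2))  # 55.6, after dropping 30 and 110
```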

41

What is the main difference between PrefixSpan and GSP?

PrefixSpan is faster because it skips the candidate generation step.

42

Sequential pattern mining considers the order of items in the dataset.

true

43

PrefixSpan generates candidates explicitly before mining the dataset.

false

44

What is the primary purpose of sequential pattern mining?

To discover meaningful patterns in ordered sequences

45

A rule with high confidence indicates that the ____________ is very likely to occur after the antecedent.

Consequent

46

The GSP algorithm generates all possible candidates at once, regardless of their support.

False

47

Which of the following is a limitation of the GSP algorithm?

It requires explicit candidate generation for each iteration.

48

In sequential rule generation, what metric is used to determine the strength of a rule?

Confidence

49

The ______________ algorithm eliminates candidate generation by recursively projecting the database.

PrefixSpan
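
Worked example (a simplified sketch, not part of the original card): the code below illustrates the prefix-projection idea for the restricted case where every element of a sequence is a single item. It is not a full PrefixSpan implementation, since itemset elements are not handled. The function name `prefixspan` and the toy database are hypothetical.

```python
def prefixspan(sequences, min_support):
    """Return {pattern: support} for sequences whose elements are single items."""
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project the database: keep only what follows the first occurrence of `item`.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            # Recurse on the smaller projected database; no candidate set is generated.
            mine(pattern, next_projected)

    mine((), [tuple(s) for s in sequences])
    return results


db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]  # hypothetical toy database
patterns = prefixspan(db, min_support=2)
for pattern, count in patterns.items():
    print(pattern, count)
```

Each recursive call only scans the suffixes that follow the current prefix, which is why no separate candidate-generation pass is needed.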

50

Sequential pattern mining considers not only the items but also their ____________ in the dataset.

Order or Sequence

51

Classification predicts discrete and nominal values by categorizing data into distinct classes.

True

52

A classifier that performs extremely well on the training set will necessarily perform well on unseen test data.

false

53

N-fold cross validation is a technique commonly used to evaluate classifier performance on small datasets.

True

54

What is the primary goal of data classification?

To organize and categorize data into distinct classes

55

Which measure is more appropriate for evaluating classifier performance on highly imbalanced datasets?

Precision and Recall

56

Which cross-validation method is most commonly used when the available data is small?

n-Fold Cross Validation
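
Worked example (a minimal sketch, not part of the original card): n-fold cross validation splits the data into n disjoint folds and uses each fold once as the test set while training on the rest. The function name `n_fold_splits` is hypothetical, and the sketch assumes the data have already been shuffled.

```python
def n_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(n_fold_splits(10, 3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Every example is used for testing exactly once, which is why the method suits small datasets where holding out a single large test set would waste data.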

57

The ______ attribute in a dataset serves as the target variable in classification tasks.

class

58

The F-score is defined as the harmonic mean of ______ and _______.

precision and recall
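
Worked example (a minimal sketch, not part of the original card): the harmonic mean of precision p and recall r is 2pr / (p + r). The function name `f_score` and the input values are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


p, r = 0.5, 1.0  # hypothetical precision and recall
print(round(f_score(p, r), 3))  # 0.667
print((p + r) / 2)              # 0.75, the arithmetic mean, for comparison
```

The harmonic mean sits closer to the smaller of the two values, so a classifier cannot get a high F-score by doing well on only one of precision or recall.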

59

Which of the following are common classification methods? (Select all that apply)

K-Nearest Neighbor, Support Vector Machines, Bayesian Classification, Decision Tree Induction

60

Which components are part of a confusion matrix? (Select all that apply)

False Negative (FN), False Positive (FP), True Positive (TP), True Negative (TN)
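
Worked example (a minimal sketch, not part of the original card): the four confusion-matrix counts can be tallied directly from actual and predicted labels. The function name `confusion_counts` and the labels are hypothetical; class 1 is treated as the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Tally TP, FP, FN and TN for a binary classification result."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical imbalanced data: only 2 of 10 examples are positive.
actual = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 7}

accuracy = (counts["TP"] + counts["TN"]) / len(actual)
precision = counts["TP"] / (counts["TP"] + counts["FP"])
recall = counts["TP"] / (counts["TP"] + counts["FN"])
print(accuracy, precision, recall)  # 0.8 0.5 0.5
```

Accuracy looks good here only because negatives dominate; precision and recall show that the positive class is handled poorly.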

61

Bayes’ theorem plays a critical role in probabilistic learning and _______.

classification

62

Bayes' classifier uses the prior probability of each class given information about an item.

false

63

Prior knowledge can be combined with observed data

True

64

P(A ∧ B) = P(A | B) P(B) is equal to P(A ∧ B) = P(A | B) P(A)

false
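
Worked example (a minimal sketch, not part of the original card): a single roll of a fair die shows that the product rule multiplies P(A | B) by P(B), not by P(A). The events chosen here are hypothetical.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair six-sided die
A = {2, 4, 6}                    # event A: the roll is even
B = {5, 6}                       # event B: the roll is greater than 4


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


p_a_and_b = prob(A & B)                     # only the outcome 6 is in both
p_a_given_b = Fraction(len(A & B), len(B))  # counted inside B: 1 of 2 outcomes

print(p_a_and_b)               # 1/6
print(p_a_given_b * prob(B))   # 1/6, matches P(A and B)
print(p_a_given_b * prob(A))   # 1/4, does not match
```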

65

Can be naturally studied from a probabilistic point of view.

Supervised learning

66

They are normally estimated based on observed frequencies in the training data.

Probabilities

67

In order to account for estimation from small samples, probability estimates are ___.

adjusted or smoothed
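
Worked example (a sketch under assumptions, not part of the original card): the card does not name a smoothing method, so this uses additive (Laplace) smoothing as one common choice. The function name `smoothed_probabilities`, the parameter `lam`, and the word lists are hypothetical.

```python
from collections import Counter


def smoothed_probabilities(observations, vocabulary, lam=1.0):
    """Estimate P(value) with additive smoothing: (count + lam) / (n + lam * |V|)."""
    counts = Counter(observations)
    total = len(observations) + lam * len(vocabulary)
    return {value: (counts[value] + lam) / total for value in vocabulary}


words = ["cheap", "cheap", "offer"]    # hypothetical small training sample
vocab = ["cheap", "offer", "meeting"]  # "meeting" never appears in the sample
probs = smoothed_probabilities(words, vocab)
print(probs["meeting"] > 0)  # True: the unseen word still gets some probability
```

Without smoothing, "meeting" would get probability zero, and any product of probabilities that includes it would also become zero.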

68

P(A | B) is the probability of ____, assuming that ____ is all and only the information known.

A, B

69

The task of _____ can be regarded as estimating the class posterior probabilities.

classification

70

Data mining and knowledge discovery are the same processes.

false

71

Web mining is limited to analyzing only web page content, excluding hyperlinks and user interactions.

false

72

Directed data mining involves discovering hidden patterns without a specific target.

false

73

Classification and prediction tasks in data mining are entirely distinct and have no similarities.

false

74

Data preprocessing is an optional step in the KDD process.

false

75

Decision trees are a type of classification method.

true

76

Web structure mining uses the hyperlink structure of the web to find patterns.

True

77

Data reduction aims to decrease the size of a dataset while retaining critical information.

true

78

A discrete attribute has a finite or countably infinite set of values.

true

79

An F-score is used to measure both precision and recall in classification tasks.

true

80

The main goal of data mining is to uncover meaningful ______, ______, and ______.

patterns, trends, rules

81

___ is a measure used in association rules to determine how often items appear together.

support
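
Worked example (a minimal sketch, not part of the original card): support is the fraction of transactions containing an itemset, and the confidence of a rule X → Y is support(X ∪ Y) / support(X). The function names and the market-basket data are hypothetical.

```python
transactions = [  # hypothetical market-basket data
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """How often the consequent appears in transactions that contain the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"diapers", "beer"}, transactions))                 # 0.6
print(round(confidence({"diapers"}, {"beer"}, transactions), 2))  # 0.75
```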

82

Data ___ involves removing noise, filling in missing values, and resolving errors.

cleaning

83

In a decision tree, a ___ node represents a test on an attribute.

internal

84

The F-score is the harmonic mean of ___ and ___.

recall, precision

85

In a confusion matrix, ___ refers to correctly classified positive examples.

True Positive (TP)

86

Sequential pattern mining considers the ___ of data points in its analysis.

order

87

A ___ is a visual representation that uses a box to display the five-number summary of data.

boxplot
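
Worked example (a sketch under assumptions, not part of the original card): the five-number summary is the minimum, first quartile, median, third quartile, and maximum. Quartile conventions differ between textbooks and tools; this sketch uses the "inclusive" method of Python's `statistics.quantiles`. The function name and data are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


data = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]  # hypothetical sample
summary = five_number_summary(data)
print(summary)
```

In a boxplot, the box spans Q1 to Q3, a line inside it marks the median, and the whiskers extend toward the minimum and maximum.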

88

The ___ theorem plays a critical role in Bayesian classification.

Bayes’

89

Web ___ mining focuses on analyzing user behavior through web logs and session data.

usage

90

What is the first step in the KDD process?

Understanding the Application Domain

91

Which method is NOT used for handling missing values in data cleaning?

Fill with random numbers

92

What is the main difference between classification and prediction in data mining?

Prediction deals with future outcomes

93

Which type of attribute has a natural zero point?

Ratio

94

Which classification method uses the concept of probability distributions?

Bayesian Classification

95

In web mining, what does the PageRank algorithm primarily assess?

Page importance
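
Worked example (a simplified sketch, not part of the original card): PageRank can be approximated by power iteration, where each page repeatedly passes a share of its score along its outgoing links. The damping factor 0.85 is a commonly used value, and the function name, iteration count, and four-page graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank scores for a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page without outlinks spreads its score evenly over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: A, B and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # C
```

C ends up with the highest score because it has the most incoming links, which matches the idea that links pointing to a page signal its importance.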

96

Which of the following is NOT a data preprocessing stage?

Data Visualization

97

What is the purpose of using cross-validation in classification?

To assess the accuracy of the classifier

98

Which is NOT a common data mining task?

Sorting

99

Which technique is used in dimensionality reduction?

Principal Component Analysis (PCA)

100

Which concept is used to handle noisy data?

Binning
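
Worked example (a minimal sketch, not part of the original card): smoothing by bin means sorts the values, splits them into equal-size bins, and replaces each value with the mean of its bin. The function name `smooth_by_bin_means` and the price values are hypothetical.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort values, group them into bins of `bin_size`, and replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical sample data
smoothed_prices = smooth_by_bin_means(prices, 3)
print(smoothed_prices)  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Smoothing by bin medians or bin boundaries follows the same pattern, with only the replacement value changed.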