Web Mining

Last updated 1:31 AM on 4/18/25

112 Terms

New cards

What is the primary purpose of a data warehouse in data mining?

To organize and store historical data for analysis

New cards

Which technique is commonly used for classification tasks?

Decision trees

New cards

Which of the following best describes directed data mining?

Focusing on a specific target field or outcome

New cards

_______ is the process of segmenting a heterogeneous population into more homogeneous subgroups.

clustering

New cards

A ________ in data mining represents a set of rules or algorithms that connect inputs to a specific target outcome.

model

New cards

Data mining is the exploration and analysis of large quantities of data to discover meaningful ________ and rules.

patterns

New cards

Affinity grouping is useful for predicting future behavior.

false

New cards

One goal of data mining is to increase customer retention by predicting customer churn.

true

New cards

Which of the following is NOT a task of data mining?

Data warehousing

New cards

Clustering requires predefined classes to segment data.

false

New cards

Web mining can only be performed on textual data available on web pages.

false

New cards

Web structure mining focuses on discovering patterns and knowledge from the hyperlinks between web pages.

true

New cards

Web usage mining primarily deals with extracting information from hyperlinks within a web page.

false

New cards

Which of the following is not a type of Web mining?

Web Image Mining

New cards

Web content mining primarily involves:

Extracting useful information from web page contents

New cards

What is a key challenge in Web mining?

Quality control over Web data

New cards

Hyperlinks across different websites convey:

New cards

Web ________ mining extracts useful information from the main content of web pages.

content

New cards

Web usage mining involves analyzing ________ to discover user activity patterns.

web logs

New cards

Hyperlinks pointing to a page indicate its ________ because many people trust or reference it.

New cards

Data cleaning is necessary to ensure data accuracy, handle missing or incomplete information, and maintain consistency.

true

New cards

Noise in data refers to extreme values that deviate significantly from other data points.

false

New cards

Which of the following is NOT a step in the KDD process?

Data Storage

New cards

_______________ converts continuous data into discrete intervals or categories, making patterns easier to interpret.

Discretization
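
As an illustration of discretization, the sketch below uses equal-width binning, which is only one of several possible schemes and is not named by the card. The function name `equal_width_bins` and the sample ages are assumptions made for the example.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls in one bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 25, 31, 40, 47, 52, 60]  # hypothetical sample data
age_bins = equal_width_bins(ages, 3)
print(age_bins)  # [0, 0, 0, 0, 1, 2, 2, 2]
```

Each index can then be given a label such as "young", "middle", or "senior", which is what makes the resulting categories easier to interpret than the raw numbers.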

New cards

Which technique is used to detect redundant attributes during data integration?

Correlation Analysis
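
A minimal sketch of how correlation analysis can flag a redundant numeric attribute, assuming the Pearson correlation coefficient is the measure in use (the card does not say which one). The attribute names and values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical attributes from two sources: the same height in different units.
height_cm = [150.0, 160.0, 170.0, 180.0]
height_in = [h / 2.54 for h in height_cm]

r = pearson(height_cm, height_in)
print(round(r, 6))  # 1.0, so one of the two attributes is redundant
```

A coefficient close to +1 or -1 suggests that one attribute can be derived from the other and may be dropped during integration.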

New cards

The data mining process encompasses all steps of the Knowledge Discovery in Databases (KDD) process.

False

New cards

During data cleaning, missing values can be handled by:

All of the above.

New cards

The KDD process aims to transform raw data into _______________.

knowledge

New cards

What is the purpose of normalization during data transformation?

To scale attributes to a common range.
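
One common way to scale an attribute to a common range is min-max normalization. The sketch below assumes that method; the function name `min_max_scale` and the income figures are illustrative only.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to new_min.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


incomes = [20000, 35000, 50000, 80000]  # hypothetical sample data
scaled = min_max_scale(incomes)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

After scaling, an attribute measured in tens of thousands no longer dominates one measured in single digits when distances are computed.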

New cards

Principal Component Analysis (PCA) is a technique used for _______________ reduction.

Dimensionality

New cards

Consists of a collection of interrelated data, known as a database

Database data

New cards

A repository of information collected from multiple sources

Data warehouse data

New cards

An entity is represented by a ________.

data object

New cards

Typically describe data objects

attributes

New cards

Data objects stored in a database are called ________.

data tuples

New cards

Which of the following terms are often used interchangeably? (Select all that apply)

variable, attribute, feature, dimension

New cards

Ordinal attributes are symbols or names of things

false

New cards

If an attribute is not discrete, it is said to be ________.

continuous

New cards

Measures of central tendency refer to the ____, ____, and ____.

mean, median, mode

New cards

The mean obtained after chopping off values at the high and low extremes

trimmed mean
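
The effect of trimming can be seen in a short sketch. The function name `trimmed_mean`, the trim fraction, and the salary figures are assumptions chosen for illustration.

```python
def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the values at each extreme."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


salaries = [30, 32, 35, 36, 38, 40, 41, 43, 45, 400]  # one extreme value
plain = sum(salaries) / len(salaries)
trimmed = trimmed_mean(salaries, 0.1)
print(plain, trimmed)  # 74.0 38.75
```

Dropping one value from each end removes the outlier at 400, so the trimmed mean sits much closer to the bulk of the data than the plain mean does. The trim fraction should stay below 0.5 so that some values remain.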

New cards

What is the main difference between PrefixSpan and GSP?

PrefixSpan is faster because it skips the candidate generation step.

New cards

Sequential pattern mining considers the order of items in the dataset.

true
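
To make the role of order concrete, the sketch below counts how many sequences contain a pattern as an ordered subsequence. It assumes each sequence is a list of single items; the helper names and the page-visit data are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern appear in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator, so later items must come later.
    return all(item in remaining for item in pattern)


def sequence_support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


visits = [
    ["home", "search", "cart"],
    ["search", "home", "cart"],
    ["home", "cart"],
]
print(sequence_support(["home", "search"], visits))  # 1
print(sequence_support(["search", "home"], visits))  # 1
print(sequence_support(["home", "cart"], visits))    # 3
```

The first two patterns contain the same items but match different sequences, which is the distinction an unordered itemset count would miss.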

New cards

PrefixSpan generates candidates explicitly before mining the dataset.

false

New cards

What is the primary purpose of sequential pattern mining?

To discover meaningful patterns in ordered sequences

New cards

A rule with high confidence indicates that the ____________ is very likely to occur after the antecedent.

Consequent

New cards

The GSP algorithm generates all possible candidates at once, regardless of their support.

False

New cards

Which of the following is a limitation of the GSP algorithm?

It requires explicit candidate generation for each iteration.

New cards

In sequential rule generation, what metric is used to determine the strength of a rule?

Confidence

New cards

The ______________ algorithm eliminates candidate generation by recursively projecting the database.

PrefixSpan
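
The idea of recursive projection can be sketched for a simplified setting. This is not a full PrefixSpan implementation: it assumes every sequence element is a single item (no itemsets), and the function name, the tiny database, and `min_support` are chosen for illustration.

```python
def prefixspan(sequences, min_support):
    """Return (pattern, support) pairs for frequent sequential patterns."""
    results = []

    def mine(prefix, projected):
        # Count, per item, how many projected sequences contain it.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] < min_support:
                continue
            new_prefix = prefix + [item]
            results.append((new_prefix, counts[item]))
            # Project: keep only the suffix after the first occurrence of item.
            new_projected = [
                seq[seq.index(item) + 1:] for seq in projected if item in seq
            ]
            mine(new_prefix, new_projected)

    mine([], sequences)
    return results


db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
patterns = prefixspan(db, min_support=2)
for pattern, count in patterns:
    print(pattern, count)
```

Patterns are grown one item at a time from items that are already frequent in the projected database, so no separate candidate list is generated and then tested against the full data.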

New cards

Sequential pattern mining considers not only the items but also their ____________ in the dataset.

Order or Sequence

New cards

Classification predicts discrete and nominal values by categorizing data into distinct classes.

True

New cards

A classifier that performs extremely well on the training set will necessarily perform well on unseen test data.

false

New cards

N-fold cross validation is a technique commonly used to evaluate classifier performance on small datasets.

True

New cards

What is the primary goal of data classification?

To organize and categorize data into distinct classes

New cards

Which measure is more appropriate for evaluating classifier performance on highly imbalanced datasets?

Precision and Recall
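
A small worked case shows why accuracy alone can mislead on imbalanced data. The labels below are invented: 10 positives among 1000 examples, with a hypothetical classifier that always predicts the negative class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # always predicts the majority class

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)
print(accuracy, recall)  # 0.99 0.0
```

Accuracy looks excellent even though not a single positive example is found, while recall exposes the failure immediately. Precision is undefined here because the classifier makes no positive predictions.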

New cards

Which cross-validation method is most commonly used when the available data is small?

n-Fold Cross Validation
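
The splitting step of n-fold cross-validation can be sketched as follows. The function name `n_fold_splits` is an assumption, and the sketch does not shuffle the data, which a real evaluation would normally do first.

```python
def n_fold_splits(n_samples, n_folds):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    splits = []
    start = 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        splits.append((train_idx, test_idx))
        start += size
    return splits


splits = n_fold_splits(10, 3)
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

Every example is used for testing exactly once and for training in the remaining folds, which is why the method suits small datasets where holding out a large fixed test set would waste data.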

New cards

The ______ attribute in a dataset serves as the target variable in classification tasks.

class

New cards

The F-score is defined as the harmonic mean of ______ and _______.

precision and recall
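
The harmonic mean can be written out directly. The function name `f_score` and the sample precision and recall values are illustrative assumptions.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-score)."""
    if precision + recall == 0:
        return 0.0  # convention used here when both are zero
    return 2 * precision * recall / (precision + recall)


print(round(f_score(0.8, 0.5), 4))  # 0.6154
print(f_score(1.0, 0.0))            # 0.0
```

The arithmetic mean of 0.8 and 0.5 would be 0.65; the harmonic mean is lower because it is pulled toward the smaller of the two values, so a classifier cannot score well by maximizing only one of them.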

New cards

Which of the following are common classification methods? (Select all that apply)

K-Nearest Neighbor, Support Vector Machines, Bayesian Classification, Decision Tree Induction

New cards

Which components are part of a confusion matrix? (Select all that apply)

False Negative (FN), False Positive (FP), True Positive (TP), True Negative (TN)

New cards

Bayes’ theorem plays a critical role in ____ learning and ____.

probabilistic, classification

New cards

Bayes' classifier uses the prior probability of each class given information about an item.

false

New cards

Prior knowledge can be combined with observed data

True

New cards

P(A ∧ B) = P(A | B) P(B) is equivalent to P(A ∧ B) = P(A | B) P(A)

false
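
The statement on this card is false because the product rule pairs each conditional probability with the probability of its own conditioning event. Written out, with the usual assumption that P(B) > 0 for the final step:

```latex
\begin{align}
P(A \wedge B) &= P(A \mid B)\,P(B) \\
              &= P(B \mid A)\,P(A) \\
\Rightarrow\quad P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) > 0
\end{align}
```

The last line is Bayes' theorem, which follows from equating the two valid factorizations of the joint probability.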

New cards

Can be naturally studied from a probabilistic point of view.

Supervised learning

New cards

They are normally estimated based on observed frequencies in the training data.

Probabilities

New cards

In order to account for estimation from small samples, probability estimates are ___.

adjusted or smoothed
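
One widely used adjustment is Laplace (add-one) smoothing. The card does not name a specific technique, so the choice here is an assumption, as are the function name and the counts.

```python
def smoothed_probability(count, total, n_values, alpha=1.0):
    """Estimate P(value | class) with additive smoothing.

    count    -- times the value was observed in the class
    total    -- number of training examples in the class
    n_values -- number of distinct values the attribute can take
    alpha    -- pseudo-count added to every value (1.0 gives Laplace smoothing)
    """
    return (count + alpha) / (total + alpha * n_values)


# A value never seen among 8 training examples of a class, binary attribute.
print(0 / 8)                          # 0.0 with the raw frequency estimate
print(smoothed_probability(0, 8, 2))  # 0.1 after smoothing
```

Without smoothing, a single unseen value forces the whole product of probabilities in a Bayesian classifier to zero, no matter how strong the other evidence is.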

New cards

P(A | B) is the probability of ____, assuming that ____ is all and only the information known.

A, B

New cards

The task of ____ can be regarded as estimating the class ____ probabilities.

classification, posterior

New cards

Data mining and knowledge discovery are the same processes.

false

New cards

Web mining is limited to analyzing only web page content, excluding hyperlinks and user interactions.

false

New cards

Directed data mining involves discovering hidden patterns without a specific target.

false

New cards

Classification and prediction tasks in data mining are entirely distinct and have no similarities.

false

New cards

Data preprocessing is an optional step in the KDD process.

false

New cards

Decision trees are a type of classification method.

true

New cards

Web structure mining uses the hyperlink structure of the web to find patterns.

True

New cards

Data reduction aims to decrease the size of a dataset while retaining critical information.

true

New cards

A discrete attribute has a finite or countably infinite set of values.

true

New cards

An F-score is used to measure both precision and recall in classification tasks.

true

New cards

The main goal of data mining is to uncover meaningful ______, ______, and ______.

patterns, trends, rules

New cards

___ is a measure used in association rules to determine how often items appear together.

support
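
Support, and the confidence measure that builds on it, can be computed from a list of transactions as sketched below. The function names and the shopping baskets are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))                  # 0.5
print(round(confidence(baskets, {"bread"}, {"milk"}), 3))  # 0.667
```

Bread and milk appear together in 2 of 4 baskets, and 2 of the 3 baskets containing bread also contain milk, which gives the rule bread → milk a confidence of about 0.667.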

New cards

Data ___ involves removing noise, filling in missing values, and resolving errors.

cleaning

New cards

In a decision tree, a ___ node represents a test on an attribute.

internal
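
A hand-built tree makes the distinction between internal nodes and leaves visible. The dictionary layout, the attribute names, and the weather-style records are all assumptions for this sketch; no tree is learned from data here.

```python
# Internal nodes are dicts that test one attribute; leaves are class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rain": "yes",
    },
}


def classify(node, record):
    """Follow attribute tests from the root until a leaf label is reached."""
    while isinstance(node, dict):
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node


print(classify(tree, {"outlook": "sunny", "humidity": "high"}))     # no
print(classify(tree, {"outlook": "overcast", "humidity": "high"}))  # yes
```

Each internal node consumes one attribute value of the record and selects a branch; classification ends when a leaf holding a class label is reached.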

New cards

The F-score is the harmonic mean of ___ and ___.

recall, precision

New cards

In a confusion matrix, ___ refers to correctly classified positive examples.

True Positive (TP)

New cards

Sequential pattern mining considers the ___ of data points in its analysis.

order

New cards

A ___ is a visual representation that uses a box to display the five-number summary of data.

boxplot
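
The five numbers a boxplot displays can be computed with the standard library. Quartile conventions differ between textbooks and tools; this sketch assumes the "inclusive" method of `statistics.quantiles`, and the data are invented.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(data)
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```

In the plot, the box spans Q1 to Q3, a line inside it marks the median, and the whiskers extend toward the minimum and maximum.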

New cards

The ___ theorem plays a critical role in Bayesian classification.

Bayes’

New cards

Web ___ mining focuses on analyzing user behavior through web logs and session data.

usage

New cards

What is the first step in the KDD process?

Understanding the Application Domain

New cards

Which method is NOT used for handling missing values in data cleaning?

Fill with random numbers

New cards

What is the main difference between classification and prediction in data mining?

Prediction deals with future outcomes

New cards

Which type of attribute has a natural zero point?

Ratio

New cards

Which classification method uses the concept of probability distributions?

Bayesian Classification

New cards

In web mining, what does the PageRank algorithm primarily assess?

Page importance
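
A minimal power-iteration sketch shows how link structure translates into an importance score. The damping factor of 0.85, the fixed iteration count, and the four-page graph are assumptions for illustration, and every link target must also appear as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a dict mapping page -> list of outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}  # hypothetical link graph
scores = pagerank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Page C ends up with the highest score because three pages link to it, and A ranks second because it is linked from the highly ranked C, while B and D receive only the baseline share.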

New cards

Which of the following is NOT a data preprocessing stage?

Data Visualization

New cards

What is the purpose of using cross-validation in classification?

To assess the accuracy of the classifier

New cards

Which is NOT a common data mining task?

Sorting

New cards

Which technique is used in dimensionality reduction?

Principal Component Analysis (PCA)

New cards

Which concept is used to handle noisy data?

Binning
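
Smoothing by bin means is one form of binning for noisy data. The sketch below assumes equal-frequency bins of a fixed size; the function name and the price values are illustrative.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort values, split them into bins of bin_size, and replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical sample data
smoothed_prices = smooth_by_bin_means(prices, 3)
print(smoothed_prices)  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Replacing each value with the mean of its neighbors in sorted order dampens small random fluctuations while preserving the overall trend of the attribute.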
