What is the primary purpose of a data warehouse in data mining?
To organize and store historical data for analysis
Which technique is commonly used for classification tasks?
Decision trees
Which of the following best describes directed data mining?
Focusing on a specific target field or outcome
_______ is the process of segmenting a heterogeneous population into more homogeneous subgroups.
clustering
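To illustrate clustering, here is a minimal sketch of k-means. The cards do not name k-means; it is used here only as one common clustering algorithm. The function name, the 1-D data, and the starting centers are hypothetical. Each point goes to its nearest center, and then each center moves to the mean of its cluster, so the subgroups become more homogeneous.

```python
def kmeans_1d(points, centers, iterations=10):
    """Minimal 1-D k-means: assign points to the nearest center, then recompute centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Index of the closest current center (ties go to the lower index)
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```

No class labels are supplied anywhere in this example. That is why clustering, unlike classification, does not need predefined classes.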
A ________ in data mining represents a set of rules or algorithms that connect inputs to a specific target outcome.
model
Data mining is the exploration and analysis of large quantities of data to discover meaningful ________ and rules.
patterns
Affinity grouping is useful for predicting future behavior.
false
One goal of data mining is to increase customer retention by predicting customer churn.
true
Which of the following is NOT a task of data mining?
Data warehousing
Clustering requires predefined classes to segment data.
false
Web mining can only be performed on textual data available on web pages.
false
Web structure mining focuses on discovering patterns and knowledge from the hyperlinks between web pages.
true
Web usage mining primarily deals with extracting information from hyperlinks within a web page.
false
Which of the following is not a type of Web mining?
Web Image Mining
Web content mining primarily involves:
Extracting useful information from web page contents
What is a key challenge in Web mining?
Quality control over Web data
Hyperlinks across different websites convey:
Web ________ mining extracts useful information from the main content of web pages.
content
Web usage mining involves analyzing ________ to discover user activity patterns.
Hyperlinks pointing to a page indicate its ________ because many people trust or reference it.
Data cleaning is necessary to ensure data accuracy, handle missing or incomplete information, and maintain consistency.
true
Noise in data refers to extreme values that deviate significantly from other data points.
false
Which of the following is NOT a step in the KDD process?
Data Storage
_______________ converts continuous data into discrete intervals or categories, making patterns easier to interpret.
Discretization
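One simple way to discretize is equal-width binning: split the value range into k intervals of the same width and replace each value with its interval label. The sketch below assumes that scheme. The function name and the age values are hypothetical.

```python
def equal_width_bins(values, k):
    """Map each continuous value to one of k equal-width interval labels 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    labels = []
    for v in values:
        idx = int((v - lo) / width) if width else 0
        # The maximum value would fall into bin k, so clamp it into the last bin
        labels.append(min(idx, k - 1))
    return labels


ages = [13, 15, 22, 25, 33, 35, 45, 52, 70]  # hypothetical ages
print(equal_width_bins(ages, 3))  # [0, 0, 0, 0, 1, 1, 1, 2, 2]
```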
Which technique is used to detect redundant attributes during data integration?
Correlation Analysis
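For numeric attributes, correlation analysis usually computes the Pearson correlation coefficient r. A value close to +1 or -1 suggests that one attribute is largely redundant given the other. The sketch below is a minimal illustration. The two price attributes are a hypothetical case of the same quantity recorded twice.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two numeric attributes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical: the same price stored in dollars and in cents during integration
price_usd = [1.0, 2.5, 4.0, 7.5]
price_cents = [100, 250, 400, 750]
print(round(pearson(price_usd, price_cents), 6))  # 1.0 -> one attribute is redundant
```

Nominal attributes are normally checked with a chi-square test instead. This sketch covers only the numeric case.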
The Data Mining process encompasses all steps of the Knowledge Discovery in Database (KDD) process.
False
During data cleaning, missing values can be handled by:
All of the above.
The KDD process aims to transform raw data into _______________.
knowledge
What is the purpose of normalization during data transformation?
To scale attributes to a common range.
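Min-max normalization is one common way to scale an attribute to a common range. It maps the observed minimum to new_min and the maximum to new_max. A minimal sketch follows; the function name and income values are hypothetical.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the observed min/max map to new_min/new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: every value maps to new_min
        return [new_min] * len(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


incomes = [12000, 54000, 73600, 98000]  # hypothetical incomes
scaled = min_max(incomes)
print(round(scaled[2], 3))  # 0.716 -> 73,600 sits about 72% of the way through the range
```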
Principal Component Analysis (PCA) is a technique used for _______________ reduction.
Dimensionality
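To show what PCA does, here is a minimal sketch for 2-D data only. It computes the 2x2 covariance matrix, takes its largest eigenvalue in closed form, and projects the centered points onto the matching eigenvector. This reduces two attributes to one. The function name and the sample points are hypothetical. A real PCA on more dimensions would use a linear-algebra library.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    # Matching eigenvector (b, lam - a); fall back to an axis when b == 0
    if b != 0:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    explained = lam / (a + c)  # share of total variance kept by one component
    projected = [x * vx + y * vy for x, y in centered]
    return projected, (vx, vy), explained


proj, direction, explained = pca_2d_to_1d([(1, 1), (2, 2), (3, 3)])
print(explained)  # 1.0 -> collinear points lose nothing when reduced to 1-D
```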
________ consists of a collection of interrelated data, known as a database.
Database data
A repository of information collected from multiple sources.
Data warehouse data
An entity is represented by a ________.
data object
________ typically describe data objects.
attributes
attributes
Data objects stored in a database are called
data tuples
Which of the following terms are often used interchangeably (Select all that apply)
variable, attribute, feature, dimension
Ordinal attributes are symbols or names of things
false
If an attribute is not discrete, it is said to be ________.
continuous
Measures of central tendency include mode, ________, and ________.
median, mean
The mean obtained after chopping off values at the high and low extremes
trimmed mean
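A trimmed mean sorts the values, drops a fixed proportion from each end, and averages the rest. A minimal sketch follows. The function name, the 10% proportion, and the salary values are hypothetical.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values dropped at each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]  # hypothetical, in thousands
print(sum(salaries) / len(salaries))  # 58.0 (plain mean, pulled up by 110)
print(trimmed_mean(salaries, 0.1))    # 55.6 (30 and 110 dropped)
```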
What is the main difference between PrefixSpan and GSP?
PrefixSpan is faster because it skips the candidate generation step.
Sequential pattern mining considers the order of items in the dataset.
true
PrefixSpan generates candidates explicitly before mining the dataset.
false
What is the primary purpose of sequential pattern mining?
To discover meaningful patterns in ordered sequences
A rule with high confidence indicates that the ____________ is very likely to occur after the antecedent.
Consequent
The GSP algorithm generates all possible candidates at once, regardless of their support.
False
Which of the following is a limitation of the GSP algorithm?
It requires explicit candidate generation for each iteration.
In sequential rule generation, what metric is used to determine the strength of a rule?
Confidence
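Confidence for a sequential rule X ⇒ Y can be read as support(X followed by Y) / support(X). Here support counts the sequences that contain the pattern in order, with gaps allowed. The sketch below is a minimal version under two assumptions: each sequence element is a single item, and the sequence database is hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is enforced
    return all(item in it for item in pattern)


def sequence_support(pattern, db):
    """Number of sequences in db that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in db)


def rule_confidence(antecedent, consequent, db):
    """Confidence of the sequential rule antecedent => consequent."""
    return sequence_support(antecedent + consequent, db) / sequence_support(antecedent, db)


db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]  # hypothetical sequences
print(rule_confidence(["a"], ["c"], db))  # 2 of the 3 sequences with a later contain c
```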
The ______________ algorithm eliminates candidate generation by recursively projecting the database.
PrefixSpan
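The projection idea behind PrefixSpan can be sketched in a few lines. Take a frequent prefix, keep only the suffixes that follow it in each sequence, and count items directly in that projected database. No candidate set is ever generated. This minimal version assumes each sequence element is a single item; full PrefixSpan also handles itemset elements. The function name and data are hypothetical.

```python
def prefixspan(db, min_support):
    """Minimal PrefixSpan for sequences of single items; returns (pattern, support) pairs."""
    patterns = []

    def mine(prefix, projected):
        # Count each item once per projected sequence
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] < min_support:
                continue
            new_prefix = prefix + [item]
            patterns.append((new_prefix, counts[item]))
            # Project: keep only the suffix after the first occurrence of `item`
            new_projected = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(new_prefix, new_projected)

    mine([], db)
    return patterns


db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]  # hypothetical sequences
for pattern, sup in prefixspan(db, min_support=2):
    print(pattern, sup)
```

GSP works differently. At each level it would first build candidate sequences such as ⟨a, b⟩, ⟨a, c⟩, ⟨b, a⟩, … and then scan the database to count them.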
Sequential pattern mining considers not only the items but also their ____________ in the dataset.
Order or Sequence
Classification predicts discrete and nominal values by categorizing data into distinct classes.
True
A classifier that performs extremely well on the training set will necessarily perform well on unseen test data.
false
N-fold cross validation is a technique commonly used to evaluate classifier performance on small datasets.
True
What is the primary goal of data classification?
To organize and categorize data into distinct classes
Which measure is more appropriate for evaluating classifier performance on highly imbalanced datasets?
Precision and Recall
Which cross-validation method is most commonly used when the available data is small?
n-Fold Cross Validation
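In k-fold cross-validation the data is split into k folds of nearly equal size. Each fold serves once as the test set while the other k − 1 folds form the training set. The accuracy is then averaged over the k runs. The sketch below only produces the index splits. The function name is hypothetical, and for simplicity it does not shuffle or stratify the data, which a real evaluation usually would.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k folds over indices 0..n-1 (no shuffling)."""
    # The first n % k folds get one extra index
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


for train, test in k_fold_indices(10, 3):
    print(len(train), test)
```

Leave-one-out is the special case k = n, where each test fold holds a single tuple.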
The ______ attribute in a dataset serves as the target variable in classification tasks.
class
The F-score is defined as the harmonic mean of ______ and _______.
precision and recall
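Working from the confusion-matrix counts, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. A minimal sketch follows; the function name and counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 (harmonic mean of the two) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)  # hypothetical counts
print(p, r, round(f, 4))  # 0.9 0.75 0.8182
```

With an arithmetic mean the result would be 0.825. The harmonic mean comes out lower, because it pulls toward the weaker of the two scores.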
Which of the following are common classification methods? (Select all that apply)
K-Nearest Neighbor, Support Vector Machines, Bayesian Classification, Decision Tree Induction
Which components are part of a confusion matrix? (Select all that apply)
False Negative (FN), False Positive (FP), True Positive (TP), True Negative (TN)
Bayes’ theorem plays a critical role in probabilistic learning and _______.
classification
Bayes' classifier uses the prior probability of each class given information about an item.
false
In Bayesian learning, prior knowledge can be combined with observed data.
True
P(A ∧ B) = P(A | B)P(B) is equivalent to P(A ∧ B) = P(A | B)P(A).
false
________ can be naturally studied from a probabilistic point of view.
Supervised learning
________ are normally estimated based on observed frequencies in the training data.
Probabilities
In order to account for estimation from small samples, probability estimates are ___.
adjusted or smoothed
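Laplace (add-one) smoothing is one common adjustment. It adds a pseudo-count α to every attribute value, so a value never seen with a class still gets a small nonzero probability instead of zeroing out a product of probabilities. A minimal sketch follows; the function name and counts are hypothetical.

```python
def laplace_estimate(count, total, num_values, alpha=1):
    """Smoothed P(value | class) = (count + alpha) / (total + alpha * num_values)."""
    return (count + alpha) / (total + alpha * num_values)


# Hypothetical: 1000 training tuples in a class; attribute "income" has 3 values
counts = {"low": 0, "medium": 990, "high": 10}
smoothed = {v: laplace_estimate(c, 1000, len(counts)) for v, c in counts.items()}
print(smoothed["low"])  # 1/1003 instead of 0
```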
P(A | B) is the probability of A, assuming that _____ is all and only the information known.
B
The task of _____ can be regarded as estimating the class posterior probabilities.
classification
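Bayes' theorem, P(C | X) = P(X | C)P(C) / P(X), turns class priors and likelihoods into posteriors. A Bayesian classifier then predicts the class with the highest posterior. The minimal sketch below assumes P(X | C) is already known for each class. The class names and probabilities are hypothetical.

```python
def posteriors(priors, likelihoods):
    """P(C|X) = P(X|C) P(C) / P(X), where P(X) sums P(X|C) P(C) over all classes."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # P(X)
    return {c: joint[c] / evidence for c in joint}


post = posteriors(priors={"spam": 0.2, "ham": 0.8},
                  likelihoods={"spam": 0.6, "ham": 0.1})  # hypothetical P(X|C)
print(post)                      # spam ≈ 0.6, ham ≈ 0.4
print(max(post, key=post.get))   # spam
```

Here the prior favors "ham", yet the posterior favors "spam". This is why the classifier relies on posterior rather than prior probabilities.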
Data mining and knowledge discovery are the same processes.
false
Web mining is limited to analyzing only web page content, excluding hyperlinks and user interactions.
false
Directed data mining involves discovering hidden patterns without a specific target.
false
Classification and prediction tasks in data mining are entirely distinct and have no similarities.
false
Data preprocessing is an optional step in the KDD process.
false
Decision trees are a type of classification method.
true
Web structure mining uses the hyperlink structure of the web to find patterns.
True
Data reduction aims to decrease the size of a dataset while retaining critical information.
true
A discrete attribute has a finite or countably infinite set of values.
true
An F-score is used to measure both precision and recall in classification tasks.
true
The main goal of data mining is to uncover meaningful ______, ______, and ______.
patterns, trends, rules
___ is a measure used in association rules to determine how often items appear together.
support
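The support of an itemset is the fraction of transactions that contain all of its items. A minimal sketch follows; the function name and the transactions are hypothetical.

```python
def itemset_support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


transactions = [  # hypothetical market baskets
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
print(itemset_support(["bread", "milk"], transactions))  # 0.5
```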
Data ___ involves removing noise, filling in missing values, and resolving errors.
cleaning
In a decision tree, a ___ node represents a test on an attribute.
internal
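Decision tree induction has to decide which attribute each internal node tests. One common measure, used by ID3, is information gain: the drop in class entropy after splitting on the attribute. Gain ratio and the Gini index are alternatives. The sketch below computes information gain. The function names and the four training rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Class entropy minus the weighted entropy after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


rows = [  # hypothetical training tuples
    {"age": "young", "student": "yes", "buys": "yes"},
    {"age": "old", "student": "yes", "buys": "yes"},
    {"age": "young", "student": "no", "buys": "no"},
    {"age": "old", "student": "no", "buys": "no"},
]
best = max(["age", "student"], key=lambda a: information_gain(rows, a, "buys"))
print(best)  # student: it separates the classes perfectly (gain 1.0 vs 0.0)
```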
The F-score is the harmonic mean of ___ and ___.
recall, precision
In a confusion matrix, ___ refers to correctly classified positive examples.
True Positive, TP
Sequential pattern mining considers the ___ of data points in its analysis.
order
A ___ is a visual representation that uses a box to display the five-number summary of data.
boxplot
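A boxplot draws the five-number summary (minimum, Q1, median, Q3, maximum). It often flags points beyond 1.5 × IQR from the quartiles as suspected outliers. The sketch below computes those numbers with Python's `statistics.quantiles` (Python 3.8+), using its "inclusive" method. Other quartile conventions can give slightly different Q1 and Q3. The function names and data are hypothetical.

```python
import statistics


def five_number_summary(values):
    """(min, Q1, median, Q3, max) using statistics.quantiles(method="inclusive")."""
    data = sorted(values)
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]


def iqr_outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical data
print(five_number_summary(sample))  # (1, 3.0, 5.0, 7.0, 30)
print(iqr_outliers(sample))         # [30]
```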
The ___ theorem plays a critical role in Bayesian classification.
Bayes’
Web ___ mining focuses on analyzing user behavior through web logs and session data.
usage
What is the first step in the KDD process?
Understanding the Application Domain
Which method is NOT used for handling missing values in data cleaning?
Fill with random numbers
What is the main difference between classification and prediction in data mining?
Prediction deals with future outcomes
Which type of attribute has a natural zero point?
Ratio
Which classification method uses the concept of probability distributions?
Bayesian Classification
In web mining, what does PageRank algorithm primarily assess?
Page importance
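PageRank treats a link as a vote, so pages that receive links from many or important pages score higher. The minimal power-iteration sketch below makes some assumptions. The damping factor d = 0.85 is a conventional choice. Pages with no outgoing links spread their rank evenly over all pages. The link graph and function name are hypothetical.

```python
def pagerank(links, d=0.85, iterations=50):
    """Power-iteration PageRank; `links` maps every page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            if outs:
                # Split this page's rank evenly among the pages it links to
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: distribute its rank over all pages
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}  # hypothetical graph
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # C, the page with the most incoming links
```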
Which of the following is NOT a data preprocessing stage?
Data Visualization
What is the purpose of using cross-validation in classification?
To assess the accuracy of the classifier
Which is NOT a common data mining task?
Sorting
Which technique is used in dimensionality reduction?
Principal Component Analysis (PCA)
Which concept is used to handle noisy data?
Binning
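Binning smooths noisy data by sorting the values, grouping neighbors into bins, and replacing each value with a bin statistic. The sketch below uses equal-frequency bins and smoothing by bin means. The function name and price values are hypothetical. Smoothing by bin medians or bin boundaries works the same way.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort, cut into equal-frequency bins of `bin_size`, replace each value by its bin mean."""
    data = sorted(values)
    smoothed = []
    for start in range(0, len(data), bin_size):
        bin_values = data[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical noisy prices
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```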