What is data mining?
The field that addresses the need to make sense of the staggering volume of data being produced, by helping us find hidden relationships, connections, or patterns in it.
What is the role of data mining, as depicted in a flowchart?
Unstructured/structured data → data mining → knowledge (relationships, patterns, or models)
What is the application of data mining in business intelligence?
In helping organizations make better decisions through market basket analysis, sales forecasting, and inventory planning
What is the application of data mining in retail?
In studying customer preferences, shopping patterns, and purchasing habits
What is the application of data mining in banking?
In marketing, risk management, and money laundering detection
What is the application of data mining in bioinformatics?
In protein modeling, drug discovery, and biomarker identification
What is the application of data mining in healthcare?
In understanding complex mechanisms and their interactions, identifying people who are at risk for diseases, and evaluating diagnostics.
What is the application of data mining in education?
In improving learning outcomes, improving curricula, and finding the reasons behind dropouts
What is the application of data mining in television and radio?
In making personalized recommendations to radio listeners and TV viewers, and in making informed decisions on content creation and distribution
What is the application of data mining in crime prevention?
In fraud detection and prevention, modeling criminal behavior, and predicting future crimes
What is the application of data mining in social media analysis?
In drawing conclusions about social media users for targeted marketing campaigns, and in studying human behavior and interaction
What is the application of data mining in supply chain management?
In improving customer satisfaction and loyalty, and in making decisions regarding supplier relationships, production processes, and distribution channels
What is step 1 of the process of knowledge discovery?
Data preparation
What is step 2 of the process of knowledge discovery?
Data mining
What is step 3 of the process of knowledge discovery?
Pattern and model evaluation
What is step 4 of the process of knowledge discovery?
Knowledge representation
What happens during the data preparation step of knowledge discovery?
A process that includes steps such as removing noise, integrating multiple data sources, transforming data into forms suited for mining, and selecting the data relevant to the problem at hand
What happens during the data mining step of the process of knowledge discovery?
The actual extraction of patterns and construction of models
What happens during the pattern and model evaluation step of knowledge discovery?
The identification of patterns and models relevant to the application
What happens during the knowledge representation step of knowledge discovery?
The generation of ways to present the knowledge obtained
What is multidimensional data summarization?
Consists of obtaining a concise description of the data to make it easier to extract useful information; it can be presented as pie charts, bar charts, data cubes, and other forms
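A minimal sketch, assuming hypothetical sales records, of summarizing data along chosen dimensions: the totals per (region, product) form the cells of a small data cube, and summing over fewer dimensions "rolls up" the cube. The `summarize` helper is our own illustration, not a library function.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, amount)
sales = [
    ("north", "apples", 10),
    ("north", "pears", 5),
    ("south", "apples", 7),
    ("south", "apples", 3),
    ("south", "pears", 8),
]

def summarize(records, dims):
    """Sum the amount (last field) grouped by the chosen dimensions (indices into each record)."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[d] for d in dims)  # the cube cell this record falls into
        totals[key] += record[-1]
    return dict(totals)

by_region_product = summarize(sales, (0, 1))  # one cell per (region, product)
by_region = summarize(sales, (0,))             # roll up over product
grand_total = summarize(sales, ())             # roll up over everything
```

Here `by_region` is `{("north",): 15, ("south",): 18}`, which is the kind of concise description a bar chart would then display.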
What is mining frequent patterns, associations, and correlations?
Consists of discovering relationships among the items in a given dataset
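A minimal sketch, assuming a hypothetical list of shopping baskets, that counts how often each pair of items is bought together and keeps the pairs whose support (the fraction of baskets containing both items) reaches a threshold. It enumerates every pair exhaustively; algorithms such as Apriori prune the candidate itemsets instead.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs whose support is at least min_support."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a fixed order, e.g. ("bread", "milk")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}
```

With `min_support=0.5`, only `("bread", "milk")` and `("bread", "butter")` survive, each appearing in half of the baskets.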
What is classification?
Consists of constructing a model that describes data classes and can be used to determine the class of a new object
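A minimal sketch of classification using a 1-nearest-neighbor rule, a technique chosen here only for brevity: the "model" is simply the labeled training data, and a new object receives the class of its closest training object. The data, labels, and `classify` function are hypothetical.

```python
import math

# Hypothetical labeled training data: (features, class label)
training = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
]

def classify(point, data):
    """Return the label of the training object nearest to `point` (Euclidean distance)."""
    _, label = min(data, key=lambda item: math.dist(point, item[0]))
    return label
```

For example, `classify((2.0, 1.0), training)` returns `"small"`. `math.dist` requires Python 3.8 or later.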
What is cluster analysis?
Consists of forming groups (or clusters) of objects based on their properties, so that objects in the same cluster are highly similar to each other but dissimilar to objects in other clusters
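A minimal k-means sketch on one-dimensional data, one common clustering method among many: each value joins the cluster of its nearest center, and each center then moves to the mean of its cluster. The values, starting centers, and fixed iteration count are assumptions for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Simple k-means on 1-D data, starting from the given initial centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to its cluster's mean (keep it if the cluster is empty)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], [1, 10])
```

Here the values settle into the clusters `[1, 2, 3]` and `[10, 11, 12]` with centers `2.0` and `11.0`: similar values end up together, dissimilar ones apart.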
What is deep learning?
Uses neural networks with many layers to automatically identify meaningful features in datasets
What is outlier analysis?
The process of identifying data objects that differ significantly from the rest of the dataset
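A minimal sketch of one simple outlier test, assuming roughly bell-shaped numeric data: flag values whose z-score (distance from the mean, measured in standard deviations) exceeds a threshold. The threshold of 2 and the sample values are assumptions chosen for illustration.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]
```

For `[10, 11, 9, 10, 12, 10, 50]` the mean is 16 and only `50` lies more than two standard deviations away, so it is the single value returned.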
What is descriptive data mining?
Concerned with uncovering relationships that provide insight into the underlying structure of the data.
What is predictive data mining?
Concerned with finding models that help make predictions based on known data, for example, decision trees.
What concepts are the stepping stones of data mining?
Data objects, data attributes, and attribute types
What are data objects?
The entities our applications deal with: persons, car models, patients, items, etc.
What are data attributes?
Provide the description of objects. An attribute is a data field representing a feature or characteristic of an object.
What is an attribute type?
The kind of values an attribute can take, which determines the operations that are meaningful on it. The types include nominal, ordinal, interval, and ratio.
What is the nominal attribute type?
Values consist of names or labels, for example, ‘single’ or ‘married’.
What is the ordinal attribute type?
Values can be arranged in order, that is, they can be sorted, but the differences between them are not meaningful. For example, values on the 5-point Likert scale, where 1 means strongly disagree and 5 means strongly agree.
What is the interval attribute type?
Values have order and differences are meaningful, but there is no inherent zero (a zero meaning ‘none’), so ratios are not meaningful. For example, the years in the last four decades in which California had at least one big earthquake: 1984, 1992, 1994, 1999, 2010, 2019.
What is the ratio attribute type?
Similar to the interval type, with the added property that there is an inherent zero. Here, it is meaningful to say that one value is a multiple of another, that is, to calculate ratios. For example, the length of documents in bytes.
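A minimal sketch of the rule these cards describe: each attribute type supports the operations of the previous one plus one more. The `AttributeType` enum and `meaningful_operations` function are hypothetical names for illustration; the earthquake years come from the interval card, while the document sizes are made up.

```python
from enum import IntEnum

class AttributeType(IntEnum):
    # Hypothetical encoding: each level keeps the properties of the levels below it
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def meaningful_operations(kind):
    """List the operations that are meaningful for an attribute of the given type."""
    ops = ["equality"]             # nominal: values are only the same or different
    if kind >= AttributeType.ORDINAL:
        ops.append("order")        # ordinal: values can be sorted
    if kind >= AttributeType.INTERVAL:
        ops.append("difference")   # interval: differences are meaningful
    if kind >= AttributeType.RATIO:
        ops.append("ratio")        # ratio: an inherent zero makes ratios meaningful
    return ops

# Interval: the gaps between earthquake years are meaningful (but year ratios are not)
years = [1984, 1992, 1994, 1999, 2010, 2019]
gaps = [later - earlier for earlier, later in zip(years, years[1:])]  # [8, 2, 5, 11, 9]

# Ratio: a 2048-byte document is twice as long as a 1024-byte one (hypothetical sizes)
size_ratio = 2048 / 1024  # 2.0
```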
What are summary statistics?
Also known as statistical parameters, these describe the information contained in a dataset as simply as possible.
What is a measure of central tendency?
A value that represents the center of a data set, or in other words, a typical value of the data set. Among the most used are the mean, the median, the mode, and the midrange.
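The four measures named above can be computed with Python's standard `statistics` module; midrange has no library function, so it is computed directly. The sample values are hypothetical.

```python
import statistics

data = [3, 7, 7, 2, 9, 4]  # hypothetical sample

mean = statistics.mean(data)            # sum / count = 32 / 6, about 5.33
median = statistics.median(data)        # even count: average of the two middle values, (4 + 7) / 2 = 5.5
mode = statistics.mode(data)            # most frequent value: 7
midrange = (max(data) + min(data)) / 2  # average of the largest and smallest values: (9 + 2) / 2 = 5.5
```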
What is a measure of dispersion?
A value that helps us understand the variability or spread of the data. Among the most used are the range, the variance, and the standard deviation.
What is variance?
The average of the squared deviations from the mean.
What is standard deviation?
The square root of the variance
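A short sketch computing the range, variance, and standard deviation of a hypothetical sample, first directly from the definitions above and then with the standard `statistics` module.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

value_range = max(data) - min(data)  # 9 - 2 = 7
mean = sum(data) / len(data)         # 40 / 8 = 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)  # average squared deviation: 32 / 8 = 4.0
std_dev = variance ** 0.5            # square root of the variance: 2.0

# Standard-library equivalents (population versions, dividing by n)
lib_variance = statistics.pvariance(data)
lib_std_dev = statistics.pstdev(data)
```

`statistics.variance` and `statistics.stdev` compute the sample versions instead, dividing by n - 1; for this data that gives 32 / 7 rather than 4.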
What are proximity measures?
Used to determine how similar or dissimilar two objects are. Examples are the Minkowski distance and the cosine similarity.
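A minimal sketch of the two proximity measures named above for numeric vectors of equal length. The Minkowski distance of order p generalizes the Manhattan (p = 1) and Euclidean (p = 2) distances, while cosine similarity compares direction rather than magnitude. The function names are our own.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def cosine_similarity(x, y):
    """Cosine of the angle between vectors x and y (1 = same direction, 0 = orthogonal)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.hypot(*x) * math.hypot(*y))  # hypot(*v) is the vector's length
```

For example, the points `(0, 0)` and `(3, 4)` are 7 apart with p = 1 and 5 apart with p = 2, while `(1, 2)` and `(2, 4)` have cosine similarity 1 because they point in the same direction. Multi-argument `math.hypot` requires Python 3.8 or later.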
What is normalization?
A method that attempts to give all attributes equal weight by transforming them to a common scale, which makes it easier to find patterns in the data. Examples are min-max normalization, decimal scaling, and z-score normalization.
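A minimal sketch of the three normalization methods named above, applied to plain lists of numbers. Min-max rescales linearly into a target range ([0, 1] by default), decimal scaling divides by the smallest power of 10 that brings every absolute value below 1, and z-score normalization subtracts the mean and divides by the standard deviation (the population version is used here, an assumption).

```python
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer making every |v| / 10**j < 1."""
    j = 0
    while any(abs(v) / 10 ** j >= 1 for v in values):
        j += 1
    return [v / 10 ** j for v in values]

def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]
```

For example, `min_max([10, 20, 30])` gives `[0.0, 0.5, 1.0]`, and `decimal_scaling([-986, 917])` divides by 1000 to give values near `-0.986` and `0.917`.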