Data Mining Flashcards

Description and Tags

Flashcards covering the key concepts of Data Mining.

33 Terms

1

What is the purpose of data mining?

To deliver prospective, proactive information by mining big data to predict outcomes.

2

What type of data represents more than 75% of an organization's data?

Unstructured data, residing in text files.

3

What is the definition of data mining?

An iterative process that explores and models big data to identify patterns and provide meaningful insights. Also, the process of using software to sort through data to discover patterns and ascertain or establish relationships.

4

What are three ways data mining projects can be focused?

Predictive, exploratory, or focused on data reduction.

5

What are the four phases of data mining?

(1) problem identification, (2) exploration of the data, (3) pattern discovery, and (4) knowledge deployment.

6

What is the ultimate goal of data mining?

Forecasting or prediction.

7

In data mining, what is 'scoring'?

An extension of data mining in which a model is built with an algorithm on data from a situation where the outcome is known, and the same model is then applied to another situation where the outcome is not known.
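
As a rough illustration of scoring, the sketch below fits a very simple model on records with known outcomes and then applies it to records without outcomes. The data, the one-variable threshold model, and the names `fit_threshold` and `score` are all assumptions made for this example; they do not come from the flashcard source.

```python
from statistics import mean

# Hypothetical training data: (age, readmitted) pairs where the outcome is known.
known = [(34, 0), (41, 0), (58, 1), (67, 1), (72, 1), (29, 0)]

def fit_threshold(rows):
    """Fit a one-variable model: the midpoint between the two class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (mean(pos) + mean(neg)) / 2

def score(threshold, values):
    """Apply the fitted model to records whose outcome is not known."""
    return [1 if x > threshold else 0 for x in values]

threshold = fit_threshold(known)             # learned from known outcomes
predictions = score(threshold, [30, 50, 80]) # applied to new, unlabeled records
```

The point is only the two-stage shape: fit where the answer is known, then reuse the fitted model where it is not.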

8

What other terms are used for data mining?

Knowledge discovery and data mining (KDD), knowledge discovery in data, and knowledge discovery in databases.

9

What is the purpose of 'knowledge discovery' in data mining?

To look at data from different vantage points, aspects, and perspectives and bring new insights to the dataset.

10

What are the steps required to transform Raw Data into Knowledge?

Raw Data -> Preprocessed data -> Transformed data -> Patterns -> Knowledge

11

What are some benefits of KDD in healthcare?

Improve healthcare policy making, healthcare practices, disease prevention, detection of disease outbreaks, prevention of sequelae, and prevention of in-hospital deaths.

12

What is bagging in data mining?

The use of voting (for classification) or averaging (for regression) in predictive data mining to combine the predictions from many models or methods, or from the same type of model fitted to different data.
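
A minimal sketch of bagging by majority vote, assuming the "same type of model on different data" variant: the same threshold model is fitted to bootstrap resamples of one dataset and the predictions are combined by voting. The data and the helper names `fit_stump` and `bagged_predict` are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def fit_stump(rows):
    """Threshold model: midpoint between class means (overall mean as a fallback)."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    if not pos or not neg:          # a resample may contain only one class
        return mean(x for x, _ in rows)
    return (mean(pos) + mean(neg)) / 2

def bagged_predict(rows, x, n_models=25, seed=0):
    """Fit one stump per bootstrap resample and return the majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        votes.append(1 if x > fit_stump(sample) else 0)
    return Counter(votes).most_common(1)[0][0]

rows = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
high = bagged_predict(rows, 10)
low = bagged_predict(rows, 0)
```

For a numeric target, the vote would be replaced by an average of the individual predictions.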

13

What is boosting in data mining?

A means of increasing the power of the generated models by weighting and combining their predictions into a single predicted classification.
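
The sketch below shows only the weighted-combination step of boosting, not the full training loop. It assumes an AdaBoost-style weight, 0.5 * ln((1 - error) / error), so that more accurate models get a larger say; the error rates and predictions are made up for illustration.

```python
import math

def model_weight(error_rate):
    """AdaBoost-style weight: lower error gives a larger weight."""
    return 0.5 * math.log((1 - error_rate) / error_rate)

def weighted_vote(predictions, error_rates):
    """Combine +1/-1 predictions into one classification by weighted sum."""
    total = sum(model_weight(e) * p for p, e in zip(predictions, error_rates))
    return 1 if total >= 0 else -1

# Hypothetical case: two weak models predict -1, one strong model predicts +1.
result = weighted_vote([-1, -1, 1], [0.45, 0.45, 0.05])
```

Here the single accurate model outweighs the two weak ones, which a plain majority vote would not allow.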

14

What is the purpose of data reduction?

To shrink large datasets into manageable, smaller datasets via aggregation of the data or clustering.
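
As one possible illustration of data reduction by aggregation, the sketch below collapses row-level records into one summary value per group. The ward names and length-of-stay values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical row-level data: (ward, length of stay in days).
stays = [("ward_a", 3), ("ward_a", 5), ("ward_b", 10), ("ward_b", 12), ("ward_a", 4)]

def aggregate(rows):
    """Reduce many rows to one mean value per group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}

summary = aggregate(stays)  # five rows reduced to two
```

Clustering would achieve a similar reduction when no grouping key is given in advance.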

15

What is drill down analysis?

An analysis that begins by identifying variables of interest and then drills down into the data along those variables to expose progressively more detail.

16

What is Exploratory Data Analysis (EDA)?

An approach or philosophy that uses mainly graphical techniques to gain insight into a dataset.

17

What is feature selection?

A technique that reduces the inputs to a manageable size for processing and analysis: each attribute is either chosen or rejected based on its usefulness for the analysis.
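
One simple way to judge an attribute's "usefulness" is its correlation with the target. The sketch below assumes that criterion and a cutoff of 0.5, both chosen for illustration; the column names and values are hypothetical, and other selection criteria are equally valid.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def select_features(columns, target, min_abs_corr=0.5):
    """Keep attributes whose absolute correlation with the target meets the cutoff."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= min_abs_corr]

columns = {"age": [30, 40, 50, 60, 70], "shoe_size": [9, 7, 10, 7, 9]}
target = [0, 0, 1, 1, 1]
selected = select_features(columns, target)
```

With this data, `age` is kept and `shoe_size` is rejected.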

18

What is predictive data mining?

A data mining project whose goal is to identify a model that can predict classifications.

19

What are the functions of the algorithms in data mining?

(1) they explain how to separate or partition the data at each split, (2) they decree when to stop or end the splitting of data, and (3) they determine how to predict the value of y for each x in a split.
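
These three functions read most naturally as a description of tree-style algorithms. The sketch below is a tiny one-variable regression tree, written under that assumption, with each function marked in the code. The data, the depth limit, and all function names are hypothetical.

```python
from statistics import mean

def sse(ys):
    """Sum of squared errors around the mean (0 for an empty group)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(rows):
    """(1) How to partition: choose the threshold with the lowest total error."""
    xs = sorted({x for x, _ in rows})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    def cost(t):
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        return sse(left) + sse(right)
    return min(candidates, key=cost)

def should_stop(rows, depth, max_depth=2, min_rows=2):
    """(2) When to stop: depth limit, too few rows, or no distinct x values left."""
    return depth >= max_depth or len(rows) < min_rows or len({x for x, _ in rows}) < 2

def build(rows, depth=0):
    if should_stop(rows, depth):
        return mean(y for _, y in rows)  # (3) predicted y for every x in this split
    t = best_split(rows)
    left = [r for r in rows if r[0] <= t]
    right = [r for r in rows if r[0] > t]
    return (t, build(left, depth + 1), build(right, depth + 1))

def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's value."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree

rows = [(1, 10), (2, 12), (3, 11), (8, 30), (9, 32), (10, 31)]
tree = build(rows)
```

For a classification tree, the error measure and the leaf prediction would change (for example, to a majority class), but the three roles stay the same.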

20

What is rule induction based on?

Statistical significance using if-then statements.

21

What is nearest neighbor analysis?

Classifies each record in a dataset based on the classes of a selected number (k) of its nearest neighbors; also known as k-nearest neighbors (k-NN).
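
A minimal k-nearest neighbor sketch, assuming squared Euclidean distance and a simple majority vote among the k closest labeled records. The training points and labels are hypothetical.

```python
from collections import Counter

def knn_classify(training, point, k=3):
    """Label a point by majority vote among its k nearest labeled neighbors."""
    def distance(row):
        features, _ = row
        return sum((a - b) ** 2 for a, b in zip(features, point))
    nearest = sorted(training, key=distance)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

training = [((1, 1), "low"), ((1, 2), "low"), ((2, 1), "low"),
            ((8, 8), "high"), ((9, 8), "high"), ((8, 9), "high")]

label_a = knn_classify(training, (2, 2))
label_b = knn_classify(training, (7, 7))
```

The choice of k and of the distance measure both affect the result and are normally tuned to the data.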

22

What is text mining?

The text equivalent of data mining for numerical data: it analyzes text documents by extracting key words or phrases.
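
A very small sketch of key-word extraction, assuming that "key" simply means the most frequent words after removing common stopwords. The stopword list and the sample note are made up, and real text mining tools use far richer methods.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list.
STOPWORDS = {"the", "a", "of", "and", "with", "was", "for", "no", "on"}

def key_terms(text, top_n=3):
    """Return the most frequent non-stopword terms with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)

note = "Patient reports chest pain. Chest pain worsens with exertion. No fever."
terms = key_terms(note, top_n=2)
```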

23

What is Online Analytical Processing (OLAP)?

Generates different views of the data in multidimensional databases, also known as fast analysis of shared data.
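
To illustrate "different views" of multidimensional data, the sketch below totals one measure along whichever dimensions are requested. The fact rows, the dimension names, and the `view` helper are hypothetical; an actual OLAP system precomputes and indexes such aggregates.

```python
from collections import defaultdict

# Hypothetical fact rows with two dimensions (year, clinic) and one measure (visits).
facts = [
    {"year": 2023, "clinic": "north", "visits": 120},
    {"year": 2023, "clinic": "south", "visits": 80},
    {"year": 2024, "clinic": "north", "visits": 150},
    {"year": 2024, "clinic": "south", "visits": 95},
]

def view(rows, *dims, measure="visits"):
    """Total the measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)

by_year = view(facts, "year")        # one view of the data
by_clinic = view(facts, "clinic")    # another view of the same data
detail = view(facts, "year", "clinic")
```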

24

What is brushing in data mining?

A technique in which the user manually chooses specific data points or observations or subsets of data on an interactive data display, also known as graphical exploratory data analysis.

25

What does a data mining model consist of?

A mining structure plus an algorithm.

26

How is a data mining model developed?

By exercising one or more algorithms on data.

27

What are some data mining process models?

Cross-Industry Standard Process for Data Mining (CRISP-DM), Six Sigma/Lean, and SEMMA (Sample, Explore, Modify, Model, Assess).

28

What are the six steps of the CRISP-DM model?

Business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

29

What steps does Six Sigma use?

DMAIC: define, measure, analyze, improve, and control.

30

What does SEMMA stand for?

Sample, Explore, Modify, Model, Assess.

31

What are registries designed to provide?

Lists of subpopulations, identification of patients with care gaps, support for outreach to patients who have care gaps, feedback on how each physician is doing on particular types of care, and quality reports for the practice.

32

What is process mining?

In healthcare, process mining allows health experts to understand how processes are actually executed: discovering process models, checking conformance with medical guidelines, and finding improvement opportunities.

33

What must data mining practitioners ensure when using protected health information (PHI)?

That such data are deidentified and that confidentiality is maintained.
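
As a rough sketch of what deidentification can involve, the example below drops direct identifiers and replaces the patient ID with a salted hash. The field names, the identifier list, and the salt are assumptions for illustration only; formal deidentification standards require considerably more than this.

```python
import hashlib

# Hypothetical field names; a real dataset defines its own identifier list.
DIRECT_IDENTIFIERS = {"name", "phone", "address"}

def deidentify(record, salt):
    """Drop direct identifiers and replace the patient ID with a salted hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    raw = (salt + str(record["patient_id"])).encode("utf-8")
    cleaned["patient_id"] = hashlib.sha256(raw).hexdigest()[:16]
    return cleaned

record = {"patient_id": 1042, "name": "Jane Doe", "phone": "555-0100",
          "address": "1 Main St", "diagnosis": "asthma"}
safe = deidentify(record, salt="example-secret")
```

The clinical field is kept for analysis, while the hashed ID still lets records for the same patient be linked without exposing the original identifier.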