DM01: Data Mining


65 Terms

1
New cards

Data mining

is the process of discovering patterns, relationships, and insights from large datasets.

2
New cards

Data mining

generally refers to the transformation of data into meaningful information for evidence-based decision making.

3
New cards

Data Mining Techniques

Classification and class probability estimation, Regression, Similarity matching, Clustering, Co-occurrence grouping, Profiling, Link Prediction, Data Reduction, Causal modeling

4
New cards

Classification and class probability estimation

attempts to predict, for each individual in a population, which of a (small) set of classes this individual belongs to

5
New cards

Regression (“Value Estimation”)

attempts to estimate or predict, for each individual, the numerical value of some variable for that individual
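
As a rough illustration of value estimation, the sketch below fits a straight line by ordinary least squares and uses it to predict a numerical value. The data (months as a customer vs. monthly spend) and the name `fit_line` are made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: months as a customer vs. monthly spend
months = [1, 2, 3, 4, 5]
spend = [12.0, 14.0, 16.0, 18.0, 20.0]

intercept, slope = fit_line(months, spend)
predicted = intercept + slope * 6  # estimated spend for month 6
```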

6
New cards

Similarity matching

attempts to identify similar individuals based on data known about them
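
One simple way to picture similarity matching is to treat each individual as a point and pick the nearest other point. The sketch below assumes Euclidean distance and a made-up customer table of (age, purchases per year); real work would usually rescale the features first.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(name, people):
    """Return the other individual whose features are closest to `name`."""
    target = people[name]
    others = (p for p in people if p != name)
    return min(others, key=lambda p: euclidean_distance(target, people[p]))

# Hypothetical data: (age, purchases per year)
customers = {"ana": (25, 3), "ben": (27, 4), "cara": (60, 20)}

closest_to_ana = most_similar("ana", customers)
```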

7
New cards

Clustering

attempts to group individuals in a population together by their similarity, but not driven by any specific purpose

8
New cards

Co-occurrence grouping

also known as frequent itemset mining, association rule discovery, and market basket analysis

9
New cards

Co-occurrence grouping

attempts to find associations between entities based on transactions involving them
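
A minimal sketch of the idea, assuming a small made-up list of shopping baskets: count how often each pair of items appears in the same transaction, then express the count as support (the share of transactions containing the pair).

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (one set of items per basket)
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

# Count every unordered pair of items that occurs in the same basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of all transactions that contain the pair.
support = {pair: count / len(transactions) for pair, count in pair_counts.items()}
top_pair, top_count = pair_counts.most_common(1)[0]
```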

10
New cards

Profiling

also known as behavior description

11
New cards

Profiling

attempts to characterize the typical behavior of an individual, group, or population.

12
New cards

Link Prediction

attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link

13
New cards

Link Prediction

is common in social networking systems

14
New cards

Data reduction

attempts to take a large set of data and replace it with a smaller set of data that contains much of the important information in the larger set

15
New cards

Causal modeling

attempts to help us understand what events or actions actually influence others

16
New cards

Business Understanding Stage

represents a part of the craft where the analysts’ creativity plays a large role

17
New cards

Business Understanding Stage

In this stage, the design team should think carefully about the use scenario.

18
New cards

Data Understanding

It is important to understand the strengths and limitations of the data. Check whether the data is appropriate for the established goals, and check that data taken from different sources has the same format.

19
New cards

costs and benefits

A critical part of the data understanding phase is estimating the __ of each data source and deciding whether further investment is merited

20
New cards

Data Preparation

Involves data cleaning, including checking for outliers

21
New cards

Data Preparation

Typical examples of ___ are converting data to tabular format, removing or inferring missing values, and converting data to different types
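
The three examples on this card can be sketched in a few lines. The records, the field names, and the choice of mean imputation below are assumptions made for illustration; other ways of handling missing values (such as dropping the row) are equally valid.

```python
# Hypothetical raw records, e.g. as read from a CSV file (all values are strings)
raw_rows = [
    {"id": "1", "age": "34", "city": "Manila"},
    {"id": "2", "age": "", "city": "Cebu"},
    {"id": "3", "age": "28", "city": "Davao"},
]

# Convert types: ages arrive as strings; an empty string marks a missing value.
ages = [int(r["age"]) if r["age"] else None for r in raw_rows]

# Infer missing values using the mean of the known ages (one simple option).
known = [a for a in ages if a is not None]
mean_age = sum(known) / len(known)

# Convert to a tabular format: one (id, age, city) tuple per record.
table = [
    (int(r["id"]), float(a) if a is not None else mean_age, r["city"])
    for r, a in zip(raw_rows, ages)
]
```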

22
New cards

Modeling

Determine mathematical models to establish patterns.

23
New cards

Evaluation

Determine if results of analysis are aligned with objectives

24
New cards

quantitative and qualitative assessments

Evaluating the results of data mining includes both ___

25
New cards

Deployment

Communicate the discoveries in a timely, well-written report to serve as input for decision making in the business operation

26
New cards

Deployment

___ can also be much more subtle, such as a change to data acquisition procedures, or a change to strategy, marketing, or operations resulting from insight gained from mining the data.

27
New cards

Primary source of data

Surveys, Interviews, Focus Group Discussions, Scientific Simulations, Social Experiments, Scientific Experiments

28
New cards

Secondary source of data

books, journal articles, research studies (even unpublished), verifiable news clippings (be careful with fake news), published corporate reports, laws, ordinances, government memos, etc.

29
New cards

Nominal, Ordinal, Scalar

Types of Data or Variables in Surveys

30
New cards

Nominal

Do not have any quantitative value. These data cannot be ordered or measured.

31
New cards

Nominal

[Types of Data] Examples: Sex (Male, Female); Marital Status (Single, Married, Widowed)

32
New cards

Ordinal

Have naturally ordered categories, but the distance between the categories cannot be determined.

33
New cards

Ordinal

[Types of Data] Examples: Ranking, Likert scale (1 Strongly Disagree to 5 Strongly Agree)

34
New cards

Scalar

continuous physical attribute or quantity that can be measured.

35
New cards

Scalar

[Types of Data] Examples: Age, weight, height, time

36
New cards

Qualitative, Quantitative, Mixed

Types of Analysis Research Methods

37
New cards

Quantitative

gathering, manipulation and interpretation of data taken from surveys or other secondary sources (financial reports).

38
New cards

Quantitative

It involves financial and statistical analysis, pattern and trend recognition, forecasting, etc.

39
New cards

Qualitative

gathering, curating, and interpreting information taken from interviews, focus group discussions, and social experiments.

40
New cards

Qualitative

It deals with understanding the meaning of concepts and factors affecting human behavior

41
New cards

Mixed Methods

A mix of both qualitative and quantitative methods, used to verify, validate, and triangulate the data gathered.

42
New cards

Sequential, Parallel

Types of mixed method

43
New cards

Sequential

The methods are done one after the other:

1) Qualitative interviews (pre-test the questionnaire to validate the coherence of the questions in the survey);

2) Quantitative analysis of survey data;

3) Qualitative key informant interviews (KII) and focus group discussions (FGD) to get the story behind the numbers or results of the survey

44
New cards

Parallel

Qualitative and quantitative data gathering and analysis are done independently.

45
New cards

Non-parametric, Parametric

Types of Statistical Tests

46
New cards

Non-parametric

These are often used for nominal and ordinal data types. The most commonly used test is the chi-square test on a cross-tabulation (associative analysis).
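
To make the chi-square test on a cross-tabulation concrete, the sketch below computes the test statistic by hand: for each cell, the expected count is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected. The 2×2 counts are made up, and `chi_square_statistic` is a name chosen for this example.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a cross-tabulation (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical cross-tab: rows = sex, columns = answered yes / no
observed = [[30, 10], [20, 40]]

stat = chi_square_statistic(observed)
# Degrees of freedom = (rows - 1) * (columns - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
```

The statistic is then compared against a chi-square distribution with `dof` degrees of freedom to judge whether the two variables are associated.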

47
New cards

Parametric

These are used for scalar type data. Commonly used tests are Regression Analysis, Analysis of Variance, etc.

48
New cards

Customer Relationship Management, Market Basket Analysis, Fraud Detection, Risk Analysis and Management, Demand Forecasting, Healthcare and Medicine, Recommender Systems, Text and Sentiment Analysis, Supply Chain Optimization, Quality Control and Manufacturing

Applications of Data Mining

49
New cards

Customer Relationship Management (CRM)

Data mining helps analyze customer behavior, preferences, and purchasing patterns, allowing businesses to develop targeted marketing campaigns, improve customer retention strategies, and enhance overall customer satisfaction.

50
New cards

Market Basket Analysis

Data mining techniques like association rule mining are used to analyze transactional data and identify relationships between products that are frequently purchased together. This information enables businesses to optimize product placement, cross-selling, and upselling strategies.

51
New cards

Fraud Detection

Data mining can help identify patterns and anomalies in large datasets, making it valuable for fraud detection in areas such as credit card transactions, insurance claims, healthcare billing, and online security. Unusual patterns or suspicious activities can be flagged for further investigation.
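
As one very simple picture of flagging unusual activity, the sketch below marks transaction amounts that lie far from the mean, measured in standard deviations (a z-score). The amounts and the threshold of 2.0 are assumptions for illustration; real fraud detection typically combines many signals and more robust methods.

```python
from statistics import mean, pstdev

def flag_outliers(amounts, threshold=2.0):
    """Return amounts whose z-score magnitude exceeds `threshold`."""
    mu = mean(amounts)
    sigma = pstdev(amounts)  # population standard deviation
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Hypothetical card transactions; the last one is unusually large
amounts = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 18.0, 500.0]

flagged = flag_outliers(amounts)  # candidates for further investigation
```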

52
New cards

Risk Analysis and Management

Data mining is utilized to assess and mitigate risks in various sectors, including finance, insurance, and project management. It helps identify potential risks, predict future outcomes, and develop strategies to minimize risks and maximize opportunities.

53
New cards

Demand Forecasting

Data mining techniques, such as regression analysis and time series analysis, enable businesses to predict future demand for products or services. This information aids in inventory management, supply chain optimization, and production planning.

54
New cards

Healthcare and Medicine

Data mining plays a crucial role in medical research, clinical decision making, disease diagnosis, and patient monitoring. It helps identify patterns in patient data, discover associations between symptoms and diseases, and develop predictive models for early detection and treatment planning.

55
New cards

Recommender Systems

Data mining is utilized in recommendation engines that provide personalized suggestions to users. This is prevalent in e-commerce, streaming platforms, and social media, where algorithms analyze user behavior and preferences to offer relevant product recommendations or content suggestions.

56
New cards

Text and Sentiment Analysis

Data mining techniques can be applied to analyze unstructured textual data, such as social media posts, customer reviews, or surveys. They support extracting insights, sentiment analysis, topic modeling, and understanding customer feedback or public opinion.

57
New cards

Supply Chain Optimization

Data mining assists in optimizing supply chain operations by analyzing data related to inventory levels, supplier performance, transportation routes, and demand patterns. This helps businesses make informed decisions, reduce costs, and improve efficiency.

58
New cards

Quality Control and Manufacturing

Data mining techniques are used to analyze sensor data, production metrics, and quality control records to detect patterns, identify factors impacting product quality, and optimize manufacturing processes to reduce defects and improve overall quality.

59
New cards

Data Encoding

Refers to the process of transforming data into a standardized format or structure that can be easily analyzed and utilized for marketing purposes.

60
New cards

Data Encoding

It involves converting various types of marketing data, such as customer information, campaign data, or product data, into a consistent representation.

61
New cards

One-Hot Encoding, Label Encoding, Binary Encoding

Types of Data Encoding

62
New cards

One-Hot Encoding

is a commonly used technique for encoding categorical data. Each category is represented as a binary vector where only one bit is set to 1 and the rest are set to 0
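
A minimal sketch of the technique in plain Python, assuming the categories are ordered alphabetically (any fixed order works); `one_hot_encode` and the color values are made up for this example.

```python
def one_hot_encode(values):
    """Map each value to a vector with a single 1 at its category's position."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors

categories, vectors = one_hot_encode(["red", "green", "blue", "green"])
# categories -> ['blue', 'green', 'red']
# vectors    -> [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Note that the vector length equals the number of distinct categories, so columns grow quickly for high-cardinality data.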

63
New cards

Label Encoding

Each category is assigned a numerical value, usually starting from 0 or 1

64
New cards

Binary Encoding

is similar to one-hot encoding, but it uses a binary representation of the numerical value of each category instead of a vector of binary values
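
A rough sketch under stated assumptions: each category first gets an integer (alphabetical order, starting from 0), and that integer is then written out as fixed-width binary digits. The function name and the region values are hypothetical.

```python
def binary_encode(values):
    """Assign each category an integer, then spell that integer in binary digits."""
    categories = sorted(set(values))
    labels = {c: i for i, c in enumerate(categories)}
    # Number of bits needed to represent the largest label (at least 1).
    width = max(1, (len(categories) - 1).bit_length())
    codes = {
        c: [int(bit) for bit in format(i, f"0{width}b")]
        for c, i in labels.items()
    }
    return labels, [codes[v] for v in values]

labels, encoded = binary_encode(["north", "south", "east", "west", "north"])
# labels  -> {'east': 0, 'north': 1, 'south': 2, 'west': 3}
# encoded -> [[0, 1], [1, 0], [0, 0], [1, 1], [0, 1]]
```

Four categories need only two columns here, where one-hot encoding would need four.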

65
New cards