Data mining
is the process of discovering patterns, relationships, and insights from large datasets.
Data mining
generally refers to the transformation of data into meaningful information for evidence-based decision making.
Data Mining Techniques
Classification and class probability estimation, Regression, Similarity matching, Clustering, Co-occurrence grouping, Profiling, Link Prediction, Data Reduction, Causal modeling
Classification and class probability estimation
attempt to predict, for each individual in a population, which of a (small) set of classes this individual belongs to
Regression (“Value Estimation”)
attempts to estimate or predict, for each individual, the numerical value of some variable for that individual
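As an illustration of value estimation, the sketch below fits a straight line by ordinary least squares and uses it to predict a new value. The function name `fit_line` and the sample numbers are hypothetical; regression in practice may use many variables and other model forms.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
predicted = slope * 5 + intercept  # estimated value for a new individual with x = 5
```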
Similarity matching
attempts to identify similar individuals based on data known about them
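One simple way to picture similarity matching is a nearest-neighbor lookup. The sketch below assumes each individual is described by two numeric features and that Euclidean distance is a reasonable similarity measure; the names and values are made up for illustration.

```python
import math


def most_similar(target, others):
    """Return the name in `others` whose feature tuple is closest to `target`."""
    return min(others, key=lambda name: math.dist(target, others[name]))


# Hypothetical customers described by (age, monthly spend).
customers = {
    "ana": (25, 40.0),
    "ben": (52, 90.0),
    "cy": (27, 42.0),
}
match = most_similar((28, 43.0), customers)
```

In real use the features would usually be rescaled first so that no single feature dominates the distance.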
Clustering
attempts to group individuals in a population together by their similarity, but not driven by any specific purpose
Co-occurrence grouping
also known as frequent itemset mining, association rule discovery, and market basket analysis
Co-occurrence grouping
attempts to find associations between entities based on transactions involving them
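A minimal sketch of co-occurrence grouping: count how often each pair of items appears in the same transaction. The function `pair_counts` and the baskets are hypothetical; full association rule mining would also compute measures such as support and confidence.

```python
from collections import Counter
from itertools import combinations


def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sort and de-duplicate so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical transactions.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
]
counts = pair_counts(transactions)
top_pair, top_count = counts.most_common(1)[0]
```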
Profiling
also known as behavior description
Profiling
attempts to characterize the typical behavior of an individual, group, or population.
Link Prediction
attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link
Link Prediction
is common in social networking systems
Data reduction
attempts to take a large set of data and replace it with a smaller set of data that contains much of the important information in the larger set
Causal modeling
attempts to help us understand what events or actions actually influence others
Business Understanding Stage
represents a part of the craft where the analysts’ creativity plays a large role
Business Understanding Stage
In this stage, the design team should think carefully about the use scenario.
Data Understanding
It is important to understand the strengths and limitations of the data. Check that the data is appropriate for the goals established and that data taken from different sources have the same format.
costs and benefits
A critical part of the data understanding phase is estimating the __ of each data source and deciding whether further investment is merited
Data Preparation
Involves data cleaning, such as checking for outliers
Data Preparation
Typical examples of ___ are converting data to tabular format, removing or inferring missing values, and converting data to different types
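The sketch below illustrates two of these steps on a tiny table: converting text to numbers and inferring a missing value. Filling the gap with the column mean is an assumption made here for illustration, not the only option; the field names are hypothetical.

```python
def prepare(rows):
    """Convert 'age' to float and fill missing ages with the mean of known ages."""
    known = [float(r["age"]) for r in rows if r["age"] not in ("", None)]
    mean_age = sum(known) / len(known)
    cleaned = []
    for r in rows:
        raw = r["age"]
        age = float(raw) if raw not in ("", None) else mean_age
        # Also strip stray whitespace from the name field.
        cleaned.append({"name": r["name"].strip(), "age": age})
    return cleaned


# Hypothetical raw records, as they might arrive from a CSV file.
rows = [
    {"name": " Ana ", "age": "30"},
    {"name": "Ben", "age": ""},
    {"name": "Cy", "age": "40"},
]
cleaned = prepare(rows)
```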
Modeling
Determine mathematical models to establish patterns.
Evaluation
Determine if results of analysis are aligned with objectives
quantitative and qualitative assessments
Evaluating the results of data mining includes both ___
Deployment
Communicate the discoveries in a timely, well-written report to serve as input for decision making in the business operation
Deployment
___ can also be much more subtle, such as a change to data acquisition procedures, or a change to strategy, marketing, or operations resulting from insight gained from mining the data.
Primary source of data
Surveys, Interviews, Focus Group Discussions, Scientific Simulations, Social Experiments, Scientific Experiments
Secondary source of data
books, journal articles, research studies (even unpublished), verifiable news clippings (be careful with fake news), published corporate reports, laws, ordinances, government memos, etc.
Nominal, Ordinal, Scalar
Types of Data or Variables in Surveys
Nominal
Do not have any quantitative values. These data cannot be ordered or measured.
Nominal
[Types of Data] Examples: Sex (Male, Female); Marital Status (Single, Married, Widowed)
Ordinal
Have natural ordered categories, but the distances between the categories cannot be determined.
Ordinal
[Types of Data] Examples: Ranking, Likert scale (1 Strongly Disagree to 5 Strongly Agree)
Scalar
continuous physical attribute or quantity that can be measured.
Scalar
[Types of Data] Examples: Age, weight, height, time
Qualitative, Quantitative, Mixed
Types of Analysis Research Methods
Quantitative
gathering, manipulation and interpretation of data taken from surveys or other secondary sources (financial reports).
Quantitative
It involves financial and statistical analysis, pattern and trend recognition, forecasting, etc.
Qualitative
gathering, curating, and interpreting information taken from interviews, focus group discussions, and social experiments.
Qualitative
It deals with understanding the meaning of concepts and factors affecting human behavior
Mixed Methods
A mix of both the qualitative and quantitative methods in order to verify, validate, and triangulate the data gathered.
Sequential, Parallel
Types of mixed method
Sequential
The methods are done one after the other:
1) Qualitative interviews (pre-test the questionnaire to validate the coherence of the questions in the survey);
2) Quantitative analysis of survey data;
3) Qualitative KII (key informant interviews) and FGD (focus group discussions) to get the story behind the numbers or results of the survey
Parallel
Qualitative and quantitative data gathering and analysis are done independently.
Non-parametric, Parametric
Types of Statistical Tests
Non-parametric
These are often used for nominal and ordinal data types. The most commonly used is associative analysis: cross-tabulation with the chi-square test.
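To make the chi-square test on a cross-tabulation concrete, the sketch below computes the Pearson chi-square statistic and degrees of freedom from a table of observed counts. The counts are invented, no continuity correction is applied, and turning the statistic into a p-value would additionally need a chi-square distribution table or a statistics library.

```python
def chi_square(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical cross-tabulation: rows = two groups, columns = answered yes / no.
table = [
    [30, 10],
    [20, 40],
]
stat, dof = chi_square(table)
```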
Parametric
These are used for scalar type data. Commonly used tests are Regression Analysis, Analysis of Variance, etc.
Customer Relationship Management, Market Basket Analysis, Fraud Detection, Risk Analysis and Management, Demand Forecasting, Healthcare and Medicine, Recommender Systems, Text and Sentiment Analysis, Supply Chain Optimization, Quality Control and Manufacturing
Application of Data Mining
Customer Relationship Management (CRM)
Data mining helps analyze customer behavior, preferences, and purchasing patterns, allowing businesses to develop targeted marketing campaigns, improve customer retention strategies, and enhance overall customer satisfaction.
Market Basket Analysis
Data mining techniques like association rule mining are used to analyze transactional data and identify relationships between products that are frequently purchased together. This information enables businesses to optimize product placement, cross-selling, and upselling strategies.
Fraud Detection
Data mining can help identify patterns and anomalies in large datasets, making it valuable for fraud detection in areas such as credit card transactions, insurance claims, healthcare billing, and online security. Unusual patterns or suspicious activities can be flagged for further investigation.
Risk Analysis and Management
Data mining is utilized to assess and mitigate risks in various sectors, including finance, insurance, and project management. It helps identify potential risks, predict future outcomes, and develop strategies to minimize risks and maximize opportunities.
Demand Forecasting
Data mining techniques, such as regression analysis and time series analysis, enable businesses to predict future demand for products or services. This information aids in inventory management, supply chain optimization, and production planning.
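As a very small example of a time series approach, the sketch below forecasts the next period's demand as the average of the most recent periods. A moving average is a simple stand-in chosen here for illustration; the window size and demand figures are assumptions.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("history is shorter than the window")
    return sum(history[-window:]) / window


# Hypothetical units sold in the last five periods.
demand = [100, 120, 110, 130, 125]
forecast = moving_average_forecast(demand)  # mean of 110, 130, 125
```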
Healthcare and Medicine
Data mining plays a crucial role in medical research, clinical decision making, disease diagnosis, and patient monitoring. It helps identify patterns in patient data, discover associations between symptoms and diseases, and develop predictive models for early detection and treatment planning.
Recommender Systems
Data mining is utilized in recommendation engines that provide personalized suggestions to users. This is prevalent in e-commerce, streaming platforms, and social media, where algorithms analyze user behavior and preferences to offer relevant product recommendations or content suggestions.
Text and Sentiment Analysis
Data mining techniques can be applied to analyze unstructured textual data, such as social media posts, customer reviews, or surveys. They support extracting insights, sentiment analysis, topic modeling, and understanding customer feedback or public opinion.
Supply Chain Optimization
Data mining assists in optimizing supply chain operations by analyzing data related to inventory levels, supplier performance, transportation routes, and demand patterns. This helps businesses make informed decisions, reduce costs, and improve efficiency.
Quality Control and Manufacturing
Data mining techniques are used to analyze sensor data, production metrics, and quality control records to detect patterns, identify factors impacting product quality, and optimize manufacturing processes to reduce defects and improve overall quality.
Data Encoding
Refers to the process of transforming data into a standardized format or structure that can be easily analyzed and utilized for marketing purposes.
Data Encoding
It involves converting various types of marketing data, such as customer information, campaign data, or product data, into a consistent representation.
One-Hot Encoding, Label Encoding, Binary Encoding
Types of Data Encoding
One-Hot Encoding
is a commonly used technique for encoding categorical data. Each category is represented as a binary vector where only one bit is set to 1 and the rest are set to 0
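A minimal sketch of one-hot encoding in plain Python. Ordering the categories alphabetically is an assumption made here so the output is deterministic; the function name `one_hot` and the sample values are hypothetical.

```python
def one_hot(values):
    """Return (categories, vectors) where each vector has exactly one 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1  # set only the bit for this value's category
        vectors.append(vector)
    return categories, vectors


colors = ["red", "green", "blue", "green"]
categories, vectors = one_hot(colors)
# categories is ["blue", "green", "red"], so "red" becomes [0, 0, 1].
```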
Label Encoding
Each category is assigned a numerical value, usually starting from 0 or 1
Binary Encoding
is similar to one-hot encoding, but it uses the binary representation of the numerical value of each category instead of a vector with one bit per category
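The two encodings can be sketched together, since binary encoding builds on the numeric labels. The code below assumes labels start from 0 and are assigned in alphabetical order; the helper names and sample categories are hypothetical.

```python
def label_encode(values):
    """Map each category to an integer code, starting from 0."""
    categories = sorted(set(values))
    codes = {category: i for i, category in enumerate(categories)}
    return [codes[value] for value in values], codes


def binary_encode(labels):
    """Write each integer label as a fixed-width list of bits (most significant first)."""
    width = max(1, max(labels).bit_length())
    return [[(label >> bit) & 1 for bit in reversed(range(width))] for label in labels]


sizes = ["small", "large", "medium", "xlarge", "small"]
labels, codes = label_encode(sizes)  # large=0, medium=1, small=2, xlarge=3
bits = binary_encode(labels)         # 4 categories fit in 2 bits each
```

With four categories, one-hot encoding would need four columns, while binary encoding needs only two.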