Data mining
is the process of discovering patterns, relationships, and insights from large datasets.
Data mining
generally refers to the transformation of data into meaningful information for evidence-based decision making.
Data Mining Techniques
Classification and class probability estimation, Regression, Similarity matching, Clustering, Co-occurrence grouping, Profiling, Link Prediction, Data Reduction, Causal modeling
Classification and class probability estimation
attempts to predict, for each individual in a population, which of a (small) set of classes this individual belongs to
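A minimal sketch of classification with class probability estimation. It assumes a k-nearest-neighbors approach, which is only one of many possible classifiers, and uses a hypothetical toy dataset of customers labeled "churn" or "stay". Each class's estimated probability is its share of the k closest labeled examples, and the predicted class is the most probable one.

```python
import math
from collections import Counter


def class_probabilities(train, query, k=3):
    """Estimate class probabilities for `query` from its k nearest labeled neighbors.

    `train` is a list of (feature_vector, class_label) pairs.
    """
    # Rank labeled examples by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # Each class's probability estimate is its share of the k votes.
    return {label: count / k for label, count in votes.items()}


def classify(train, query, k=3):
    # Predict the class with the highest estimated probability.
    probs = class_probabilities(train, query, k)
    return max(probs, key=probs.get)


# Hypothetical toy data: (feature vector, class label).
customers = [
    ((1.0, 1.0), "churn"),
    ((1.2, 0.8), "churn"),
    ((1.1, 1.3), "stay"),
    ((5.0, 5.0), "stay"),
    ((5.5, 4.5), "stay"),
]

print(class_probabilities(customers, (1.0, 1.1)))  # churn ≈ 0.67, stay ≈ 0.33
print(classify(customers, (1.0, 1.1)))             # churn
```

Returning the probabilities, and not only the label, is what separates class probability estimation from plain classification. For example, a churn model can then rank customers by risk.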
Regression (“Value Estimation”)
attempts to estimate or predict, for each individual, the numerical value of some variable for that individual
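A minimal sketch of regression (value estimation) using simple linear regression fitted by ordinary least squares. This is one common choice, not the only one. The data are hypothetical: each pair is one individual's years as a customer (x) and annual spend (y).

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the common 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: years as a customer vs. annual spend.
years = [1, 2, 3, 4]
spend = [3, 5, 7, 9]
slope, intercept = fit_line(years, spend)
print(slope, intercept)        # 2.0 1.0
print(slope * 5 + intercept)   # 11.0  (estimated spend for a 5-year customer)
```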
Similarity matching
attempts to identify similar individuals based on data known about them
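A minimal sketch of similarity matching. It assumes each individual is described by a numeric feature vector and uses cosine similarity, which is one common measure among several. The purchase-count data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def rank_by_similarity(target, candidates):
    """Return (name, similarity) pairs, most similar to `target` first."""
    scores = [(name, cosine_similarity(target, vec)) for name, vec in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical purchase counts per product category: [books, games, music].
customers = {"A": [5, 0, 3], "B": [4, 0, 4], "C": [0, 5, 1]}
ranking = rank_by_similarity([5, 0, 2], customers)
for name, score in ranking:
    print(name, round(score, 3))
```

Customer A, who mostly buys books and some music, ranks closest to the target. Customer C, who mostly buys games, ranks last.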
Clustering
attempts to group individuals in a population together by their similarity, without being driven by any specific purpose
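One common clustering algorithm is k-means. The sketch below is a minimal version. It assumes numeric feature vectors, a number of clusters k chosen by the analyst, and a deterministic farthest-first initialization, which is used here for reproducibility (many implementations use random starts). No class labels are used anywhere, which matches the definition: the grouping comes from similarity alone. The points are hypothetical.

```python
import math


def farthest_first_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        next_point = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(next_point)
    return centroids


def k_means(points, k, max_iterations=20):
    """Group points into k clusters by repeatedly assigning each point to its nearest centroid."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```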
Co-occurrence grouping
also known as frequent itemset mining, association rule discovery, and market basket analysis
Co-occurrence grouping
attempts to find associations between entities based on transactions involving them
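A minimal sketch of co-occurrence grouping for item pairs. It counts how often two items appear in the same transaction and reports association rules with their support and confidence. Restricting the rules to pairs and setting the min_support threshold to 0.6 are simplifying assumptions. Full frequent-itemset algorithms such as Apriori also handle larger itemsets. The baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.6):
    """Find item pairs that co-occur in at least `min_support` of the transactions.

    Returns (antecedent, consequent, support, confidence) tuples, where
    support = share of transactions containing both items and
    confidence = share of transactions with the antecedent that also contain the consequent.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # de-duplicate and fix an order so each pair is counted once
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support >= min_support:
            rules.append((a, b, support, together / item_counts[a]))
            rules.append((b, a, support, together / item_counts[b]))
    return rules


# Hypothetical market baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]
for rule in pair_rules(baskets):
    print(rule)  # ('beer', 'diapers', 0.75, 1.0) then ('diapers', 'beer', 0.75, 1.0)
```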
Profiling
also known as behavior description
Profiling
attempts to characterize the typical behavior of an individual, group, or population.
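A minimal sketch of profiling one individual. It summarizes "typical behavior" as the mean and standard deviation of past transaction amounts. It then flags new amounts that fall far outside that profile, which is a common use of profiles in fraud detection. The 3-standard-deviation threshold and the transaction history are assumptions for illustration.

```python
import statistics


def build_profile(amounts):
    """Summarize an individual's typical behavior as the mean and spread of past amounts."""
    return statistics.mean(amounts), statistics.stdev(amounts)


def is_atypical(profile, amount, z=3.0):
    """Flag an amount more than `z` standard deviations from the profile's mean."""
    mean, spread = profile
    return abs(amount - mean) > z * spread


# Hypothetical history of one customer's card transactions.
history = [20, 22, 19, 21, 23, 20]
profile = build_profile(history)
print(is_atypical(profile, 22))   # False
print(is_atypical(profile, 95))   # True
```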
Link Prediction
attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link
Link Prediction
is common in social networking systems
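A minimal sketch of link prediction in a social network. It suggests a link between two people who are not yet connected and scores the suggestion by how many friends they share (the "common neighbors" score). Using that count as the strength estimate is an assumption, and other link-prediction scores exist. The friendship edges are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def suggest_links(edges):
    """Return (person_a, person_b, shared_friend_count) for unlinked pairs with shared friends."""
    neighbors = defaultdict(set)
    for a, b in edges:  # friendships are undirected
        neighbors[a].add(b)
        neighbors[b].add(a)
    suggestions = []
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # already linked
        score = len(neighbors[a] & neighbors[b])
        if score > 0:
            suggestions.append((a, b, score))
    # Strongest suggestion first; ties broken alphabetically for reproducibility.
    suggestions.sort(key=lambda s: (-s[2], s[0], s[1]))
    return suggestions


# Hypothetical friendship edges.
friendships = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
               ("bob", "dan"), ("cat", "dan"), ("dan", "eve")]
print(suggest_links(friendships))
# [('ann', 'dan', 2), ('bob', 'eve', 1), ('cat', 'eve', 1)]
```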
Data reduction
attempts to take a large set of data and replace it with a smaller set of data that contains much of the important information in the larger set
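A minimal sketch of one simple form of data reduction: aggregating a raw transaction log into one summary row per customer (count, total, mean). This is only one of several approaches; dimensionality-reduction methods are another family. The log is hypothetical.

```python
from collections import defaultdict


def summarize_by_customer(transactions):
    """Reduce (customer, amount) rows to one (count, total, mean) row per customer."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer, amount in transactions:
        totals[customer] += amount
        counts[customer] += 1
    return {c: (counts[c], totals[c], totals[c] / counts[c]) for c in totals}


# Hypothetical raw transaction log (five rows reduced to three).
log = [("c1", 10.0), ("c2", 5.0), ("c1", 30.0), ("c2", 15.0), ("c3", 8.0)]
print(summarize_by_customer(log))
# {'c1': (2, 40.0, 20.0), 'c2': (2, 20.0, 10.0), 'c3': (1, 8.0, 8.0)}
```

The trade-off is the one the definition describes: the summary is smaller and keeps much of the important information (how often and how much each customer buys), but it loses details such as the order of purchases.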
Causal modeling
attempts to help us understand what events or actions actually influence others
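A minimal sketch of the simplest causal estimate: the average treatment effect from a randomized experiment, computed as the difference in mean outcomes between a treated group and a control group. The estimate can only be read causally under the assumption that individuals were randomly assigned to the groups. The renewal data are hypothetical.

```python
def average_treatment_effect(treated, control):
    """Difference in mean outcome between a treated group and a control group.

    Only a valid causal estimate if individuals were randomly assigned to the groups.
    """
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical randomized experiment: 1 = customer renewed, 0 = did not.
got_discount = [1, 0, 1, 1, 0]
no_discount = [0, 0, 1, 0, 0]
print(round(average_treatment_effect(got_discount, no_discount), 2))  # 0.4
```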
Business Understanding Stage
represents a part of the craft where the analysts’ creativity plays a large role
Business Understanding Stage
In this stage, the design team should think carefully about the use scenario.
Data Understanding
It is important to understand the strengths and limitations of the data. Check that the data are appropriate for the goals established, and check that data taken from different sources have the same format.
costs and benefits
A critical part of the data understanding phase is estimating the ___ of each data source and deciding whether further investment is merited
Data Preparation
Involves data cleaning and checking for outliers
Data Preparation
Typical examples of ___ are converting data to tabular format, removing or inferring missing values, and converting data to different types
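A minimal sketch of the data preparation steps listed above, applied to a hypothetical CSV export with one missing age and one implausible age. The specific choices are assumptions for illustration, not fixed rules: median imputation for missing values and the common 1.5 × IQR rule for outliers.

```python
import csv
import io
import statistics

# Hypothetical raw export: one age is missing and one looks like a data-entry error.
RAW = """name,age
Ana,34
Ben,
Cy,29
Dee,41
Eli,38
Fay,250
Gus,31
"""


def prepare(raw_text):
    # 1) Convert to tabular format: a list of dict rows keyed by column name.
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    # 2) Convert types: age text -> int, with blank cells treated as missing (None).
    for row in rows:
        row["age"] = int(row["age"]) if row["age"].strip() else None
    # 3) Infer missing values with the median of the observed ages.
    fill = statistics.median(r["age"] for r in rows if r["age"] is not None)
    for row in rows:
        if row["age"] is None:
            row["age"] = fill
    # 4) Flag outliers with the 1.5 * IQR rule (an assumed threshold).
    q1, _, q3 = statistics.quantiles([r["age"] for r in rows], n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    outliers = [r["name"] for r in rows if not low <= r["age"] <= high]
    return rows, outliers


rows, outliers = prepare(RAW)
print(rows[1])    # {'name': 'Ben', 'age': 36.0}
print(outliers)   # ['Fay']
```

Flagging is deliberately separate from removal. Whether an age of 250 is a typo to fix or a record to drop is a judgment for the analyst.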
Modeling
Determine mathematical models to establish patterns.
Evaluation
Determine if results of analysis are aligned with objectives
quantitative and qualitative assessments
Evaluating the results of data mining includes both ___
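A minimal sketch of the quantitative half of evaluation for a yes/no classifier. It compares predictions with known outcomes on held-out data and reports accuracy, precision, and recall. The qualitative half is a matter of judgment and is not shown. The outcome lists are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, true negatives, false positives, and false negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn


def evaluate(actual, predicted):
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),                # share of all predictions that were right
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # share of predicted positives that were right
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # share of actual positives that were found
    }


# Hypothetical held-out outcomes (1 = churned) and model predictions.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 1]
print(evaluate(actual, predicted))  # {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}
```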
Deployment
Communicate the discoveries in a timely, well-written report to serve as input for decision making in the business operation
Deployment
___ can also be much more subtle, such as a change to data acquisition procedures, or a change to strategy, marketing, or operations resulting from insight gained from mining the data.
Primary source of data
Surveys, Interviews, Focus Group Discussions, Scientific Simulations, Social Experiments, Scientific Experiments
Secondary source of data
books, journal articles, research studies (even unpublished), verifiable news clippings (be careful with fake news), published corporate reports, laws, ordinances, government memos, etc.
Nominal, Ordinal, Scalar
Types of Data or Variables in Surveys
Nominal
Do not have any quantitative value. These data cannot be ordered or measured.
Nominal
[Types of Data] Examples: Sex (Male, Female); Marital Status (Single, Married, Widower)
Ordinal
Have naturally ordered categories, but the distance between them cannot be determined.
Ordinal
[Types of Data] Examples: Ranking, Likert scale (1 Strongly Disagree to 5 Strongly Agree)
Scalar
A continuous physical attribute or quantity that can be measured.
Scalar
[Types of Data] Examples: Age, weight, height, time
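A minimal sketch of why the data type matters in analysis. Each type supports a different "typical value". The mode suits nominal data, which have no order. The median suits ordinal data, which have an order but unknown distances. The mean suits scalar data, where arithmetic on measurements is meaningful. This pairing is the usual convention for these levels of measurement, not something stated in the cards above. The survey responses are hypothetical.

```python
import statistics


def typical_value(values, level):
    """Pick a summary statistic that respects the variable's type of data."""
    if level == "nominal":
        return statistics.mode(values)        # no order: only "most frequent" is meaningful
    if level == "ordinal":
        return statistics.median_low(values)  # order matters; median_low stays on a real category
    if level == "scalar":
        return statistics.mean(values)        # measured quantities: arithmetic is meaningful
    raise ValueError(f"unknown level: {level}")


# Hypothetical survey responses.
marital_status = ["Single", "Married", "Married", "Widower", "Single", "Married"]
agreement = [1, 2, 4, 4, 5, 3]  # Likert: 1 = Strongly Disagree ... 5 = Strongly Agree
age = [23, 35, 41, 29, 52, 38]

print(typical_value(marital_status, "nominal"))  # Married
print(typical_value(agreement, "ordinal"))       # 3
print(typical_value(age, "scalar"))
```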
Qualitative, Quantitative, Mixed
Types of Analysis Research Methods
Quantitative
Gathering, manipulating, and interpreting data taken from surveys or other secondary sources (e.g., financial reports).
Quantitative
It involves financial and statistical analysis, pattern and trend recognition, forecasting, etc.
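A minimal sketch of one of the simplest quantitative forecasting methods. A moving average predicts the next value as the mean of the last few observations. The 3-period window and the sales figures are hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    recent = series[-window:]
    return sum(recent) / window


# Hypothetical monthly sales figures.
monthly_sales = [100, 110, 120, 130]
print(moving_average_forecast(monthly_sales))  # 120.0
```

A moving average smooths out noise but lags behind a steady trend. Here it forecasts 120 even though sales have been rising by 10 each month. A trend-aware method would suit data like these better.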
Qualitative
gathering, curating, and interpreting information taken from interviews, focus group discussions, and social experiments.
Qualitative
It deals with understanding the meaning of concepts and factors affecting human behavior
Mixed Methods
A mix of both qualitative and quantitative methods, used to verify, validate, and triangulate the data gathered.
Sequential, Parallel
Types of mixed method
Sequential
The methods are done one after the other, for example:
1) Qualitative interviews (pre-test the questionnaire to validate the coherence of the questions in the survey);
2) Quantitative analysis of survey data;
3) Qualitative KIIs (key informant interviews) and FGDs (focus group discussions) to get the story behind the numbers or results of the survey.
Parallel
Qualitative and quantitative data gathering and analysis are done independently.