Business Intelligence Notes
Data mining, also known as knowledge discovery from data, is the computing process of discovering interesting and useful patterns in large data sets using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science whose overall goal is to extract information (with intelligent methods) from a data set and transform it into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" (KDD) process. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
It involves extracting interesting, non-trivial, implicit, previously unknown, and potentially useful patterns or knowledge from huge amounts of data, highlighting the non-obvious nature of the insights gained. Examples include finding customer purchasing behaviors, predicting equipment failures, or detecting fraudulent transactions.
Alternative names for data mining include:-
Knowledge discovery (mining) in databases (KDD): Emphasizes the broader process that includes data mining as a key step.
Knowledge extraction: Focuses on the extraction of knowledge from data.
Data/pattern analysis: Highlights the analytical aspect of data mining.
Data archeology: Suggests the unearthing of hidden information.
Data dredging: Sometimes used pejoratively to describe haphazard data analysis.
Information harvesting: Implies gathering valuable information from data.
Business intelligence: Focuses on using data mining for business decision-making.
Potential Applications
Data analysis and decision support:-
Market analysis and management: Understanding market trends and customer behavior.
Target marketing: Identifying specific customer segments for marketing campaigns.
Refining target audiences for increased campaign effectiveness.
Optimizing marketing spend via focused initiatives and tailored messaging.
Customer relationship management (CRM): Managing interactions with current and potential customers.
Personalizing customer interactions to foster stronger relationships.
Identifying upselling and cross-selling opportunities to maximize revenue.
Managing customer data to enhance service and support experiences.
Market basket analysis: Discovering associations between products purchased together.
Refining product placement strategies for increased sales.
Developing bundled products and services based on common purchase patterns.
Personalizing promotional offers based on frequently co-purchased items.
Cross-selling: Promoting related products to existing customers.
Market segmentation: Dividing a market into distinct groups of buyers.
Risk analysis and management: Assessing and mitigating risks.
Credit risk assessment for loan approvals.
Fraud detection in financial transactions.
Insurance claims analysis to identify potentially fraudulent activities.
Forecasting: Predicting future trends and outcomes.
Sales forecasting to optimize inventory levels.
Predicting customer churn to implement retention strategies.
Forecasting financial performance for better resource allocation.
Customer retention: Strategies to keep customers.
Improved underwriting: More accurate insurance assessments.
Quality control: Monitoring and improving product quality.
Competitive analysis: Evaluating competitors' strengths and weaknesses.
Fraud detection and detection of unusual patterns (outliers): Identifying anomalies in data.
Detecting fraudulent transactions in real-time.
Identifying unusual network activity to prevent cyberattacks.
Monitoring patient data to detect health anomalies early.
Other Applications-
Text mining (news group, email, documents) and Web mining: Extracting information from text and web content.
Sentiment analysis of customer reviews to gauge product satisfaction.
Topic extraction from news articles to identify emerging trends.
Social media monitoring to understand public opinion on brands or products.
Stream data mining: Analyzing continuous data streams in real-time.
Market Analysis and Management Examples
Target marketing:-
Finding clusters of "model" customers who share the same characteristics (interest, income level, spending habits, etc.).
Example: Most customers with an income level between and food expenses between a month live in a particular area.
Determining customer purchasing patterns over time.
Example: Customers who are between 20 and 29 years old, with an income of usually buy a specific type of CD player.
Cross-market analysis:-
Finding associations/correlations between product sales and making predictions based on such associations.
Market Analysis and Management (2)
Customer requirement analysis:-
Identifying the best products for different customers.
Predicting what factors will attract new customers.
Provision of summary information:-
Multidimensional summary reports.-
Example: Summarize all transactions of the first quarter from three different branches.
Summarize all transactions of last year from a particular branch.
Summarize all transactions of a particular product.
Statistical summary information.-
Example: What is the average age for customers who buy product A?
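As a rough illustration of the two kinds of summary information above, the sketch below computes per-branch totals for one quarter (a multidimensional summary) and the average age of customers who bought a given product (a statistical summary). The `transactions` records (branches, quarters, products, ages, amounts) and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: (branch, quarter, product, customer_age, amount)
transactions = [
    ("North", "Q1", "A", 34, 120.0),
    ("North", "Q2", "B", 45, 80.0),
    ("South", "Q1", "A", 28, 200.0),
    ("South", "Q1", "B", 52, 60.0),
    ("East", "Q1", "A", 39, 150.0),
    ("East", "Q3", "C", 23, 40.0),
]


def total_by_branch(rows, quarter):
    """Multidimensional summary: total sales per branch for one quarter."""
    totals = defaultdict(float)
    for branch, q, _product, _age, amount in rows:
        if q == quarter:
            totals[branch] += amount
    return dict(totals)


def average_age_of_buyers(rows, product):
    """Statistical summary: average age of customers who bought `product`."""
    return mean(age for _branch, _q, p, age, _amount in rows if p == product)


print(total_by_branch(transactions, "Q1"))
print(average_age_of_buyers(transactions, "A"))
```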
Fraud detection:-
Finding outliers of unusual transactions.
Financial planning
Steps to Perform Apriori Algorithm
STEP 1
Scan the transaction database to get the support of each 1-itemset, compare it with min_sup, and get the set of frequent 1-itemsets, L1. The algorithm relies on the Apriori property: all non-empty subsets of a frequent itemset must also be frequent. Equivalently, if an itemset is infrequent, all of its supersets are infrequent.
STEP 2
Join L(k-1) with itself to generate the set of candidate k-itemsets, Ck. Apply the Apriori property to prune from Ck every candidate that has an infrequent (k-1)-subset. This step generates new candidate k-itemsets based on the frequent (k-1)-itemsets found in the previous iteration.
STEP 3
Scan the transaction database to get the support of each candidate k-itemset in Ck, compare it with min_sup, and get the set of frequent k-itemsets, Lk. The support count of each candidate k-itemset is calculated during this scan.
STEP 4
If the candidate set is null (empty), proceed to STEP 5; otherwise, set k = k + 1 and go back to STEP 2. This condition checks whether the current pass produced any itemsets that can be extended further. If not, no larger frequent itemsets exist, and the algorithm moves on to generating association rules.
STEP 5
For each frequent itemset l, generate all non-empty subsets of l. This step begins generating association rules from the frequent itemsets identified in the previous steps.
STEP 6
For every non-empty proper subset s of l, output the rule "s => (l - s)" if its confidence C = support(l) / support(s) is greater than or equal to min_conf. This step uses the frequent itemsets and their subsets to generate association rules.
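A minimal Python sketch of STEPS 1 to 6, assuming transactions are given as sets of items and min_sup and min_conf are fractions. The function names `apriori` and `association_rules` are illustrative. The join in STEP 2 is done as a simple union of pairs of frequent (k-1)-itemsets rather than the ordered-prefix join found in many descriptions; after pruning, both joins leave the same candidate set.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support} for every frequent itemset.

    min_sup is a fraction of all transactions (0.5 means 50%).
    """
    db = [frozenset(t) for t in transactions]
    n = len(db)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(1 for t in db if itemset <= t) / n

    # STEP 1: count every 1-itemset and keep the frequent ones (L1)
    items = {i for t in db for i in t}
    counted = {frozenset([i]): support(frozenset([i])) for i in items}
    level = {c for c, s in counted.items() if s >= min_sup}
    frequent = {c: counted[c] for c in level}

    k = 2
    while level:  # STEP 4: stop once no frequent (k-1)-itemsets remain
        # STEP 2: join L(k-1) with itself, then prune candidates with an infrequent (k-1)-subset
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # STEP 3: scan the database for support and keep the frequent k-itemsets (Lk)
        counted = {c: support(c) for c in candidates}
        level = {c for c, s in counted.items() if s >= min_sup}
        frequent.update({c: counted[c] for c in level})
        k += 1
    return frequent


def association_rules(frequent, min_conf):
    """STEPS 5-6: for each frequent itemset l and non-empty proper subset s, emit s => (l - s)."""
    rules = []
    for itemset, sup_l in frequent.items():  # itemset plays the role of l
        for r in range(1, len(itemset)):
            for s in map(frozenset, combinations(itemset, r)):
                conf = sup_l / frequent[s]  # subsets of a frequent itemset are frequent
                if conf >= min_conf:
                    rules.append((set(s), set(itemset - s), conf))
    return rules


# Usage on a small hypothetical database with min_sup = 0.5
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(D, 0.5)
for antecedent, consequent, conf in association_rules(frequent, 1.0):
    print(antecedent, "=>", consequent, conf)
```

With these four hypothetical transactions and min_sup = 0.5, the frequent itemsets found are {1}, {2}, {3}, {5}, {1, 3}, {2, 3}, {2, 5}, {3, 5} and {2, 3, 5}; item 4 occurs in only one transaction and is dropped in STEP 1.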
Apriori Algorithm Example
Database D
Minsup = 0.5
Transactions and items are analyzed to find frequent itemsets.
C itemsets (Ck) are the candidate itemsets.
L itemsets (Lk) are the frequent itemsets (those meeting the minimum support threshold).
Knowledge Discovery (KDD) Process
Data cleaning and integration are the initial steps. They involve handling missing values, removing noise, and resolving inconsistencies.
Data selection and transformation are performed. Data relevant to the analysis is selected, and it is transformed into a suitable format for mining.
Data mining extracts patterns. Algorithms are applied to discover patterns and relationships in the data.
Evaluation and presentation of knowledge occur. The discovered patterns are evaluated, and the knowledge is presented to the user in a comprehensible form.
KDD Process: Several Key Steps
Learning the application domain:-
Involves relevant prior knowledge and goals of the application.
Identifying a target data set: Data selection.
Data processing:-
Data cleaning: Remove noise and inconsistent data.
Techniques for data cleaning include handling missing values, smoothing noisy data, and resolving inconsistencies.
Data integration: Multiple data sources may be combined.
Data integration involves combining data from multiple sources into a unified view.
Challenges in data integration include schema matching, resolving semantic heterogeneities, and handling data redundancy.
Data selection: Data relevant to the analysis task is retrieved from the database.
Data transformation: Data is transformed or consolidated into forms appropriate for mining (done as part of data preprocessing).
Data transformation includes normalization, aggregation, and generalization.
Normalization scales the data to a specific range, aggregation summarizes the data, and generalization replaces low-level data with higher-level concepts.
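Several of the cleaning and transformation operations above can be sketched with a few small functions. This is an illustrative sketch only: missing values are assumed to be marked with `None`, noise is smoothed with equal-frequency bins replaced by their means (one common smoothing technique), and the age cut-offs in `generalize_age` are hypothetical. Normalization is left out to keep the example short.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Data cleaning: replace missing entries (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    avg = mean(known)
    return [avg if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Data cleaning: sort, split into equal-frequency bins, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


def generalize_age(age):
    """Generalization: replace a numeric age with a higher-level concept (hypothetical cut-offs)."""
    if age <= 30:
        return "young"
    if age <= 60:
        return "middle_aged"
    return "senior"


def aggregate_by_month(sales):
    """Aggregation: sum (month, amount) records into one total per month."""
    totals = {}
    for month, amount in sales:
        totals[month] = totals.get(month, 0) + amount
    return totals


print(fill_missing_with_mean([10, None, 30]))
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
print([generalize_age(a) for a in (25, 45, 70)])
print(aggregate_by_month([("Jan", 100), ("Jan", 50), ("Feb", 70)]))
```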
Data mining:-
An essential process where intelligent methods are applied to extract data patterns.
Pattern evaluation:-
Identify the truly interesting patterns.
Knowledge presentation:-
Mined knowledge is presented to the user.
Data Mining and Business Intelligence
Increasing potential to support business decisions.
Involves various roles:-
End User
Business Analyst
Data Analyst
DBA
Progression from Data Sources to Decision Making:-
Data Sources (Paper, Files, Web documents, Scientific experiments, Database Systems)
Data Preprocessing/Integration, Data Warehouses
Statistical Summary, Querying, and Reporting
Data Exploration
Data Mining
Information Discovery
Data Presentation
Visualization Techniques
Decision Making
A Typical DM System Architecture
Components:-
Database, data warehouse, WWW, or other information repository (stores data).
Database or data warehouse server (fetches and combines data).
Knowledge base (turns data into meaningful groups according to domain knowledge).
Data mining engine (performs mining tasks).
Pattern evaluation module (finds interesting patterns).
User interface (visualizes information).
A Typical DM System Architecture (2)
Includes user interface, pattern evaluation, data mining engine, knowledge base, database or data warehouse server, and data sources (database, data warehouse, World Wide Web, other info repositories).
Also incorporates data cleaning, integration, and selection processes.
Confluence of Multiple Disciplines in Data Mining
Data Mining combines:-
Database Technology
Statistics
Machine Learning
Information Science
Visualization
Other Disciplines
Not all "Data Mining Systems" perform true data mining:-
Machine learning systems and statistical analysis may work with small amounts of data.
Database systems focus on information retrieval or deductive querying.
On What Kinds of Data?
Database-oriented data sets and applications:-
Relational database, data warehouse, transactional database
Advanced data sets and advanced applications:-
Object-Relational Databases
Temporal Databases, Sequence Databases, Time-Series databases
Spatial Databases and Spatiotemporal Databases
Text databases and Multimedia databases
Heterogeneous Databases and Legacy Databases
Data Streams
The World-Wide Web
Data Mining Functionalities - What kinds of patterns can be mined?
Data discrimination:-
comparing the target class with one or a set of comparative classes
Example: Compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by 30% during the same period.
Mining Frequent Patterns, Associations, and Correlations:-
Frequent itemset: a set of items that frequently appear together in a transactional data set (e.g., milk and bread).
Frequent subsequence: a sequential pattern, e.g., customers tend to purchase product A, followed later by a purchase of product B.
Data Mining Functionalities - What kinds of patterns can be mined? (continued)
Association Analysis:-
Find frequent patterns.
Example: buys(X, “computer”) => buys(X, “software”) [support = 1%, confidence = 50%]-
If a customer buys a computer, there is a 50% chance that she will buy software. 1% of all transactions analyzed show that computer and software are purchased together.
Association rules are discarded as uninteresting if they do not satisfy both a minimum support threshold and a minimum confidence threshold.
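Support and confidence for a rule such as the computer/software example can be computed directly from a list of transactions. The helper names and the four sample transactions are hypothetical; support is the fraction of transactions containing all items of the rule, and confidence is support(A ∪ B) / support(A). A rule is kept only if it meets both thresholds.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of A => B, i.e. support(A ∪ B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def is_interesting(transactions, antecedent, consequent, min_sup, min_conf):
    """Keep a rule only if it satisfies BOTH the support and the confidence threshold."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_sup
            and confidence(transactions, antecedent, consequent) >= min_conf)


# Hypothetical transactions
transactions = [
    {"computer", "software"},
    {"computer"},
    {"computer", "software", "printer"},
    {"printer"},
]
print(support(transactions, {"computer", "software"}))
print(confidence(transactions, {"computer"}, {"software"}))
```

In this toy data, computer => software has support 0.5 (two of four transactions) and confidence 2/3 (two of the three computer purchases include software).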
Correlation Analysis:-
Additional analysis to find statistical correlations between associated pairs.
Data Mining Functionalities - What kinds of patterns can be mined? (continued)
Classification and Prediction-
Classification
The process of finding a model that describes and distinguishes the data classes or concepts for the purpose of predicting the class of objects with unknown class labels.
The derived model is based on the analysis of a set of training data (data objects whose class label is known).
The model can be represented in classification (IF-THEN) rules, decision trees, neural networks, etc.
Prediction
Predict missing or unavailable numerical data values.
Data Mining Applications
Computer hardware and software
Science and engineering
Government and defense
Travel industry
Health care
Medicine
Sports
Data Mining Software
Commercial:-
IBM SPSS Modeler (formerly Clementine)
SAS - Enterprise Miner
IBM - Intelligent Miner
StatSoft - Statistica Data Miner
Oracle DM
MATLAB
Free and/or Open Source:-
RapidMiner
Weka
KNIME
Data Mining Myths
Data mining provides instant solutions/predictions.
Data mining is not yet viable for business applications.
Data mining requires a separate, dedicated database.
Data mining can only be done by those with advanced degrees.
Data mining is only for large firms that have lots of customer data.
Data mining is another name for good-old statistics.
Common Data Mining Blunders
Selecting the wrong problem for data mining.
Ignoring what your sponsor thinks data mining is and what it really can/cannot do.
Not leaving sufficient time for data acquisition, selection, and preparation.
Looking only at aggregated results and not at individual records/predictions.
Being sloppy about keeping track of the data mining procedure and results.
Classification with Decision Trees
Classification is the process of learning a model that describes different classes of data. The classes are predetermined.
Example: In a banking application, customers who apply for a credit card may be classified as a "good risk," a "fair risk," or a "poor risk."
This type of activity is also called supervised learning.
Once the model is built, it can be used to classify new data.
Classification with Decision Trees (continued)
Learning the model is accomplished by using a training set of data that has already been classified.
Each record in the training data contains a class label that indicates which class the record belongs to.
The model produced is usually in the form of a decision tree or a set of rules.
Important issues with regard to the model and the algorithm include:-
The model's ability to predict the correct class of new data.
The computational cost associated with the algorithm.
The scalability of the algorithm.
A decision tree is simply a graphical representation of the description of each class or a representation of the classification rules.
Decision Tree Example
A database of customers on the AllElectronics mailing list includes attributes such as name, age, income, occupation, and credit rating.
Customers are classified as to whether or not they have purchased a computer at AllElectronics.
New customers are added to the database, and the goal is to notify those new customers who are likely to purchase a new computer of an upcoming computer sale.
Targeting only those new customers who are likely to purchase a new computer is a more cost-efficient method.
A classification model can be constructed and used for this purpose.
Figure 2 shows a decision tree for the concept buys_computer, indicating whether or not a customer at AllElectronics is likely to purchase a computer.
Decision Tree Structure
Each internal node represents a test on an attribute.
Each leaf node represents a class.
Extracting Classification Rules from Trees
Represent the knowledge in the form of IF-THEN rules.
One rule is created for each path from the root to a leaf.
Each attribute-value pair along a path forms a conjunction.
The leaf node holds the class prediction.
Rules are easier for humans to understand.
Example:-
IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31…40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
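Because each rule corresponds to one root-to-leaf path, the rule set above translates directly into nested conditionals. A minimal sketch; the function name and the string encodings of the attribute values are assumptions that mirror the rules as written.

```python
def buys_computer(age, student, credit_rating):
    """Classify a customer with the IF-THEN rules read off the buys_computer tree.

    age: "<=30", "31…40" or ">40"; student: "yes"/"no"; credit_rating: "fair"/"excellent".
    """
    if age == "<=30":
        # Rules 1 and 2: the student attribute decides
        return "yes" if student == "yes" else "no"
    if age == "31…40":
        # Rule 3: this age band is always classified as a buyer
        return "yes"
    if age == ">40":
        # Rules 4 and 5: the credit rating decides
        if credit_rating == "excellent":
            return "no"
        if credit_rating == "fair":
            return "yes"
    raise ValueError(f"no rule covers age={age!r}, credit_rating={credit_rating!r}")


print(buys_computer(">40", "no", "fair"))
```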
Introduction to Decision Trees
A decision tree is a tree with the following properties:-
An inner node represents an attribute.
An edge represents a test on the attribute of the parent node.
A leaf represents one of the classes.
Construction of a decision tree:-
Based on the training data.
Top-Down strategy.
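The notes say the tree is built top-down from the training data but do not say how the test attribute at each node is chosen. One widely used criterion, information gain as in ID3, is sketched below as an assumption; other algorithms use different measures (for example, C4.5 uses gain ratio and CART uses the Gini index). Rows are plain dicts with a `class` key, and the four training rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, label_key="class"):
    """Entropy reduction obtained by splitting rows on a categorical attribute."""
    labels = [r[label_key] for r in rows]
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[label_key] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, attributes, label_key="class"):
    """Top-down step: choose the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, label_key))


# Hypothetical training rows
rows = [
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "don't play"},
    {"outlook": "rainy", "windy": "no", "class": "play"},
    {"outlook": "rainy", "windy": "yes", "class": "don't play"},
]
print(best_attribute(rows, ["outlook", "windy"]))
```

Here `windy` separates the two classes perfectly (gain of 1 bit) while `outlook` gives no gain, so `windy` would be tested at the root; the same selection is then repeated recursively on each branch.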
Decision Tree Example
Training Data Set:-
The data set has five attributes.
There is a special attribute: the attribute class is the class label.
The attributes temp (temperature) and humidity are numerical attributes.
Other attributes are categorical, that is, they cannot be ordered.
Based on the training data set, a set of rules is created to know what values of outlook, temperature, humidity, and wind determine whether or not to play golf.
Decision Tree Example (continued)
In a decision tree, each path from the root to a leaf node represents a rule.
Rules corresponding to the tree given in Figure:-
RULE 1: If it is sunny and the humidity is not above 75%, then play.
RULE 2: If it is sunny and the humidity is above 75%, then do not play.
RULE 3: If it is overcast, then play.
RULE 4: If it is rainy and not windy, then play.
RULE 5: If it is rainy and windy, then don't play.
Neural Network Representation
An ANN is composed of processing elements called perceptrons, organized in different ways to form the network's structure.
Processing Elements:-
An ANN consists of perceptrons.
Each of the perceptrons receives inputs, processes inputs, and delivers a single output.
Inputs:-
The input can be raw input data or the output of other perceptrons.
Weights
Output
The output can be the final result (e.g., 1 means yes, 0 means no) or it can be inputs to other perceptrons.
Appropriate Problems for Neural Network
ANN learning is well-suited to problems in which the training data correspond to noisy, complex sensor data. It is also applicable to problems for which more symbolic representations are often used.
The backpropagation (BP) algorithm is the most commonly used ANN learning technique. It is appropriate for problems with the following characteristics:-
Input is high-dimensional discrete or real-valued (e.g., raw sensor input)
Output is discrete or real-valued
Output is a vector of values
Possibly noisy data
Long training times accepted
Fast evaluation of the learned function required.
Not important for humans to understand the weights
Examples:-
Speech phoneme recognition
Image classification
Financial prediction
Neural Network Application Development
The development process for an ANN application has eight steps.-
Step 1: (Data collection) The data to be used for the training and testing of ANN are collected. Important considerations are that the particular problem is amenable to ANN solution and that adequate data exist and can be obtained.
Step 2: (Training and testing data separation) Training data must be identified, and a plan must be made for testing the performance of the ANN. The available data are divided into training and testing data sets. For a moderately sized data set, 80% of the data are randomly selected for training, 10% for testing, and 10% for secondary testing.
Step 3: (Network architecture) A network architecture and a learning method are selected. Important considerations are the exact number of nodes and the number of layers.
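A minimal sketch of the 80/10/10 random split described in Step 2, assuming the data fit in a list. The function name and the fixed seed are illustrative choices; integer division means any rounding remainder lands in the secondary test set.

```python
import random


def split_80_10_10(records, seed=0):
    """Randomly split records into 80% training, 10% testing and 10% secondary testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n = len(shuffled)
    n_train = n * 8 // 10
    n_test = n // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    secondary = shuffled[n_train + n_test:]  # any rounding remainder ends up here
    return train, test, secondary


train, test, secondary = split_80_10_10(range(100))
print(len(train), len(test), len(secondary))  # 80 10 10
```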
Neural Network Application Development (continued)
Step 4: (Parameter tuning and weight initialization) There are parameters for tuning ANN to the desired learning performance level. Part of this step is initialization of the network weights and parameters, followed by modification of the parameters as training performance feedback is received.-
Often, the initial values are important in determining the effectiveness and length of training.
Step 5: (Data transformation) Transforms the application data into the type and format required by the ANN.
Step 6: (Training) Training is conducted iteratively by presenting input and known output data to the ANN. The ANN computes the outputs and adjusts the weights until the computed outputs are within an acceptable tolerance of the known outputs for the input cases.
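Step 6 can be illustrated with the simplest possible network, a single perceptron trained by the perceptron learning rule on the logical AND function. This is a toy sketch under stated assumptions: a step transfer function, zero initial weights, and a tolerance of zero (training stops once every training output matches its known output). Practical applications more typically use a multilayer network trained with backpropagation.

```python
def step(z):
    """Threshold transfer function: output 1 if the weighted sum is >= 0, else 0."""
    return 1 if z >= 0 else 0


def predict(weights, bias, inputs):
    """Compute the perceptron's output for one input vector."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, learning_rate=1.0, max_epochs=100):
    """Perceptron learning rule on (inputs, target) pairs with 0/1 targets.

    Presents every sample, adjusts the weights after each wrong output, and stops
    as soon as a full pass produces no errors (or after max_epochs passes).
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error:
                errors += 1
                weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if errors == 0:  # computed outputs match all known outputs
            break
    return weights, bias


# Toy training set: the logical AND function
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_DATA)
print([predict(weights, bias, x) for x, _ in AND_DATA])  # [0, 0, 0, 1]
```

With zero initial weights, the learning rate only rescales the weights and does not change which outputs are produced, so a rate of 1.0 keeps the arithmetic exact.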
Neural Network Application Development (continued)
Step 7: (Testing) Once the training has been completed, it is necessary to test the network.-
The testing examines the performance of ANN using the derived weights by measuring the ability of the network to classify the testing data correctly.
Black-box testing (comparing test results to historical results) is the primary approach for verifying that inputs produce the appropriate outputs.
Step 8: (Implementation) A stable set of weights has now been obtained.-
The ANN can now reproduce the desired output given inputs like those in the training set.
The ANN is ready to use as a stand-alone system or as part of another software system where new input data will be presented to it and its output will be a recommended decision.
Neural Network Characteristics
Neural Network learns by adjusting the weights so as to be able to correctly classify the training data and hence, after the testing phase, to classify unknown data.
Neural Network needs a long time for training.
Neural Network has a high tolerance to noisy and incomplete data.
Neural Network Classifier
Input: Classification data, which contains a classification attribute.
Data is divided into training data and testing data, as in any classification problem.
All data must be normalized, i.e., all attribute values in the database are rescaled to lie in the interval [0,1] or [-1,1], because the neural network works with data in these ranges.
Two basic normalization techniques:-
Max-Min normalization
Decimal Scaling normalization
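The two normalization techniques can be sketched as follows. Max-Min normalization maps the observed range linearly onto a target interval such as [0,1] or [-1,1]; decimal scaling divides every value by the smallest power of ten that brings all absolute values below 1. The function names and sample values are illustrative.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Max-Min normalization: map [min, max] of the data linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def decimal_scaling_normalize(values):
    """Decimal scaling: divide by 10**j, with j the smallest integer such that max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]


print(min_max_normalize([200, 300, 400, 600]))
print(min_max_normalize([200, 300, 400, 600], -1.0, 1.0))
print(decimal_scaling_normalize([-986, 917]))
```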
Neural Network Model Example: Loan Prospector - HNC/Fair Isaac
Inputs:-
Living space
Size of garage
Appraised value
Age of house
etc.
Output
A Neural Network (Expert System) is like a black box that knows how to process inputs to create a useful output.
The calculation(s) are quite complex and difficult to understand.
Neural Network Concepts
Neural networks (NN): a brain metaphor for information processing
Neural computing
Artificial neural network (ANN)
Many uses for ANN, including:-
pattern recognition
forecasting
prediction
classification
Many application areas
finance
marketing
manufacturing
operations
information systems
and so on
Biological Neural Networks
Components:-
Dendrites
Soma
Axon
Synapse
Two interconnected brain cells (neurons)
Biological Neuron Model
Four parts of a typical nerve cell:-
Dendrites: Accept the inputs.
Soma: Processes the inputs
Axon: Turns the processed inputs into outputs.
Synapses: The electrochemical contact between the neurons.
Artificial Neural Network
Artificial Neural Networks (ANNs) are programs designed to solve problems by trying to mimic the structure and the function of our nervous system.
Neural networks are based on simulated neurons, which are joined together in a variety of ways to form networks.
Neural networks resemble the human brain in the following two ways:-
A neural network acquires knowledge through learning.
A neural network's knowledge is stored within the interconnection strengths known as synaptic weights.
Neural Network Architectures
Fully connected network
In which every node is connected to every other node, and these connections may be either excitatory (positive weights), inhibitory (negative weights), or irrelevant (almost zero weights).
Layered network
Networks in which nodes are partitioned into subsets called layers, with no connections from layer j to k if j > k.
Neural Network Architectures (continued)
Acyclic network
Subclass of the layered networks in which there are no intra-layer connections. In other words, a connection may exist between any node in layer i and any node in layer j for i < j, but a connection is not allowed for i = j.
Feedforward network
Subclass of acyclic networks in which a connection is allowed from a node in layer i only to nodes in layer i+1.
Artificial Neuron Model
Inputs to the network are represented by the mathematical symbol $x_n$. Each of these inputs is multiplied by a connection weight, $w_n$.
These products are simply summed and fed through the transfer function, f(), to generate a result, which is then output: $y = f\left(\sum_n w_n x_n\right)$.
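The artificial neuron model above reduces to a weighted sum followed by a transfer function. A minimal sketch, assuming a logistic sigmoid as f() since the notes leave f() unspecified; any other transfer function can be passed in instead.

```python
import math


def sigmoid(z):
    """Logistic function, one common choice for the transfer function f()."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, transfer=sigmoid):
    """Multiply each input by its connection weight, sum the products, then apply f()."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return transfer(weighted_sum)


print(neuron_output([1.0, 2.0], [0.5, -0.25]))  # weighted sum is 0, so sigmoid gives 0.5
print(neuron_output([1, 2, 3], [1, 1, 1], transfer=lambda z: z))  # identity transfer: 6
```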
Processing Information in ANN
A single neuron (processing element - PE) with inputs and outputs
Includes weights, summation, and a transfer function
Advantages of Neural Networks
They mimic aspects of human-like thinking.
They handle noisy or missing data.
They can work with a large number of variables or parameters.
They provide general solutions with good predictive accuracy.
The system has the property of continuous learning.
They deal with the non-linearity in the world in which we live.
Supervised Learning
(Figure: supervised learning example in which weight is the feature and currency is the label.)
Definition of Supervised Learning
Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples.
In supervised learning, each example is a pair consisting of an input object and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
Types of Supervised Learning
Classification
Regression
Regression means predicting a continuous output value from the training data.
Classification means assigning the output to one of a set of discrete classes.
e.g., we use regression to predict a house price from training data and classification to predict gender.
Classification Problem
Graphical representation of data points and classification boundary.
Regression Problem
Presence of Target variable
House price = F(size)
Data pairs (Size1, House price1), (Size2, House price2), and so on
Regression (Linear)
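Assuming F in House price = F(size) is a straight line, price = a + b * size, the simplest fit is ordinary least squares, sketched below. The method choice, the function name and the sample pairs are assumptions, not something the notes specify.

```python
def fit_line(sizes, prices):
    """Ordinary least squares for price = a + b * size with a single input variable."""
    n = len(sizes)
    if n < 2:
        raise ValueError("need at least two (size, price) pairs")
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("all sizes are identical; the slope is undefined")
    b = sxy / sxx          # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


a, b = fit_line([50, 100, 150], [110, 210, 310])
print(a, b)
print(a + b * 120)  # predicted price for a hypothetical size of 120
```

For the hypothetical pairs (50, 110), (100, 210), (150, 310), `fit_line` returns an intercept of 10 and a slope of 2.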
Examples of Classification Applications
"Which category of products is most interesting to this customer?"
"Is this movie a romantic comedy, documentary, or thriller?"
"Is this review written by a customer or a robot?"
"Will the customer buy this product?"
"Is this email spam or not spam?"
Unsupervised Learning
In unsupervised learning, the algorithm is trained using data that is unlabeled.
Known Data -> Pattern Recognition -> Model -> Response
Types of Unsupervised Learning
Clustering
The method of dividing objects into clusters such that objects within a cluster are similar to one another and dissimilar to objects belonging to other clusters
Association
Discovering the probability of the co-occurrence of items in a collection
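Clustering can be illustrated with k-means, one of the most common clustering algorithms; the notes do not name a specific algorithm, so this choice is an assumption. The sketch takes points as coordinate tuples, initializes centroids by sampling k of the points with a fixed seed, and alternates assignment and update steps until the centroids stop moving. The sample points are hypothetical.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, 2)
print(labels)
```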
Unsupervised Learning Example
Grouping batsmen and bowlers
Examples of Clustering Applications
"How to group customers for targeted marketing purposes?"
"Which neighborhoods in a country are most similar to each other?"
"What groups of insurance policyholders have high claim costs?"
"How to group the products in a store based on their attributes?"
"How to group pictures based on their description?"
Examples of Recommendation Applications
"Which movies should be recommended to a user?"
"If the user just listened to a song, which song would he like now?"
"Which news articles are relevant for a user in a particular context?"
"Which advertisements should be displayed for a user on a mobile app?"
“Which products are frequently bought together?”
Supervised vs. Unsupervised Learning
Supervised Learning:-
A teacher is available to indicate whether a system is performing correctly or to indicate the amount of error in system performance. Here a teacher is a set of training data.
The training data consist of pairs of input and desired output values that are traditionally represented in data vectors.
Classification is the most common form of supervised learning, and a wide range of classifiers is available (multilayer perceptron, k-nearest neighbor, etc.).
Unsupervised Learning:-
This is learning by doing.
In this approach, no sample outputs are provided to the network against which it can measure its predictive performance for a given vector of inputs.
One common form of unsupervised learning is clustering where we try to categorize data in different clusters by their similarity.
Web Mining
Web