Business Intelligence (BI)
A collection of processes, technologies, skills, and applications used to collect, integrate, analyze, and present business information.
It involves transforming data into actionable insights that inform an organization’s strategic and tactical business decisions.
Why Business Intelligence?
Helps generate insights from data to support business decisions and improve profitability and efficiency.
Strategic Decision Making: By bridging the gap between raw data and actionable knowledge, BI enables companies to make informed strategic decisions that can lead to competitive advantages.
Enhanced Efficiency: Properly implemented BI can significantly enhance operational efficiency by automating and optimizing processes based on insights drawn from data.
Improved Responsiveness: With actionable insights, businesses can respond more quickly to market changes and customer needs, enhancing agility.
Business Intelligence Process
Data Understanding
Data Preparation
Data Understanding
Through preliminary analysis, data exploration provides a high-level overview of each variable in the dataset and of the interactions between variables.
If a variable represents a characteristic measured in numbers, it is called a numeric variable.
If a variable consists of a set of categories, the variable is called a categorical variable.
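The numeric/categorical distinction above can be sketched in a few lines. This is a minimal illustration, not a standard routine: the function `variable_type` and the patient columns are hypothetical, and the rule used (every non-missing value is a number) is a simplifying assumption.

```python
def variable_type(values):
    """Return 'numeric' if every non-missing value is a number, else 'categorical'."""
    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    present = [v for v in values if v is not None]
    if present and all(is_number(v) for v in present):
        return "numeric"
    return "categorical"


# Hypothetical patient columns, made up for illustration.
columns = {
    "age": [34, 51, 29, 62],
    "blood_type": ["A", "O", "B", "O"],
    "weight_kg": [70.5, 82.0, None, 64.3],
}
types = {name: variable_type(vals) for name, vals in columns.items()}
print(types)
```

Note that this rule would label numeric codes (for example, a postal code stored as an integer) as numeric, even though they behave like categories.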
Data Preparation
Before applying any data mining algorithms, it's crucial to prepare the dataset to ensure the integrity and accuracy of the analysis.
This involves addressing various anomalies that may affect the results:
Outlier Detection
Handling Missing Values
Removing Duplicates
Addressing Multicollinearity
Data Cleaning and Transformation
Data Preparation: Outlier Detection
Identify unusual entries, such as patients listed with ages over 120 years, which could skew analysis results.
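A minimal sketch of the age check described above, assuming a simple plausibility range of 0 to 120 years. The function name and the sample ages are hypothetical; real projects often combine such domain rules with statistical checks.

```python
def flag_implausible_ages(ages, max_age=120):
    """Return (index, value) pairs for ages outside the range 0..max_age."""
    return [(i, age) for i, age in enumerate(ages) if age < 0 or age > max_age]


# Made-up ages; two of them exceed 120 years.
ages = [34, 51, 129, 62, 45, 203]
outliers = flag_implausible_ages(ages)
print(outliers)  # [(2, 129), (5, 203)]
```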
Data Preparation: Handling Missing Values
Address gaps in data, which may lead to data sparsity, impacting the reliability of the mining process.
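One common way to address gaps is mean imputation, sketched below. This is only one option among several (dropping rows or using the median are others), and the choice here is an assumption for illustration. Missing values are represented as `None`, and the weights are made up.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in values]


# Hypothetical weights in kg with two missing entries.
weights = [70.0, None, 80.0, None, 90.0]
missing_share = weights.count(None) / len(weights)  # fraction of gaps
filled = impute_mean(weights)
print(missing_share, filled)
```

Checking the share of missing values first is useful: if most of a column is missing, imputation may hide the sparsity rather than fix it.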
Data Preparation: Removing Duplicates
Eliminate duplicate records to prevent biased statistical results.
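A minimal sketch of exact-duplicate removal, assuming each record is a flat dictionary with hashable values. The record fields are hypothetical. The first occurrence is kept and the original order is preserved.

```python
def drop_duplicates(records):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        # A sorted tuple of items is hashable and ignores key order.
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up records; the third repeats the first.
records = [
    {"patient_id": 1, "age": 34},
    {"patient_id": 2, "age": 51},
    {"patient_id": 1, "age": 34},
]
unique = drop_duplicates(records)
print(unique)
```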
Data Preparation: Addressing Multicollinearity
Identify and resolve highly correlated variables, such as age and date of birth (for example, by keeping only one of them), to improve the validity of regression models.
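The age and date-of-birth example can be checked with a Pearson correlation coefficient, sketched below. The reference year, the birth years, and the 0.9 cutoff are all assumptions chosen for illustration; there is no single agreed threshold.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


reference_year = 2024  # assumed year in which ages were recorded
birth_years = [1990, 1973, 1995, 1962]  # made-up data
ages = [reference_year - year for year in birth_years]

r = pearson(ages, birth_years)
redundant = abs(r) > 0.9  # illustrative cutoff, not a fixed rule
print(round(r, 6), redundant)
```

Because age is computed directly from birth year, the two variables carry the same information, so one of them can be dropped before fitting a regression model.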
Data Preparation: Data Cleaning and Transformation
Cleanse the data by fixing or removing errors and inconsistencies, and transform data into a suitable format for analysis.
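A small sketch of cleaning and transforming a single record, assuming text fields with stray whitespace and inconsistent casing, and a date stored as a day/month/year string. The field names and the date format are hypothetical.

```python
from datetime import datetime


def clean_record(raw):
    """Trim whitespace, standardise casing, and parse the date string."""
    return {
        "name": raw["name"].strip().title(),
        "gender": raw["gender"].strip().upper(),
        # Assumes dates are written as DD/MM/YYYY.
        "visit_date": datetime.strptime(raw["visit_date"].strip(), "%d/%m/%Y").date(),
    }


# Made-up raw record with inconsistent formatting.
raw = {"name": "  jane DOE ", "gender": "f", "visit_date": "03/02/2024 "}
cleaned = clean_record(raw)
print(cleaned)
```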
Data Mining Tasks
Supervised Learning
Unsupervised Learning
Supervised Learning
A method of machine learning that trains a model on a dataset including both the input variables (X) and the output variable (Y).
The objective is to develop a mapping function that accurately predicts the output based on the input variables.
"Supervised" does not imply human guidance; it refers to the known output values in the training data, which allow the learning algorithm to evaluate its accuracy and adjust.
A predictive model, as used in supervised learning, is designed for tasks that require predicting a specific value from other data points within the dataset.
Classification & Numeric Prediction
Supervised Learning: Classification
Classification involves predicting categorical labels. The output, or target variable, is a category rather than a continuous value.
Examples of classification problems include:
Determining whether an email message is spam or not spam.
Diagnosing whether a person has cancer based on medical test results.
Predicting whether a football team will win or lose a match.
Assessing if an applicant will default on a loan based on their financial history.
In each of these cases, the model is trained to assign discrete categories to the input data, making it a fundamental tool for decision-making across various fields.
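As a sketch of assigning a discrete category to new input, the loan-default example can be illustrated with a one-nearest-neighbour classifier. This is just one simple classification technique picked for brevity; the features (debt-to-income ratio, missed payments), the labels, and the training rows are all made up.

```python
from math import dist


def predict_1nn(train, query):
    """Return the label of the training example closest to `query`."""
    _, label = min(train, key=lambda example: dist(example[0], query))
    return label


# Hypothetical labelled examples: (debt_to_income, missed_payments) -> label.
train = [
    ((0.10, 0), "repay"),
    ((0.15, 1), "repay"),
    ((0.60, 4), "default"),
    ((0.55, 3), "default"),
]
prediction = predict_1nn(train, (0.50, 3))
print(prediction)  # default
```

In practice the features would usually be scaled first, since the distance here is dominated by whichever feature has the larger numeric range.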
Supervised Learning: Numeric Prediction
Numeric prediction involves forecasting a continuous quantity. This type of prediction is crucial for tasks such as estimating future sales figures or predicting stock prices.
Example: if a company has experienced steady monthly sales growth over the past few years, a linear analysis (such as linear regression) can be conducted using the historical monthly sales data.
This analysis helps the company forecast sales for upcoming months, providing valuable insights for strategic planning and resource allocation.
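The sales forecast described above can be sketched with ordinary least squares on a single input (the month number). The sales figures are made up and deliberately perfectly linear so the result is easy to follow; real data would be noisier.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales growing by 10 units each month.
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept
print(slope, intercept, forecast_month_7)
```

The slope is the estimated growth per month, and the forecast simply extends the fitted line one month past the observed data.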
Unsupervised Learning
A machine learning approach that uses input data (X) without any associated labels.
It is used to analyze and cluster unlabeled datasets to identify hidden patterns or natural groupings within the data, without a specific target outcome to guide the learning.
The algorithms learn the underlying structure of the data by identifying features and patterns on their own.
This process is useful for discovering insights that are not immediately obvious, making it a foundational technique for exploratory data analysis and complex problem solving.
Clustering & Association Rule Mining
Clustering
Clustering involves grouping a set of objects such that objects within the same group are more similar to each other than to those in different groups. This technique is particularly useful in applications like customer segmentation.
Retailers may use clustering to segment customers based on their spending patterns and sensitivity to price changes.
Key variables for such segmentation might include total expenditure, the value of discounts received, and the number of items purchased at a discount.
By understanding these dynamics, businesses can tailor marketing strategies and product offerings to better meet the needs and preferences of distinct customer groups.
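The customer segmentation above can be sketched with k-means, one common clustering algorithm (chosen here as an assumption; the notes do not name a specific method). The customers, their three variables, and the fixed starting centroids are made up so that the run is deterministic.

```python
from math import dist


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))
        for point in points
    ]


def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm from given starting centroids; returns (centroids, labels)."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster's centroid
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical customers: (total_spend, discount_value, discounted_items).
customers = [
    (200, 5, 1), (220, 8, 2), (210, 0, 0),
    (900, 120, 15), (950, 150, 18), (880, 110, 14),
]
centroids, labels = k_means(customers, [customers[0], customers[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The variables are left unscaled to keep the sketch short, so total spend dominates the distance. Standardising each variable first is usually advisable.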
Association Rule Mining
A technique used in unsupervised learning to discover interesting relationships hidden in large datasets. It identifies rules that describe which items often occur together.
It uncovers patterns such as "customers who purchase item X also frequently buy item Y."
These insights can be used for effective cross-selling and upselling strategies, optimizing store layouts, and enhancing promotional campaigns aimed at increasing the sale of related products.
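A rule such as "customers who buy X also buy Y" is usually scored with support (how often X and Y appear together) and confidence (how often Y appears given X). The sketch below computes both for one rule; the baskets and item names are made up.

```python
def rule_metrics(transactions, x, y):
    """Return (support, confidence) for the rule x -> y."""
    total = len(transactions)
    with_x = sum(1 for basket in transactions if x in basket)
    with_both = sum(1 for basket in transactions if x in basket and y in basket)
    support = with_both / total
    confidence = with_both / with_x if with_x else 0.0
    return support, confidence


# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
support, confidence = rule_metrics(transactions, "bread", "butter")
print(support, round(confidence, 3))
```

Here bread and butter appear together in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets containing bread also contain butter (confidence about 0.667).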