COMPSCI Quiz Part 1
What is Predictive Analytics?
Predictive analytics involves building and using models to make predictions.
A prediction can be a forecast of a future event or an inference about something currently unknown (e.g., diagnosing a disease).
Business Applications of Predictive Analytics
Stock Market Prediction: Forecasting stock prices based on economic factors and historical data.
Price Prediction: Estimating optimal prices for airlines and hotels to maximize profits, taking into account seasonal variations.
Loan Prediction: Assessing the probability of loan defaults.
Fraud Detection: Identifying potential fraudulent credit card transactions.
Medical Applications of Predictive Analytics
Medical Diagnosis: Determining if a certain disease is present in a patient.
Disease Susceptibility Prediction: Evaluating if a patient is likely to develop a particular disease.
Prognosis Prediction: Assessing patient recovery prospects.
Dosage Prediction: Estimating optimal medication dosages using historical outcome data.
Other Applications of Predictive Analytics
Speech Recognition: Devices that understand and recognize spoken words.
Handwritten Letter Recognition: Automating mail processing by recognizing handwritten zip codes.
Document Classification: Categorizing emails as spam or not spam; classifying the sentiment of reviews.
Predictive Analytics Process
Methods of Predictive Analytics:
Using a crystal ball (figuratively) for prediction.
Manually building predictive models.
Employing machine learning techniques for automatic model creation.
Example Task: Digit Recognition
Task involves recognizing and classifying digits (0-9) from images.
Input image data is transformed into corresponding features for classification.
Manual vs. Machine Learning in Digit Recognition
Manually defining rules to recognize digits (like the number seven) can be complex and may not capture all variations.
Machine learning simplifies this task by automatically building models based on extensive training data.
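A minimal sketch of this contrast, assuming toy 3x3 binary images in place of real digit scans. The bitmaps, the hand-written rule, and the nearest-neighbor learner are all illustrative choices, not a method prescribed by these notes. The manual rule misses a slanted seven, while a model built from labeled examples still classifies it.

```python
# Toy 3x3 binary "images", flattened row by row (hypothetical data).
SEVEN_A = (1, 1, 1,
           0, 0, 1,
           0, 0, 1)
SEVEN_B = (1, 1, 1,
           0, 1, 0,
           1, 0, 0)  # a slanted seven
ONE = (0, 1, 0,
       0, 1, 0,
       0, 1, 0)


def manual_is_seven(img):
    """Hand-written rule: the top row and the right column are both filled."""
    top = img[0] and img[1] and img[2]
    right = img[2] and img[5] and img[8]
    return bool(top and right)


def hamming(a, b):
    """Number of pixel positions where two images differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(train, img):
    """Return the label of the training image closest to img."""
    closest = min(train, key=lambda example: hamming(example[0], img))
    return closest[1]


# Labeled training examples: (pixels, digit).
train = [(SEVEN_A, 7), (SEVEN_B, 7), (ONE, 1)]

# An unseen, slightly incomplete slanted seven.
query = (1, 1, 1,
         0, 1, 0,
         0, 0, 0)

print(manual_is_seven(query))          # False: the rule misses this variation
print(nearest_neighbor(train, query))  # 7: the learned model still recognizes it
```

Covering more variations with the manual approach means writing more rules; with the learned approach it means adding more labeled examples.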
What is Machine Learning?
Learning: Transforming experience into expertise.
Machine Learning: Machines converting experience (past data) into expertise (a learned model) that is used to predict future outcomes.
ML Terminology
Target: The variable being predicted (target feature).
Features: Variables used to describe the data (descriptive features).
Examples/Instances: Individual records, each pairing a set of feature values with its target value; labeled examples are used for training.
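One possible way to represent these terms in code, assuming a fraud-detection setting. The Instance type, the feature names, and the values are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    """One example: descriptive features plus the target to predict."""
    features: dict
    target: str


# A tiny labeled dataset (made-up values).
dataset = [
    Instance({"amount": 12.5, "foreign": False}, "non-fraud"),
    Instance({"amount": 980.0, "foreign": True}, "fraud"),
]

# Many ML tools expect the features and the targets as two parallel lists.
X = [inst.features for inst in dataset]
y = [inst.target for inst in dataset]

print(X[1], y[1])
```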
ML: Training and Testing
Training data consists of pairs of features (attributes) and their corresponding targets.
A variety of methods can be used to create predictive models (e.g., logistic regression, decision trees).
Features and Target for Various Applications
Stock Market Prediction:
Target: Stock price
Features: Economic indicators
Credit Card Fraud Detection:
Target: Fraud/non-fraud
Features: Transaction details
Email Classification:
Target: Spam/non-spam
Features: Email content
Medical Diagnosis:
Target: Disease presence
Features: Patient characteristics and symptoms.
An Example of Machine Learning Task in Medicine
Diabetes Prediction: Patient characteristics are used to predict whether a patient will develop diabetes.
Features Include: Number of pregnancies, glucose level, BMI, age, etc.
The predictive model outputs a positive or negative diagnosis based on these inputs.
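A sketch of what such a model could look like once trained, assuming a logistic regression form. The weights and bias below are hypothetical numbers picked for illustration; they are not fitted to any real data and have no clinical meaning.

```python
import math

# Hypothetical weights and bias (illustration only, not learned from real data).
WEIGHTS = {"pregnancies": 0.1, "glucose": 0.03, "bmi": 0.08, "age": 0.02}
BIAS = -8.0


def predict_proba(patient):
    """Logistic model: weighted sum of the features squashed into (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * patient[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def predict(patient, threshold=0.5):
    """Turn the probability into a positive or negative diagnosis."""
    return "positive" if predict_proba(patient) >= threshold else "negative"


# Two made-up patients.
patient_a = {"pregnancies": 1, "glucose": 90, "bmi": 22.0, "age": 25}
patient_b = {"pregnancies": 4, "glucose": 180, "bmi": 35.0, "age": 50}

print(predict(patient_a))  # negative
print(predict(patient_b))  # positive
```

In practice the weights would be learned from labeled patient records rather than written by hand.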
Steps in Machine Learning Training
Training: System learns a model from labeled examples.
Testing: The trained model makes predictions on new, unseen examples.
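The two steps can be sketched with a very simple learner, assuming a single numeric feature and a binary target. The readings are made up, and the threshold rule (a decision stump) is one illustrative choice of model among many.

```python
def train_stump(examples):
    """Training: pick the threshold with the fewest errors on labeled examples.

    The model predicts 1 when the feature value is >= threshold, else 0.
    """
    best_threshold, best_errors = None, None
    for candidate in sorted({x for x, _ in examples}):
        errors = sum((x >= candidate) != bool(y) for x, y in examples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def predict(threshold, x):
    return int(x >= threshold)


def accuracy(threshold, examples):
    """Fraction of examples whose prediction matches the true target."""
    correct = sum(predict(threshold, x) == y for x, y in examples)
    return correct / len(examples)


# Labeled examples as (feature value, target); hypothetical readings.
train = [(85, 0), (92, 0), (101, 0), (130, 1), (150, 1), (171, 1)]
# New examples that were not seen during training.
test = [(88, 0), (110, 0), (145, 1), (160, 1)]

threshold = train_stump(train)      # training step
print(threshold)                    # 130
print(accuracy(threshold, test))    # testing step: 1.0
```

Test accuracy is measured only on examples held out from training, so it estimates how the model behaves on unseen data.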
ML and Generalization
The goal of ML is to apply what was learned to novel instances, not to memorize the training data.
Successful ML models generalize well to new data.
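To make the difference concrete, here is a small sketch comparing a model that memorizes its training data with one that learns a simple rule. The data, the lookup-table "memorizer", and the midpoint rule are hypothetical and serve only as an illustration.

```python
# Labeled examples as (feature value, target); made-up numbers.
train = [(85, 0), (92, 0), (130, 1), (150, 1)]
test = [(88, 0), (145, 1), (160, 1)]

# Model 1: memorize the training data in a lookup table.
table = dict(train)


def memorizer(x, default=0):
    """Exact recall for seen inputs, a fixed default guess for unseen ones."""
    return table.get(x, default)


# Model 2: learn a rule, here the midpoint between the two class means.
def fit_midpoint(examples):
    negatives = [x for x, y in examples if y == 0]
    positives = [x for x, y in examples if y == 1]
    mean_neg = sum(negatives) / len(negatives)
    mean_pos = sum(positives) / len(positives)
    return (mean_neg + mean_pos) / 2


threshold = fit_midpoint(train)


def rule(x):
    return int(x >= threshold)


def accuracy(model, examples):
    return sum(model(x) == y for x, y in examples) / len(examples)


# Both models are perfect on the training data...
print(accuracy(memorizer, train), accuracy(rule, train))
# ...but only the rule carries over to unseen examples.
print(accuracy(memorizer, test), accuracy(rule, test))
```

Perfect training accuracy alone says nothing about generalization; the memorizer reaches it and still gets only one of the three test examples right.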
Fundamental Assumption of ML
ML assumes that training and test data are drawn from the same distribution.
When the two distributions differ, model performance on the test data can degrade.
Examples of Violating the Fundamental Assumption
Training a disease diagnosis model on a specific age group and testing on a different one.
Using outdated data for stock predictions.
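A sketch of the first example, assuming a one-feature threshold model and two groups whose readings relate to the target differently. All numbers are invented for illustration and do not describe any real population.

```python
def fit_midpoint(examples):
    """Learn a threshold halfway between the mean of each class."""
    negatives = [x for x, y in examples if y == 0]
    positives = [x for x, y in examples if y == 1]
    return (sum(negatives) / len(negatives) + sum(positives) / len(positives)) / 2


def accuracy(threshold, examples):
    correct = sum(int(x >= threshold) == y for x, y in examples)
    return correct / len(examples)


# Training group (hypothetical): positive cases start at about 130.
train = [(90, 0), (100, 0), (110, 0), (130, 1), (140, 1), (150, 1)]

# Test data from the same group as the training data.
test_same = [(95, 0), (105, 0), (135, 1), (145, 1)]

# Test data from a different group (hypothetical): readings run higher
# overall, and positive cases start at about 170.
test_shifted = [(125, 0), (135, 0), (145, 0), (175, 1), (185, 1)]

threshold = fit_midpoint(train)
print(threshold)                          # 120.0
print(accuracy(threshold, test_same))     # 1.0
print(accuracy(threshold, test_shifted))  # 0.4
```

The model itself is unchanged between the two evaluations; only the distribution of the test data differs.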
Sample Bias in Predictive Analytics
Sample bias arises when the training data does not adequately represent the population the model will make predictions for.
It can result from limited training sets or exclusionary data collection practices.
When Machine Learning is Needed
Applicable for complex tasks unsuitable for manual programming (e.g., image classification).
Ideal for handling vast datasets beyond human capacity.
Benefits of Machine Learning
Capable of analyzing large numbers of features and training on extensive datasets.
Can uncover hidden relationships in data that may not be observable by humans.
When to Avoid Machine Learning
Not necessary when tasks have established analytic solutions or can be physically modeled (e.g., geometric calculations).
Ineffective when no available feature carries information about the target (e.g., predicting lottery numbers).