COMPSCI Quiz Part 1

What is Predictive Analytics?

  • Involves building and using models to make predictions.

  • Prediction can mean forecasting future events or inferring facts that are currently unknown (e.g., diagnosing a disease).

Business Applications of Predictive Analytics

  • Stock Market Prediction: Forecasting stock prices based on economic factors and historical data.

  • Price Prediction: Estimating optimal prices for airlines and hotels to maximize profits, taking into account seasonal variations.

  • Loan Prediction: Assessing the probability of loan defaults.

  • Fraud Detection: Identifying potential fraudulent credit card transactions.

Medical Applications of Predictive Analytics

  • Medical Diagnosis: Determining if a certain disease is present in a patient.

  • Disease Susceptibility Prediction: Evaluating if a patient is likely to develop a particular disease.

  • Prognosis Prediction: Assessing patient recovery prospects.

  • Dosage Prediction: Estimating optimal medication dosages using historical outcome data.

Other Applications of Predictive Analytics

  • Speech Recognition: Devices that understand and recognize spoken words.

  • Handwritten Letter Recognition: Automating mail processing by recognizing handwritten zip codes.

  • Document Classification: Categorizing emails as spam or not; analyzing sentiments in reviews.

Predictive Analytics Process

  • Methods of Predictive Analytics:

    • Using a crystal ball (figuratively) for prediction.

    • Manually building predictive models.

    • Employing machine learning techniques for automatic model creation.

Example Task: Digit Recognition

  • Task involves recognizing and classifying digits (0-9) from images.

  • Input image data is transformed into corresponding features for classification.
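
One simple way to turn an image into features is to treat every pixel as its own feature. The sketch below assumes a tiny, made-up 5x3 binary image of a seven; real digit images are larger and usually grayscale, and the helper name image_to_features is hypothetical.

```python
# A tiny 5x3 binary "image" of the digit 7 (1 = ink, 0 = background).
# The pixel grid is invented for illustration.
SEVEN = [
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]


def image_to_features(image):
    """Flatten a 2D pixel grid into a 1D feature vector, row by row."""
    return [pixel for row in image for pixel in row]


features = image_to_features(SEVEN)
print(len(features))  # 15 features, one per pixel
print(features)
```

The resulting list is what a classifier would receive as its input for this image.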

Manual vs. Machine Learning in Digit Recognition

  • Manually defining rules to recognize digits (like the number seven) can be complex and may not capture all variations.

  • Machine learning simplifies this task by automatically building models based on extensive training data.
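
To see why hand-written rules are brittle, consider the following sketch. The rule and both pixel grids are invented for illustration: the rule accepts a plain seven but rejects a seven written with a crossbar, a variation the rule's author did not anticipate.

```python
def looks_like_seven(image):
    """Hand-written rule: top row fully inked, left column empty below it."""
    top_full = all(image[0])
    left_empty = all(row[0] == 0 for row in image[1:])
    return top_full and left_empty


# Two hypothetical 5x3 binary images of the same digit (1 = ink).
PLAIN_SEVEN = [
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
CROSSED_SEVEN = [
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],  # crossbar through the stem
    [0, 1, 0],
    [0, 1, 0],
]

print(looks_like_seven(PLAIN_SEVEN))    # True
print(looks_like_seven(CROSSED_SEVEN))  # False: the rule misses this variation
```

Patching the rule for each new variation quickly becomes unmanageable, which is the motivation for learning the model from many labeled examples instead.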

What is Machine Learning?

  • Learning: Transforming experience into expertise.

  • Machine Learning: Machines convert experience into expertise by learning models from past data and using them to predict future outcomes.

ML Terminology

  • Target: The variable being predicted (target feature).

  • Features: Variables used to describe the data (descriptive features).

  • Examples/Instances: Combinations of features and their corresponding targets used for training.
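
The terms above map naturally onto a small data structure. This is only a sketch: the Instance class, the feature names, and the values are all hypothetical, chosen to resemble an email spam task.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    features: dict  # descriptive features: name -> value
    target: str     # target feature: the value to be predicted


# A made-up training set of two labeled instances.
training_set = [
    Instance({"contains_link": True, "num_exclamations": 5}, "spam"),
    Instance({"contains_link": False, "num_exclamations": 0}, "non-spam"),
]

feature_names = sorted(training_set[0].features)
targets = [instance.target for instance in training_set]
print(feature_names)
print(targets)
```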

ML: Training and Testing

  • Training data consists of examples, each pairing a set of features with its target.

  • A variety of methods can be used to create predictive models (e.g., logistic regression, decision trees).

Features and Target for Various Applications

  • Stock Market Prediction:

    • Target: Stock price

    • Features: Economic indicators

  • Credit Card Fraud Detection:

    • Target: Fraud/non-fraud

    • Features: Transaction details

  • Email Classification:

    • Target: Spam/non-spam

    • Features: Email content

  • Medical Diagnosis:

    • Target: Disease presence

    • Features: Patient characteristics and symptoms.

An Example of a Machine Learning Task in Medicine

  • Diabetes Prediction: Data describing patient characteristics is used to predict whether a patient will develop diabetes.

  • Features Include: Number of pregnancies, glucose levels, BMI, age, etc.

  • Predictive model outputs positive or negative diagnosis based on inputs.
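
The shape of such a model can be sketched with a logistic function over the listed features. Everything numeric below is an assumption: the weights, bias, threshold, and patient records are invented for illustration, are not fitted to any real data, and have no clinical meaning.

```python
import math

# Hypothetical weights and bias, invented purely for illustration.
# They are NOT learned from real data and have no clinical meaning.
WEIGHTS = {"pregnancies": 0.1, "glucose": 0.03, "bmi": 0.08, "age": 0.02}
BIAS = -8.0


def predict_diabetes(patient, threshold=0.5):
    """Return ("positive" or "negative", probability) for one patient."""
    score = BIAS + sum(WEIGHTS[name] * patient[name] for name in WEIGHTS)
    probability = 1.0 / (1.0 + math.exp(-score))
    label = "positive" if probability >= threshold else "negative"
    return label, probability


# Made-up patient records.
patient_a = {"pregnancies": 2, "glucose": 180, "bmi": 35.0, "age": 50}
patient_b = {"pregnancies": 0, "glucose": 90, "bmi": 22.0, "age": 25}

label_a, prob_a = predict_diabetes(patient_a)
label_b, prob_b = predict_diabetes(patient_b)
print(label_a, label_b)  # positive negative
```

In a real system the weights would be learned from labeled training examples rather than typed in by hand.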

Steps in Machine Learning

  • Training: System learns a model from labeled examples.

  • Testing: The trained model makes predictions on new, unseen examples.
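
The two steps can be sketched with a 1-nearest-neighbor classifier, used here only because it fits in a few lines; it is a stand-in, not a method these notes prescribe. The function names and all data points are invented for illustration.

```python
def train(examples):
    """'Training' for 1-nearest-neighbor simply stores the labeled examples."""
    return list(examples)


def predict(model, features):
    """Return the target of the stored example closest to `features`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(model, key=lambda example: squared_distance(example[0], features))
    return nearest[1]


# (features, target) pairs; values are made up.
training_data = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
    ((5.0, 5.0), "B"), ((4.8, 5.3), "B"),
]
# Unseen examples, kept separate from the training data.
test_data = [((0.9, 1.1), "A"), ((5.1, 4.9), "B")]

model = train(training_data)                           # training step
predictions = [predict(model, x) for x, _ in test_data]  # testing step
accuracy = sum(p == y for p, (_, y) in zip(predictions, test_data)) / len(test_data)
print(predictions, accuracy)
```

Keeping test_data out of the call to train is the important part: accuracy is only meaningful when measured on examples the model has not seen.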

ML and Generalization

  • The goal of ML is to apply learned insights to novel instances rather than memorizing training data.

  • Successful ML models generalize well to new data.
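
The difference between memorizing and generalizing can be made concrete with a toy comparison. In this sketch (data and function names invented for illustration), a lookup table is perfect on the training data but useless on new inputs, while a learned threshold handles both.

```python
# Made-up one-dimensional data: (measurement, target).
training = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
            (7.0, "high"), (8.0, "high"), (9.0, "high")]
unseen = [(2.5, "low"), (7.5, "high"), (0.5, "low"), (9.5, "high")]

# Memorizer: a lookup table of the exact training inputs.
table = dict(training)


def memorizer(x):
    return table.get(x, "unknown")


def fit_threshold(examples):
    """Learn a cut-off halfway between the two class means."""
    lows = [x for x, y in examples if y == "low"]
    highs = [x for x, y in examples if y == "high"]
    return (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2


threshold = fit_threshold(training)


def learner(x):
    return "high" if x > threshold else "low"


def accuracy(predict, examples):
    return sum(predict(x) == y for x, y in examples) / len(examples)


print(accuracy(memorizer, training), accuracy(memorizer, unseen))  # 1.0 0.0
print(accuracy(learner, training), accuracy(learner, unseen))      # 1.0 1.0
```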

Fundamental Assumption of ML

  • Effective ML assumes training and test data come from the same distribution.

  • Differences in data distribution can hinder model performance.

Examples of Violating the Fundamental Assumption

  • Training a disease diagnosis model on a specific age group and testing on a different one.

  • Using outdated data for stock predictions.
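
A toy version of the first example shows the effect. All numbers below are invented: a threshold is learned on one group of patients, then evaluated on the same group and on a second group in which the same measurement runs higher for healthy patients.

```python
def fit_threshold(examples):
    """Learn a cut-off halfway between the mean of each class."""
    negatives = [x for x, y in examples if y == 0]
    positives = [x for x, y in examples if y == 1]
    return (sum(negatives) / len(negatives) + sum(positives) / len(positives)) / 2


def accuracy(threshold, examples):
    """Fraction of examples where 'x > threshold' matches the true label."""
    return sum((x > threshold) == bool(y) for x, y in examples) / len(examples)


# Made-up data: (measurement, has_condition).
group_a_train = [(2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1)]
group_a_test = [(2.5, 0), (3.5, 0), (6.5, 1), (7.5, 1)]
# Group B follows a different distribution: healthy values are shifted upward.
group_b_test = [(5.5, 0), (6.5, 0), (9.5, 1), (10.5, 1)]

threshold = fit_threshold(group_a_train)
print(accuracy(threshold, group_a_test))  # 1.0 on the same distribution
print(accuracy(threshold, group_b_test))  # 0.5 on the shifted distribution
```

The model itself did not change between the two evaluations; only the data it was applied to did.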

Sample Bias in Predictive Analytics

  • Sample bias arises when the training data does not adequately represent the population the model will make predictions for.

  • It can result from limited training sets or exclusionary data collection practices.
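
A small numeric sketch of exclusionary data collection, using an invented population of ages: a sample that leaves out everyone aged 40 or over gives a very different picture from a sample drawn evenly across the population.

```python
def mean(values):
    return sum(values) / len(values)


# Made-up population: one person at each age from 20 to 79.
population = list(range(20, 80))

# Exclusionary collection: nobody aged 40 or over was sampled.
biased_sample = [age for age in population if age < 40]

# Evenly spread collection: every third person.
representative_sample = population[::3]

print(mean(population))             # 49.5
print(mean(biased_sample))          # 29.5
print(mean(representative_sample))  # 48.5
```

A model trained only on biased_sample would never see an example from the older part of the population it is later asked to make predictions for.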

When Machine Learning is Needed

  • Needed for complex tasks that are impractical to program by hand (e.g., image classification).

  • Ideal for handling vast datasets beyond human capacity.

Benefits of Machine Learning

  • Capable of analyzing large numbers of features and training on extensive datasets.

  • Can uncover hidden relationships in data that may not be observable by humans.

When to Avoid Machine Learning

  • Not necessary when tasks have established analytic solutions or can be physically modeled (e.g., geometric calculations).

  • Ineffective in scenarios lacking predictive features (e.g., lottery predictions).
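
The lottery case can be illustrated with a small exhaustive check. The dataset below is an invented stand-in for lottery draws: every feature value occurs equally often with each outcome, so the feature carries no information about the target, and every possible prediction rule scores exactly at chance level.

```python
from itertools import product

# Each of the 4 feature values appears once with outcome 0 and once with
# outcome 1, so the feature says nothing about the outcome.
examples = [(feature, outcome) for feature in range(4) for outcome in (0, 1)]


def accuracy(rule, data):
    """Fraction of examples where the rule's prediction matches the outcome."""
    return sum(rule[feature] == outcome for feature, outcome in data) / len(data)


# Enumerate every possible rule mapping the 4 feature values to an outcome.
all_rules = [dict(enumerate(bits)) for bits in product((0, 1), repeat=4)]
accuracies = {accuracy(rule, examples) for rule in all_rules}

print(len(all_rules))  # 16 possible rules
print(accuracies)      # {0.5}: no rule beats chance
```

No learning algorithm can do better here, because any model it produces is one of these rules.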