Comprehensive Notes on Machine Learning Applications, Malware Detection Techniques, and Large Language Model Analysis
Analyzing Hardware Based Malware Detectors
- Motivators for choosing hardware-based malware detectors over software-based ones:
- Software-based detection executes too slowly to classify malware in time.
- Resilience of hardware-based malware detectors:
- Operate at a lower system level, below the operating system.
- Less susceptible to tampering by malware that targets software layers.
A Low Cost EDA-based Stress Detection Using Machine Learning
- Main motivator for preferring EDA (electrodermal activity) over questionnaires:
- Questionnaires are subjective and time-consuming.
- EDA detects stress from changes in skin conductance with high accuracy, saving time and resources.
- Key advantage of using the Random Forest algorithm:
- Robust to noisy data because it aggregates the votes of many decision trees, so errors made by individual trees tend to cancel out.
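To illustrate the robustness point above, here is a minimal sketch of a simplified random forest, not the paper's setup: each "tree" is a one-split stump trained on a bootstrap sample and one randomly chosen feature, and predictions are a majority vote. The toy data, the label-noise pattern, and the names `fit_stump`, `fit_forest`, and `predict` are hypothetical. A real Random Forest (e.g., scikit-learn's) grows deeper trees and samples features at every split.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Fit a one-split decision tree on a single feature (minimum training error)."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        for polarity in (1, -1):
            # polarity 1: predict 1 when x >= t; polarity -1: predict 1 when x <= t
            errors = sum(
                (1 if polarity * (r[feature] - t) >= 0 else 0) != y
                for r, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    _, t, polarity = best
    return lambda r: 1 if polarity * (r[feature] - t) >= 0 else 0

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Simplified random forest: bootstrap rows + one random feature per stump."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        feature = rng.randrange(len(rows[0]))           # random feature choice
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return trees

def predict(trees, row):
    """Majority vote over all trees (an odd n_trees avoids ties for two classes)."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]

# Hypothetical toy data: label is 1 when x0 + x1 > 1, with every 7th label flipped as noise.
rows, labels = [], []
for i in range(11):
    for j in range(11):
        rows.append((i / 10, j / 10))
        clean = 1 if i + j > 10 else 0
        labels.append(1 - clean if len(rows) % 7 == 0 else clean)

forest = fit_forest(rows, labels)
print(predict(forest, (1.0, 1.0)), predict(forest, (0.0, 0.0)))
```

The flipped labels may mislead individual stumps, but each stump sees a different bootstrap sample, so the vote still separates the two clear-cut corners of the input space.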
Fake News Detection using Machine Learning Algorithms
- Main reason to use machine learning in fake news detection:
- Machine learning can automatically recognize patterns in large amounts of text.
- Requirement for machine learning models before they can classify fake news:
- A large dataset of labeled real and fake articles
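Both points above (pattern recognition in text, and the need for labeled examples) can be illustrated with a multinomial Naive Bayes classifier. This is a swapped-in technique chosen for brevity, not necessarily one the paper evaluates. The four "articles" are hypothetical one-line headlines, and a usable model would need the large labeled dataset the notes describe.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Multinomial Naive Bayes: count words per class from labeled documents."""
    classes = set(labels)
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc.lower().split())
    vocab = set().union(*word_counts.values())
    return classes, word_counts, doc_counts, vocab

def classify(model, doc):
    """Pick the class with the highest log prior + smoothed log likelihood."""
    classes, word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_class, best_score = None, -math.inf
    for c in sorted(classes):
        total_words = sum(word_counts[c].values())
        score = math.log(doc_counts[c] / total_docs)  # log prior
        for w in doc.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero out the score
            score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical labeled training data
docs = [
    "shocking miracle cure doctors hate",
    "you won't believe this secret trick",
    "senate passes budget bill after debate",
    "central bank holds interest rates steady",
]
labels = ["fake", "fake", "real", "real"]
model = train_nb(docs, labels)
print(classify(model, "miracle secret cure"))
```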
Efficient Utilization of Adversarial Training towards Robust Machine Learners and its Analysis
- Which retrained classifier generalizes best to other adversarial attacks (options given)?
- Answer not recorded in these notes. The candidate attacks mentioned are the Jacobian-based Saliency Map Attack (JSMA), the Fast Gradient Sign Method (FGSM), Carlini and Wagner (CW), and DeepFool.
- Drawbacks of increasing the "complexity" of the adversarial attack method:
- More complex attacks tend to achieve their goal better (e.g., MIM ≈ BIM > FGSM, where MIM is the Momentum Iterative Method and BIM the Basic Iterative Method), but:
- They are harder to use: they either take more time to run or are more complex to implement.
- Certain attacks (e.g., CW) remain effective against some types of adversarial training.
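The FGSM-versus-BIM comparison above can be sketched on a small logistic-regression model, where the input gradient of the cross-entropy loss has the closed form (p - y) * w. FGSM takes one step of size eps along the sign of that gradient. BIM repeats smaller steps of size alpha and clips back into the eps-ball after each one. The weights, input, eps, alpha, and step count below are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(w, b, x, y):
    """Binary cross-entropy of a logistic-regression model and its gradient w.r.t. x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]  # dJ/dx = (p - y) * w
    return loss, grad_x

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method: one step of size eps along sign(dJ/dx)."""
    _, g = loss_and_grad(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def bim(w, b, x, y, eps, alpha, steps):
    """Basic Iterative Method: repeated small FGSM steps, clipped to the eps-ball."""
    x_adv = list(x)
    for _ in range(steps):
        _, g = loss_and_grad(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back so every feature stays within [x - eps, x + eps]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

w, b = [2.0, -1.0], 0.0   # hypothetical trained model
x, y = [0.5, 0.2], 1      # hypothetical clean input with true label 1
clean_loss, _ = loss_and_grad(w, b, x, y)
fgsm_loss, _ = loss_and_grad(w, b, fgsm(w, b, x, y, eps=0.3), y)
bim_loss, _ = loss_and_grad(w, b, bim(w, b, x, y, eps=0.3, alpha=0.1, steps=5), y)
print(clean_loss, fgsm_loss, bim_loss)
```

On this linear toy model the gradient sign never changes, so BIM lands on the same point as FGSM. The extra cost of iterative methods only pays off on non-linear models, where the gradient direction changes between steps.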
Pyramid: Machine Learning Framework to Estimate the Optimal Timing and Resource Usage of a High-Level Synthesis Design
- What is not solved through Pyramid when measuring resource usage for FPGAs?
- Options given (answer not recorded in these notes): longer processing times for larger designs; inaccuracies across different FPGA devices; inefficient resource usage; optimization when determining maximum throughput.
- Main purpose of the Pyramid in FPGA design:
- To accurately predict hardware performance and resource usage using machine learning.
Predictive Models for Hospital Readmission Risk: A Systematic Review of Methods
- Significance of Area Under ROC Curve:
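- AUC = 0.5 is chance performance: an uninformed classifier that ranks positives and negatives at random.
- AUC = 0 means the ranking is perfectly inverted: every positive is scored below every negative.
- AUC = 1 is perfect performance.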
- Improving results of a linear model with significant class imbalance:
- Use a sample that has a similar distribution to the population.
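Both answers above can be checked numerically. The first function computes AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting as one half. This shows why 0.5 corresponds to chance and 0 to a perfectly inverted ranking. The second function is one reading of "a sample with a similar distribution to the population": stratified sampling, which draws the same fraction from each class. The function names and toy inputs are hypothetical.

```python
import random
from collections import defaultdict

def auc(scores, labels):
    """AUC = P(random positive outranks random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_sample(items, labels, fraction, seed=0):
    """Draw the same fraction from each class so the sample keeps the population's class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, y in zip(items, labels):
        by_class[y].append(item)
    sample = []
    for y, members in sorted(by_class.items()):
        k = round(len(members) * fraction)
        sample.extend((item, y) for item in rng.sample(members, k))
    return sample

print(auc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # perfect ranking
print(auc([0.9, 0.8, 0.2, 0.1], [0, 0, 1, 1]))  # perfectly inverted ranking
print(auc([0.5, 0.5, 0.5, 0.5], [0, 0, 1, 1]))  # uninformative scores

# Hypothetical imbalanced population: 10 readmitted (1), 90 not readmitted (0)
patients = list(range(100))
readmitted = [1 if i < 10 else 0 for i in patients]
half = stratified_sample(patients, readmitted, 0.5)
```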
DNN Model Architecture Fingerprinting Attack on CPU-GPU Edge Devices
- How to use a fingerprinting attack to identify an AI model without hacking it:
- By monitoring the device's RAM, CPU, and GPU usage while the model runs, and matching that usage pattern to known architectures.
- Difference between transferability and portability:
- Transferability measures generalization to new model variants.
- Portability measures how well the attack works across multiple platforms.
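The matching step can be sketched as a nearest-neighbor lookup. The observed resource-usage trace is averaged into a feature vector and compared against reference profiles of known architectures. The profile names and numbers below are made up for illustration and are not measurements from the paper. A real attack would likely use richer time-series features and a trained classifier.

```python
import math

# Hypothetical reference profiles: (CPU utilization, GPU utilization, RAM fraction), all in [0, 1].
# Values are illustrative only, not measurements from the paper.
PROFILES = {
    "small_cnn": (0.20, 0.35, 0.10),
    "resnet_like": (0.30, 0.80, 0.40),
    "transformer_like": (0.45, 0.90, 0.75),
}

def mean_profile(trace):
    """Average a trace of (cpu, gpu, ram) samples into one feature vector."""
    n = len(trace)
    return tuple(sum(sample[k] for sample in trace) / n for k in range(3))

def identify(trace, profiles=PROFILES):
    """Return the reference architecture whose profile is closest (Euclidean distance)."""
    observed = mean_profile(trace)
    return min(profiles, key=lambda name: math.dist(observed, profiles[name]))

observed_trace = [(0.31, 0.78, 0.41), (0.29, 0.82, 0.39)]  # hypothetical samples
print(identify(observed_trace))
```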
Survey of Machine Learning for Electronic Design Automation
- Why are the majority of ML models for electronic design automation reinforcement learning (RL) or neural network (NN) models?
- They handle the high-dimensional, combinatorial design spaces well, and RL in particular can learn complex optimization strategies from feedback (rewards).
- Why could the non-linearity of ML models be an issue for designers and engineers using EDA tools?
- Non-linear models behave like black boxes; their lack of interpretability makes it hard to explain or debug the results they produce.
Pain Level Modeling of Intensive Care Unit Patients with Machine Learning Methods: An Effective Congeneric Clustering-based Approach
- Major limitations on using machine learning for measuring pain:
- There is no large, specialized dataset to train on.
- Pain perception varies between individuals and groups, so a general model can only go so far.
- Why would using machine learning to estimate pain be more desirable than just asking the patient?
- A 1-10 scale is vague.
- Patients may over- or under-report their pain.
- Some patients have a limited ability to communicate.
R2AD: Randomization and Reconstructor-based Adversarial Defense for Deep Neural Networks
- Two main components of the R2AD defense and how they work:
- R2AD combines (1) Random Nullification (RNF), which randomly masks parts of the input to disrupt adversarial patterns, and (2) a Reconstructor (autoencoder), which attempts to recover the original input and computes reconstruction error to detect suspicious inputs.
- Together, they reject adversarial examples and preserve clean inputs.
- Trade-off of using the Random Nullification (RNF) layer:
- RNF weakens adversarial perturbations by nullifying input features, but it may also remove important features from clean inputs, reducing model accuracy.
- The defense is effective but requires careful parameter tuning to balance security and performance.
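The two components can be sketched as follows. Random nullification zeroes each input feature with probability `rate`. A reconstructor flags inputs whose mean squared reconstruction error exceeds a threshold. The ordering here is an assumption: detection runs on the raw input, and RNF is applied before classification. The paper's exact pipeline may differ. The 3-point moving average is a hypothetical stand-in for the trained autoencoder, and `rate`, `threshold`, and the toy classifier are illustrative values.

```python
import random

def random_nullification(x, rate, rng):
    """Zero out each input feature independently with probability `rate` (RNF-style masking)."""
    return [0.0 if rng.random() < rate else xi for xi in x]

def reconstruction_error(x, reconstructor):
    """Mean squared error between the input and its reconstruction."""
    x_hat = reconstructor(x)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def r2ad_filter(x, reconstructor, classifier, rate=0.1, threshold=0.05, seed=0):
    """Reject inputs that reconstruct poorly; otherwise classify the nullified input."""
    if reconstruction_error(x, reconstructor) > threshold:
        return None  # flagged as likely adversarial
    x_masked = random_nullification(x, rate, random.Random(seed))
    return classifier(x_masked)

def smooth(x):
    """Hypothetical stand-in reconstructor: 3-point moving average (R2AD uses an autoencoder)."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out

def toy_classifier(x):
    return 1 if sum(x) > 0 else 0

clean = [0.1 * i for i in range(10)]  # smooth signal: reconstructs well
perturbed = [v + (0.5 if i % 2 else -0.5) for i, v in enumerate(clean)]  # high-frequency noise
print(r2ad_filter(clean, smooth, toy_classifier), r2ad_filter(perturbed, smooth, toy_classifier))
```

Raising `rate` disrupts more of a perturbation but also erases more useful signal from clean inputs. This is the accuracy trade-off noted above.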
Adversarial Attack on Microarchitectural Events based Malware Detectors
- Why are HMD systems fast and lightweight?
- They read built-in hardware performance counters (HPCs) to monitor microarchitectural events instead of analyzing code, so they add very little overhead, which makes them well suited to embedded devices.
- How can an attacker use a separate thread to generate adversarial hardware performance events without modifying the original program?
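- By running a wrapper that launches a separate thread alongside the unmodified program; the thread generates crafted hardware activity (e.g., cache misses and branch misses) that changes the event counts the detector sees.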
Ensemble Learning for Effective Run-Time Hardware-Based Malware Detection: A Comprehensive Analysis and Classification
- Primary goal of ensemble learning:
- To combine multiple models to improve overall performance
- Difference between bagging and boosting:
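- Bagging trains models independently on bootstrap samples and combines their votes, which mainly reduces variance; boosting trains models sequentially, each one focusing on the examples its predecessors got wrong, which mainly reduces bias.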
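The sequential, error-focused nature of boosting can be sketched with a minimal AdaBoost using 1-D decision stumps. AdaBoost is used here as a standard example of boosting, not necessarily the variant evaluated in the paper. After each round, misclassified points gain weight, so the next stump concentrates on them. The toy data (+1, +1, -1, -1, +1, +1) are hypothetical and cannot be fit by any single stump.

```python
import math

def best_stump(xs, ys, weights):
    """Pick the threshold/polarity with minimum *weighted* error; labels are +1/-1."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= t else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # vote weight of this stump
        ensemble.append((alpha, t, polarity))
        # Up-weight misclassified points, down-weight correct ones, then renormalize
        weights = [w * math.exp(-alpha * y * (polarity if x >= t else -polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def boosted_predict(ensemble, x):
    score = sum(alpha * (polarity if x >= t else -polarity) for alpha, t, polarity in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]   # hypothetical labels; no single stump fits them
model = adaboost(xs, ys, rounds=3)
print([boosted_predict(model, x) for x in xs])
```

In contrast, bagging would train each stump independently on a bootstrap sample. Averaging such stumps lowers variance, but it cannot remove the bias of a model class too simple for this pattern.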
Machine Learning Techniques in Wireless Sensor Network Based Precision Agriculture
- Common ML techniques used for crop yield prediction:
- Ensemble Learning
- Bayesian Models
- Support Vector Machines
- Artificial Neural Networks
- How is artificial intelligence (AI) most commonly used in modern agriculture?
- Analyzing data to optimize irrigation, detect pests, and predict crop yields
TinyML Meets IoT: A Comprehensive Survey
- Can TinyML do both training and inference?
- No, typically only inference is performed on the device. Training is done offline on a server or cloud, and the compressed model is deployed to the device.
- Primary objective of quantization in the optimization of TinyML models:
- To make the model smaller and faster by representing weights (and often activations) with fewer bits, e.g., 8-bit integers instead of 32-bit floats.
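Quantization can be sketched with a minimal affine (scale plus zero point) int8 scheme. Each float maps to an integer in [-128, 127], which needs 1 byte instead of the 4 bytes of a float32. This is one common scheme, assumed here for illustration; TinyML toolchains differ in details such as per-channel scales or symmetric ranges. The weight values are hypothetical.

```python
def quantize_int8(values):
    """Affine quantization of floats to int8 using one scale and one zero point."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # keep 0.0 exactly representable
    scale = (hi - lo) / 255 or 1.0                         # 256 int8 levels; avoid scale 0
    zero_point = round(-128 - lo / scale)                  # integer that represents 0.0
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.5, 0.0, 0.25, 0.75, 1.5]  # hypothetical float32 weights
q, scale, zp = quantize_int8(weights)
recovered = dequantize(q, scale, zp)
print(q)
print([round(r, 3) for r in recovered])
```

The round trip loses at most about half a quantization step per value. This small accuracy cost buys a 4x reduction in storage and allows cheaper integer arithmetic on microcontrollers.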
An Overview of Privacy in Machine Learning
- Primary difference between centralized and federated learning in terms of data storage:
- Centralized learning gathers all data in one place; federated learning keeps data on local devices.
- What is a membership inference attack?
- Given a model, the adversary infers whether a data point was used in the training set.
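Both ideas above can be sketched in a few lines. The first part is Federated Averaging (FedAvg), a standard federated scheme assumed here for illustration. Each client trains locally on data that never leaves it, and the server only averages the returned weights, weighted by client data size. The second part is a simple loss-threshold membership inference attack, one basic variant of the attack. It guesses "member" when the model's loss on a point is suspiciously low, since overfit models tend to fit their training points better than unseen ones. The 1-D model, client data, and threshold are hypothetical.

```python
import math

def local_update(w, data, lr=0.1, epochs=5):
    """One client's training on its own data (1-D linear model y = w * x, squared loss)."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fed_avg(w_global, clients, rounds=20):
    """Server loop: broadcast weights, then average the returned weights by data size."""
    for _ in range(rounds):
        total = sum(len(d) for d in clients)
        w_global = sum(local_update(w_global, d) * len(d) for d in clients) / total
    return w_global

def infer_membership(prob_of_true_label, threshold):
    """Loss-threshold attack: guess 'member' when cross-entropy loss is below threshold."""
    return -math.log(prob_of_true_label) < threshold

# Hypothetical clients whose private data both follow y = 3x
clients = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5), (1.5, 4.5), (1.0, 3.0)]]
print(fed_avg(0.0, clients))
print(infer_membership(0.99, threshold=0.1), infer_membership(0.6, threshold=0.1))
```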
A Machine Learning Model for Emotion Recognition from Physiological Signals
- Which machine learning model achieved the highest accuracy in recognizing amusement, sadness, and neutral emotions using the selected physiological features?
- Answer: Support Vector Machine with a linear kernel (SVML).
- What is the main advantage of using physiological signals, such as GSR and PPG, for emotion recognition compared to facial expressions or speech?
- Answer: They are involuntary and harder to fake.
Large Language Models for Code Analysis: Do LLMs Really Do Their Job?
- What does the prompt construction strategy in this study reveal about how LLMs are best used in software engineering tasks?
- Prompt phrasing and structure play a critical role in how LLMs interpret and perform analysis tasks
- What is a key reason the authors evaluated LLMs on obfuscated code?
- To examine whether LLMs are robust to real-world transformations that break code readability
Does Synthetic Data Generation of LLMs Help Clinical Text Mining?
- Which of the following is the primary advantage of using synthetic data generated by a large language model (LLM) like ChatGPT to train a local model for clinical NLP tasks?
- Answer: It reduces privacy risks by avoiding the upload of real patient information while still providing labeled examples.
- In a clinical relation extraction (RE) task, a model is fine-tuned on synthetic examples produced by an LLM. Which is the most realistic outcome?
- Answer: The model performs well on a held-out test set of manually labeled examples.
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
- What are some limitations of traditional LLM benchmarks that Chatbot Arena addresses?
- Traditional benchmarks often rely on static datasets and predefined answers, which may not capture the nuances of open-ended tasks or evolving real-world applications.
- What is the significance of using pairwise comparison in Chatbot Arena?
- Pairwise comparisons simplify the evaluation process by focusing on relative preferences between two models, making it easier to gather and interpret human judgments.
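Pairwise votes can be turned into a leaderboard with an Elo-style rating update, sketched below. Each vote moves the winner up and the loser down in proportion to how surprising the outcome was under the current ratings. This is an illustrative assumption: the platform's actual rating method may differ (e.g., a Bradley-Terry fit over all votes). The model names, starting ratings, K-factor, and votes are hypothetical.

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo logistic model (base 10, scale 400)."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings, a, b, winner, k=32):
    """Update two models' ratings after one human vote: winner is 'a', 'b', or 'tie'."""
    e_a = expected_score(ratings[a], ratings[b])
    s_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    ratings[a] += k * (s_a - e_a)
    ratings[b] -= k * (s_a - e_a)  # zero-sum: B loses exactly what A gains

ratings = {"model_x": 1000.0, "model_y": 1000.0, "model_z": 1000.0}
votes = [  # hypothetical (model A, model B, vote) triples from human judges
    ("model_x", "model_y", "a"),
    ("model_y", "model_z", "a"),
    ("model_x", "model_z", "a"),
    ("model_x", "model_y", "tie"),
]
for a, b, w in votes:
    update(ratings, a, b, w)
print(sorted(ratings, key=ratings.get, reverse=True))
```

Judges only answer "which response is better?", which is easier and more consistent than assigning absolute scores. The rating model then converts many such relative judgments into a global ranking.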