L9 - Gradients, and a bit more about time series

Flashcards covering topics from Lecture 9, including time series anomaly detection, transformer models, gradients for explainability, and adversarial attacks.

16 Terms

1

What is Anomaly Detection?

A self-supervised machine learning task focused on identifying rare events or outliers that deviate significantly from the norm. Unlike fraud detection with labeled fraud cases, anomaly detection typically operates in scenarios with limited (or zero) examples of anomalies. Think of it like finding a single unripe apple in a crate of ripe ones; you know what ripe apples look like, but the anomaly stands out without prior 'unripe' labels.

2

What is the primary approach for Anomaly Detection using Autoencoders?

The primary approach involves training an autoencoder to reconstruct normal data. During inference, data points with high reconstruction error (i.e., the autoencoder's output is very different from the input) are flagged as anomalies. This is akin to trying to photocopy an image – if the copy is blurry or distorted, something is likely wrong with the original (an anomaly).

3

What are Autoencoders, and how do they work?

Autoencoders are neural networks trained to reconstruct their input, effectively learning a compressed representation of the input data. They consist of an encoder network that compresses the input into a latent space (bottleneck) and a decoder network that reconstructs the original input from this compressed representation. Imagine squeezing a balloon animal flat (encoding) and then re-inflating it (decoding); an autoencoder does this with data.
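As a rough illustration, a minimal autoencoder might look like the sketch below (assuming PyTorch; the input, hidden, and latent sizes are purely illustrative):

  import torch.nn as nn

  class Autoencoder(nn.Module):
      def __init__(self, input_dim=784, latent_dim=32):
          super().__init__()
          # Encoder: squeeze the input down to a small latent code (the bottleneck)
          self.encoder = nn.Sequential(
              nn.Linear(input_dim, 128), nn.ReLU(),
              nn.Linear(128, latent_dim),
          )
          # Decoder: re-inflate the latent code back to the original input size
          self.decoder = nn.Sequential(
              nn.Linear(latent_dim, 128), nn.ReLU(),
              nn.Linear(128, input_dim),
          )

      def forward(self, x):
          return self.decoder(self.encoder(x))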

4

What are the key applications of Autoencoders?

Beyond anomaly detection, autoencoders are used for data compression (reducing data storage), denoising (removing noise from images or audio), and feature extraction. They're like a Swiss Army knife for data manipulation, each tool optimized for a specific task involving data transformation.

5

How can Autoencoders be used for Anomaly Detection, step-by-step?

  1. Train the autoencoder on normal data.

  2. Feed new data through the trained autoencoder.

  3. Calculate the reconstruction error by comparing the original input to the reconstructed output.

  4. Set a threshold for the reconstruction error; data points exceeding this threshold are flagged as anomalies.

  5. This boils down to: large reconstruction error = anomaly.

The measure of surprise from this process is based on how well the autoencoder can 'reproduce' its input. A high error signals something unusual.
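A minimal sketch of steps 2-4 (assuming PyTorch, an already-trained autoencoder such as the one above, and a batch of candidate points called new_data; the threshold value is illustrative):

  import torch

  def flag_anomalies(model, new_data, threshold=0.05):
      # Reconstruct the inputs and measure how badly the reconstruction fails
      model.eval()
      with torch.no_grad():
          reconstruction = model(new_data)
          errors = ((new_data - reconstruction) ** 2).mean(dim=1)  # per-sample MSE
      return errors > threshold  # True = flagged as anomaly

In practice the threshold is usually chosen from the error distribution on held-out normal data, for example its 99th percentile.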

6

What are the limitations of Recurrent Neural Networks (RNNs) in handling sequential data?

Recurrent networks, while designed to process sequential data, struggle with long-range dependencies due to the vanishing gradient problem. They also process sequences sequentially, limiting parallelization and slowing down training. Imagine trying to recount a story you heard a week ago, detail by detail; you might forget the beginning, similar to how RNNs lose context over long sequences.

7

What is a Transformer Model, and how does it differ from Recurrent Networks?

A sequence transduction model that relies solely on attention mechanisms, dispensing with recurrence and convolutions. This architecture enables greater parallelization and faster training, making it ideal for tasks requiring long-range dependency understanding. Unlike RNNs, Transformer models can instantly access any part of the input sequence, similar to reading a book by jumping between paragraphs rather than reading word-by-word.

8

What is an Attention Mechanism, and how does it enhance sequence processing?

An attention mechanism allows the model to assign different weights to different parts of the input sequence, dynamically focusing on the most relevant elements for each prediction. This is analogous to focusing your attention on specific keywords when reading a document to understand its core message.
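A minimal sketch of the core computation, scaled dot-product attention (assuming PyTorch; the tensor shapes are illustrative, not part of the lecture):

  import torch
  import torch.nn.functional as F

  def scaled_dot_product_attention(Q, K, V):
      d_k = K.size(-1)
      # How relevant is each key position to each query? Scale by sqrt(d_k).
      scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
      weights = F.softmax(scores, dim=-1)   # attention weights sum to 1 over the sequence
      return weights @ V                    # each output is a weighted sum of the values

  Q = K = V = torch.randn(1, 5, 16)         # batch 1, sequence length 5, dimension 16
  out = scaled_dot_product_attention(Q, K, V)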

9

What are typical tasks performed by Encoders and Decoders in sequence processing?

Encoders convert variable-length input sequences into fixed-length vector representations (context vectors), while decoders generate output sequences element-by-element in an autoregressive manner, conditioned on the context vector. Think of encoding as summarizing a book into an executive summary, and decoding as expanding that summary back into a new book.

10

How can we quantify the impact of changes in model parameters on the model's performance?

The effect can be quantified by computing the gradient of the loss function with respect to the model parameters. This reveals the sensitivity of the loss to changes in each parameter, indicating which parameters have the most significant impact on reducing the error. It's like turning knobs on a radio – the gradient tells you which knob to turn to get the clearest signal.
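A minimal sketch of this computation (assuming PyTorch; the tiny linear model and random data are purely illustrative):

  import torch
  import torch.nn as nn

  model = nn.Linear(10, 1)                        # a model with a weight matrix and a bias
  x, y = torch.randn(8, 10), torch.randn(8, 1)    # a small batch of inputs and targets

  loss = nn.MSELoss()(model(x), y)
  loss.backward()                                  # gradient of the loss w.r.t. every parameter

  # Parameters with large-magnitude gradients are the ones the loss is most sensitive to
  print(model.weight.grad.abs(), model.bias.grad)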

11

How can Gradients be used for model Explainability?

By computing the gradient of the model's output with respect to its input data, we can understand which input features most influence the model's decisions on a per-data-point basis. This gives a sensitivity map of the input space, highlighting areas that trigger certain responses in the model. Think of it as tracing the path of water flowing down a hill to see which slopes direct the flow.
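A sketch of a simple gradient-based saliency map (assuming PyTorch and some trained classifier called model; input and output shapes are illustrative):

  import torch

  def saliency(model, x, target_class):
      # Gradient of the chosen class score w.r.t. the input features
      x = x.detach().clone().requires_grad_(True)
      score = model(x)[0, target_class]    # scalar score for one input and one class
      score.backward()                     # fills x.grad with d(score) / d(input)
      return x.grad.abs()                  # large values = influential input features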

12

What problems arise when relying solely on Gradients for model Explainability?

Gradients can be noisy, and they can vanish entirely: with ReLU (Rectified Linear Unit) activations, any unit whose output is zero passes no gradient back through it, so parts of the explanation simply go missing or become misleading. Using raw gradients alone is like trying to understand a city's traffic by looking at the speed of a few cars; you miss the bigger picture.

13

What are some solutions for addressing Noisy Gradients in explainable AI?

Techniques such as SmoothGrad (averaging gradients after adding noise to the input) and Integrated Gradients (integrating gradients along a path from a baseline input) help to reduce the noise and provide more robust and reliable explanations. It's like taking multiple photos of the same scene and averaging them to reduce the impact of individual noise pixels.
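A hedged sketch of the SmoothGrad idea, reusing the saliency function from the earlier card (the noise level and sample count are illustrative):

  import torch

  def smoothgrad(model, x, target_class, n_samples=25, noise_std=0.1):
      # Average the gradient-based saliency over several noisy copies of the input
      grads = torch.zeros_like(x)
      for _ in range(n_samples):
          noisy_x = x + noise_std * torch.randn_like(x)
          grads += saliency(model, noisy_x, target_class)
      return grads / n_samples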

14

How are Adversarial Examples constructed to fool Neural Networks?

Adversarial examples are created by making small, intentional perturbations to the input data along the gradient of the loss with respect to the input, stepping in the direction that increases the loss rather than decreases it, which causes the model to misclassify the input. These perturbations are often imperceptible to humans but can drastically alter a neural network's prediction. It's like whispering the wrong answer to someone during an exam; the change is small, but it leads to a wrong response.
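A minimal sketch of this idea in the style of the fast gradient sign method, FGSM (assuming PyTorch, a trained classifier called model, and an input with its true label; epsilon controls how small the perturbation is):

  import torch
  import torch.nn.functional as F

  def fgsm_attack(model, x, true_label, epsilon=0.01):
      x = x.detach().clone().requires_grad_(True)
      loss = F.cross_entropy(model(x), true_label)
      loss.backward()
      # Step in the direction that increases the loss for the true label
      x_adv = x + epsilon * x.grad.sign()
      return x_adv.detach().clamp(0, 1)    # keep values in a valid input range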

15

What is the difference between Targeted and Black Box Attacks in the context of adversarial examples?

Targeted attacks require access to the model's gradients to craft specific adversarial inputs designed to cause a particular misclassification. Black box attacks, conversely, only require analyzing the model's outputs without needing internal access or gradient information. The former is like a sniper aiming at a specific target; the latter is like blindly throwing darts at a board.

16

What Defense Strategies can be employed Against Adversarial Attacks to improve model robustness?

Strategies include:

  1. Training the model on adversarial examples (adversarial training).

  2. Using ensemble methods to combine multiple models.

  3.