Software Security - Week 14_1

Detecting Zero-Day Web Attacks with an Ensemble of LSTM, GRU, and Stacked Autoencoders

Introduction

The paper introduces an intelligent system for detecting zero-day web attacks.
The system employs a novel one-class ensemble method comprised of LSTM autoencoder, GRU autoencoder, and stacked autoencoder architectures.
A novel tokenization strategy converts normal web requests into structured numeric sequences.
The ensemble model identifies anomalous activities by uniquely concatenating and compressing latent representations from each autoencoder.
The method efficiently detects unknown web attacks and addresses limitations of previous methods like high memory consumption and excessive false positive rates.
Experimental evaluations demonstrate high detection metrics: 97.58% accuracy, 97.52% recall, 99.76% specificity, and 99.99% precision, with a 0.2% false positive rate.
Websites and web-based applications are crucial in modern digital infrastructure but are subject to security threats.
Manipulating web requests is a primary attack vector where adversaries exploit vulnerabilities.
Detection and mitigation of malicious web requests are vital for online service security.
Security mechanisms like Web Application Firewalls (WAFs) and blacklisting techniques are used to counter threats but are ineffective against zero-day attacks.
Zero-day attacks introduce previously unseen patterns that traditional rule-based detection systems fail to recognize.
Deep learning-based anomaly detection presents a promising approach, leveraging neural networks to autonomously identify deviations indicative of malicious activity.
Conventional methods such as WAFs and blacklisting are time-consuming, insufficient against evolving attack patterns, and incapable of detecting zero-day attacks.
Anomaly detection models do not require prior exposure to zero-day attacks to effectively detect them.
The model is trained exclusively on normal web request data and identifies deviations, flagging both known and previously unseen attacks as anomalous.
Web attacks such as SQL Injection (SQLi), Cross-Site Scripting (XSS), and Buffer Overflow are treated as zero-day attacks within the dataset for evaluation.
The model classifies any request with an anomaly score exceeding a predefined threshold as a potential zero-day attack and reliably detects anomalous activities.
The primary objective is to simultaneously address both known and zero-day attacks while maintaining a high detection rate and minimizing false positives.
The paper is structured into sections covering foundational concepts and research background, literature review, methodology and architectural design, performance evaluation, broader implications, and concluding remarks.
The study introduces a novel ensemble approach by integrating LSTM, GRU, and stacked autoencoders for anomaly detection in web requests. The approach uniquely concatenates and compresses the latent representations from each autoencoder, improving anomaly detection performance and computational efficiency.
The paper proposes a tokenization strategy that classifies tokens based on character composition (numeric, lowercase, uppercase, and special characters). This reduces input dimensionality, ensures data representation consistency, and enhances detection capability.
The model is trained exclusively on normal web requests, enabling it to effectively identify and detect previously unseen zero-day attacks by capturing deviations from normal request patterns.
The study explicitly evaluates and reports the False Positive Rate, achieving a significantly lower FPR of 0.2%, underscoring the practical applicability of the model.

Background

Web attacks pose significant threats to user privacy and can severely disrupt web server operations.
Zero-day attacks are particularly concerning, as they can lead to privacy breaches and denial of service.
Heuristic methods rapidly identify potentially malicious activities based on predefined rules and patterns without extensive labeled datasets.
Heuristic algorithms analyze data patterns and system behaviors, identifying threats by matching observed behaviors against known suspicious patterns or predefined rules.
The primary advantage of heuristic approaches is the capability to detect novel or zero-day threats promptly, often faster than traditional signature-based methods.
Heuristic approaches can suffer from high false positives or negatives due to their reliance on manually defined rules and parameters, and thus require continuous refinement and adaptation.
Machine learning-based approaches for web attack detection typically consist of training and detection phases.
During the training phase, the model learns from patterns of normal web requests.
In the detection phase, the model utilizes learned knowledge to identify and mitigate potential web attacks.
Web attack countermeasures can be categorized into supervised, unsupervised, and semi-supervised learning.
The supervised approach is designed to detect known attacks and is commonly implemented in signature-based systems such as Web Application Firewalls (WAFs).
Supervised models rely on labeled datasets containing both historical attack patterns and normal requests and are ineffective against zero-day attacks.
The unsupervised approach employs anomaly detection techniques which do not rely on historical attack patterns, allowing it to identify previously unseen zero-day attacks.
Anomaly detection-based methods model the expected behavior of normal web traffic, effectively flagging deviations indicative of malicious activity.
The semi-supervised approach leverages normal web request data to train the model, focusing exclusively on learning the characteristics of legitimate requests and eliminating the need for labeled attack samples.
The research employs an unsupervised approach to address the challenge of detecting zero-day attacks.
The model learns the distribution of normal web requests and identifies requests that deviate from the established normal behavior, flagging them as potential zero-day attacks.

Related Work

Numerous methods and models have been proposed to counter web attacks, including zero-day attacks, by researchers such as Pu et al. [7], Ingham et al. [9], Sivri et al. [10], Jung et al. [11], Vartouni et al. [12], Ariu et al. [13], Liang et al. [14], Kuang et al. [15], Tang et al. [16], Indrasiri et al. [17], Gong et al. [18], Tekerek et al. [19], Jemal et al. [20], Alaoui et al. [21], Mohamed et al. [22], Yuan et al. [23], Vorobyov et al. [24], Su et al. [25], Silvestre et al. [26], Yatagha et al. [27], Katbi et al. [28], Tokmak et al. [29], Alqhwazi et al. [30], Thalji et al. [31], and Yao et al. [32].
Pu et al. introduce an unsupervised anomaly detection method that combines Sub-Space Clustering (SSC) and One Class Support Vector Machine (OCSVM) for detecting cyber intrusions without prior knowledge of the attacks.
Ingham et al. propose a method for detecting web attacks by focusing on deep learning techniques, specifically utilizing Transformer models.
Sivri et al. used XGBoost, LightGBM, LSTM, and CNN. The LSTM model achieved the best accuracy and F1 score, while LightGBM performed better in computation time. Upsampling techniques were used to balance the dataset, which helped improve classification metrics.
Jung et al. used Payload Feature-Based Transfer Learning (PF-TL) to cope with insufficient training data in intrusion detection systems, leveraging knowledge transfer from a labeled source domain to an unlabeled target domain by extracting features from both the header and payload of network traffic. Their technique is a hybrid feature extraction that combines signature-based and text vectorization methods to enhance the representation of attack patterns.
Vartouni et al. use a deep neural network-based method for feature learning and an isolation forest for classification to identify malicious requests, and employ an n-gram model.
Ariu et al. use an intrusion detection system that represents payloads as byte sequences, analyzed using Hidden Markov Models (HMM).
Liang et al.’s approach first trains two Recurrent Neural Networks (RNNs) with Complex Recurrent Units (LSTM or GRU units) to learn normal request patterns solely from unsupervised normal requests. Then, a supervised neural network classifier is trained, taking the output of the RNNs as input to categorize normal and abnormal requests.
Kuang et al. employ deep learning concepts to design a model named DeepWaf, a combination of LSTM and CNN deep neural networks.
Tang et al.’s model tokenizes each word in an HTTP request (except for low-value words like ‘and’, ‘or’, etc.). Words and tokens are mapped to each other through TokenIDs. The tokenized request is then encoded and decoded using a Long Short-Term Memory (LSTM) architecture; if the decoded value matches the pre-encoded tokenized value, the request is benign; otherwise, it is malicious. The model primarily targets zero-day attacks, leaving known attack detection to the WAF and addressing zero-day attacks through the Zero-Wall model.
Indrasiri et al. used seven classification algorithms, one clustering algorithm, two ensemble methods, and two large standard datasets. An ensemble model named ERG-SVC was proposed, using features selected by various feature selection methods.
Gong et al.’s method aims to address the problem of annotation errors in training data by incorporating model uncertainty into deep learning (DL) models, specifically focusing on Convolutional Neural Networks (CNNs).
Tekerek et al. focus on anomaly-based detection. Their method preprocesses HTTP request data, particularly URLs and payloads, to identify unusual patterns indicative of potential threats using a deep learning architecture centered on Convolutional Neural Networks (CNNs).
Jemal et al. present a smart web application firewall (SWAF) based on a convolutional neural network using a 5-fold cross-validation method.
Alaoui et al. propose an approach based on Word2vec embedding and a stacked generalization ensemble model for LSTMs to detect malicious HTTP web requests.
Mohamed et al. propose a deep learning-based multi-class intrusion detection system that classifies different types of web attacks. Its main approach is the automatic extraction and classification of features from HTTP traffic, which overcomes limitations of traditional feature engineering, using algorithms such as LSTM, Bi-LSTM, CNN, and RNN.
Shahid et al. propose a framework based on an enhanced hybrid approach in which a deep learning model is nested with a Cookie Analysis Engine for web attack detection, mitigation, and attacker profiling in real time.
Moarref et al. focus on enhancing web attack detection via a character-level multichannel multilayer dilated convolutional neural network that processes HTTP request texts at the character level and extracts relevant features. The model combines multichannel dilated convolutional blocks with varying kernel sizes to capture diverse temporal relationships and dependencies among characters.
Yatagha et al. proposed a hybrid anomaly detection model combining VAE, LSTM, and OCSVM to detect zero-day anomalies in cyber-physical systems. The model learns normal patterns and flags deviations using reconstruction errors and latent space analysis.
Katbi et al. proposed IDSVDD, a novel one-class anomaly detection framework for IoT environments that combines Deep SVDD with an interpolated adversarial autoencoder. The model enhances the structure of the latent space by enforcing convexity and regularization through adversarial interpolation, making it easier to distinguish anomalies from normal data.
Tokmak et al. presented a deep learning framework for zero-day threat detection that combines Stacked Autoencoders (SAE) for feature selection with an LSTM classifier.
Alqhwazi et al. proposed an SQL injection detection system using a Recurrent Neural Network (RNN) Autoencoder, trained on a public Kaggle dataset of SQL queries.
Thalji et al. proposed AE-Net, a novel autoencoder-based feature engineering approach for detecting SQL injection attacks. AE-Net extracts high-level deep features from SQL textual queries, which are then used as input to multiple machine learning and deep learning models.
Yao et al. proposed a lightweight intrusion detection system for IoT that combines a One-Class Bidirectional GRU Autoencoder with Soft-Voting Ensemble Learning.
Yuan et al. proposed a static SQL injection detection technique based on program transformation to address the limitations of existing tools in handling object-oriented database extensions (OODBE) in PHP applications.
Vorobyov et al. introduced a novel runtime protection mechanism against SQL injection attacks based on synthesizing fine-grained allowlists from benign SQL queries.
Su et al. proposed Splendor, a static analysis framework for detecting stored Cross-Site Scripting (XSS) vulnerabilities in modern PHP web applications, especially those using Data Access Layers (DAL).
Silvestre et al. introduced FreeSQLi, a novel static analysis tool that detects SQL injection vulnerabilities in PHP applications using session types.

Proposed Model

This section presents the proposed model for detecting zero-day web attacks. The process begins with preprocessing to produce a standardized input representation, which is then fed into an ensemble of one-class classifiers to distinguish between normal and malicious traffic.
The model comprises multiple components, each serving a distinct role in the detection process.
During the training phase, the model is exclusively trained on normal web requests to establish a baseline pattern of legitimate traffic.
In the testing phase, both normal and malicious web requests are input into the model for evaluation.
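The training and testing phases described above can be illustrated with a small data-splitting sketch. It is not taken from the paper: the function name `one_class_split`, the label convention (0 for normal, 1 for attack), the 80% training fraction, and the fixed seed are all assumptions chosen for illustration. The only property it is meant to show is that attack samples never reach the training set.

```python
import random

def one_class_split(requests, labels, train_fraction=0.8, seed=0):
    """Split data for one-class training (hypothetical helper).

    Assumed label convention: 0 = normal request, 1 = attack.
    The training set contains normal requests only; the test set
    contains the held-out normal requests plus every attack.
    """
    normal = [r for r, y in zip(requests, labels) if y == 0]
    attacks = [r for r, y in zip(requests, labels) if y == 1]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(normal)

    cut = int(len(normal) * train_fraction)
    train = normal[:cut]
    test = [(r, 0) for r in normal[cut:]] + [(r, 1) for r in attacks]
    return train, test
```

Because the attacks are held out entirely, every attack in the test set is unseen from the model's point of view, which is how the notes describe SQLi, XSS, and Buffer Overflow being treated as zero-day attacks.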

Tokenization

The model introduces a novel tokenization technique for web requests, applied at the word level to both normal and malicious requests.
This tokenization addresses the challenges that the variability in the length and structure of web requests poses for training neural network-based models.
Anomaly detection principles effectively distinguish between legitimate and anomalous web traffic.
To ensure consistent data representation, the pre-processing pipeline standardizes normal web requests through a dictionary-based tokenization approach.
The dictionary includes categories such as Alpha, AlphaNum, CapitalAlpha, and SpecialChar, among others.
Data volume reduction: The tokenization process optimizes data representation, reducing the complexity and size of input data, thereby improving computational efficiency.
Pattern identification for anomaly detection: Establishing a structured pattern for normal web requests enhances the model’s ability to differentiate between legitimate and anomalous activities, ensuring higher accuracy in detecting malicious web requests.
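As a rough illustration of the dictionary-based tokenization, the sketch below replaces each token with a category name based on its character composition. Several details are assumptions rather than the paper's exact rules: the `Numeric` category name, the reading of `Alpha` as all-lowercase and `CapitalAlpha` as all-uppercase, and the fallback order. The regular expression mirrors the splitting rule of NLTK's WordPunctTokenizer (runs of word characters or runs of punctuation) so that the example needs only the standard library.

```python
import re

# Same splitting rule as NLTK's WordPunctTokenizer: runs of word
# characters, or runs of non-space, non-word characters.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]+")

def categorize(token: str) -> str:
    """Map one token to a category name (category rules are assumptions)."""
    if token.isdigit():
        return "Numeric"
    if token.isalpha() and token.islower():
        return "Alpha"
    if token.isalpha() and token.isupper():
        return "CapitalAlpha"
    if token.isalnum():
        # Mixed-case words and letter/digit mixes end up here.
        return "AlphaNum"
    return "SpecialChar"

def tokenize_request(request: str) -> list[str]:
    """Split a raw request into tokens and replace each by its category."""
    return [categorize(t) for t in _TOKEN_RE.findall(request)]
```

For example, `tokenize_request("GET /index.jsp?id=42")` yields `CapitalAlpha, SpecialChar, Alpha, SpecialChar, Alpha, SpecialChar, Alpha, SpecialChar, Numeric`. Two requests that differ only in their concrete parameter values map to the same category sequence, which is what keeps the vocabulary small.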

Numerical Sequence

Each token must be mapped to a corresponding numerical value, as neural networks operate on numbers rather than raw text.
The varying range of input features necessitates data scaling before being processed by the model.
The tokenized text is converted into a structured numerical format suitable for input into the neural network.
Padding extends shorter sequences to the fixed input length by appending neutral values. Standardizing the input dimensions enhances the model’s ability to analyze and recognize patterns within the data, ultimately improving detection accuracy and performance.
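The mapping, padding, and scaling steps can be sketched as follows. The vocabulary `VOCAB`, the choice of 0 as the neutral padding value, the truncation of over-long sequences, and division by the largest ID as the scaling method are all assumptions made for illustration; the notes do not specify these details.

```python
# Hypothetical category-to-ID dictionary; 0 is reserved for padding.
VOCAB = {
    "PAD": 0,
    "Numeric": 1,
    "Alpha": 2,
    "CapitalAlpha": 3,
    "AlphaNum": 4,
    "SpecialChar": 5,
}

def encode(categories, max_len=8):
    """Turn a category sequence into a fixed-length, scaled numeric vector."""
    # Map each category to its integer ID, truncating over-long inputs
    # (truncation is an assumption; the source only mentions padding).
    ids = [VOCAB[c] for c in categories][:max_len]
    # Pad shorter sequences with the neutral value up to max_len.
    ids += [VOCAB["PAD"]] * (max_len - len(ids))
    # Scale IDs into the range [0, 1] by dividing by the largest ID.
    scale = max(VOCAB.values())
    return [i / scale for i in ids]
```

With `max_len=5`, the sequence `["CapitalAlpha", "SpecialChar", "Alpha"]` becomes `[0.6, 1.0, 0.4, 0.0, 0.0]`, so every request reaches the network with the same shape and value range.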

Ensemble Model

The proposed ensemble model consists of three sub-models: an LSTM autoencoder, a GRU autoencoder, and a stacked autoencoder.
The initial phase involves selecting appropriate sub-models for training and detection to enhance detection accuracy and computational efficiency.
The outputs of these sub-models are concatenated and further processed through a dense layer for feature reduction, optimizing the final classification process.
Autoencoders consist of an encoder and a decoder, both comprising multiple layers, which collectively transform an input sequence of symbols (words) into a continuous latent representation preserving critical features while filtering out noise.
This reconstruction-based learning approach enables autoencoders to effectively capture underlying patterns in web requests, further improving the robustness of the detection framework.
The primary objective of the decoding process is to validate the quality and representativeness of the extracted features.
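The concatenation and compression step described in this section can be sketched in plain Python. This is only a shape-level illustration: the three latent vectors would come from trained LSTM, GRU, and stacked autoencoder encoders, which are not reproduced here, and the function names, weights, and bias values are hypothetical. The dense layer is written as a linear map with no activation, which is an assumption.

```python
def dense(x, weights, bias):
    """Linear dense layer: one output value per weight row."""
    return [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]

def fuse_latents(z_lstm, z_gru, z_sae, weights, bias):
    """Concatenate three latent vectors, then compress with a dense layer.

    `weights` must have one row per output feature, and each row must be
    as long as the concatenated latent vector.
    """
    combined = list(z_lstm) + list(z_gru) + list(z_sae)
    if any(len(row) != len(combined) for row in weights):
        raise ValueError("weight rows must match the concatenated length")
    return dense(combined, weights, bias)
```

In a real implementation the weights would be learned jointly with the rest of the network; here they only show how a five-dimensional concatenated vector can be reduced to, say, two features.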

Evaluation and Results

This section presents the evaluation of the proposed model and its sub-models based on multiple performance metrics, including accuracy, detection rate, sensitivity, precision, and false positive rate.
The effectiveness is assessed via a threshold-based evaluation using the Mean Absolute Error (MAE), which quantifies the difference between the reconstructed request and the original input.
MAE is a widely used metric for measuring the absolute difference between predicted values and their actual counterparts.
The classification criterion is based on the computed MAE for a given web request: if the MAE falls below a predefined threshold, the request is classified as normal; otherwise, it is identified as malicious.
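The threshold rule above can be written out directly. In this minimal sketch the function names are hypothetical, and the threshold value is left to the caller because the notes do not state how it is chosen. A request whose MAE equals the threshold is treated as malicious, following the "falls below" wording.

```python
def mean_absolute_error(original, reconstructed):
    """Average absolute difference between input and reconstruction."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(abs(o - r) for o, r in zip(original, reconstructed))
    return total / len(original)

def classify(original, reconstructed, threshold):
    """Label a request as normal if its MAE is below the threshold."""
    mae = mean_absolute_error(original, reconstructed)
    return "normal" if mae < threshold else "malicious"
```

For an input `[0.2, 0.4, 0.6]` reconstructed as `[0.2, 0.5, 0.9]`, the absolute errors are 0, 0.1, and 0.3, giving an MAE of about 0.133. That request would be labeled malicious with a threshold of 0.1 and normal with a threshold of 0.2.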

Data Collection

Previous research has primarily relied on two well-established datasets: CSIC [42] and HTTPPARAMS [42].
The proposed model leverages the CSIC dataset for training and evaluation. The dataset contains normal (benign) web requests as well as a diverse range of malicious requests, including SQL Injection (SQLi), Cross-Site Scripting (XSS), and Buffer Overflow attacks.
To maintain data integrity, ambiguous anomalies are removed from the dataset prior to training.

Ensemble Model Structure

The LSTM autoencoder and GRU autoencoder each consist of four layers using the default tanh activation function, while the stacked autoencoder comprises four dense layers with linear activation. The ensemble model concatenates outputs from these autoencoders into a unified latent vector, which is further compressed via a dense layer.

System components used in the evaluation setup:

Neural Networks: LSTM, GRU, and Stacked Autoencoder
Operating System: Windows 11
Programming Language: Python v3.12
Python Library: Scikit-learn v1.6.0
Natural Language Toolkit (NLTK) Library: WordPunctTokenizer
Evaluation Metric for measuring prediction accuracy: Mean Absolute Error (MAE)
The performance assessment of the proposed model involves the computation of six primary metrics: accuracy, precision, sensitivity, detection rate, false positive rate, and F1 score.
The proposed ensemble model consistently outperforms the individual sub-models across all evaluation metrics.
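The metrics named above follow from the four confusion-matrix counts, treating "malicious" as the positive class. The sketch below uses the standard textbook definitions; the function name is hypothetical. Sensitivity and detection rate are both commonly defined as TP / (TP + FN), so a single value is computed here, although the notes list them as separate metrics and the paper's definitions may differ.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard detection metrics from confusion-matrix counts.

    tp: attacks flagged as malicious     fn: attacks missed
    tn: normal requests passed           fp: normal requests flagged
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }
```

With made-up counts of 90 true positives, 5 false positives, 95 true negatives, and 10 false negatives (illustrative only, not results from the paper), accuracy is 0.925 and the false positive rate is 0.05. The false positive rate is always one minus the specificity, so the two describe the same behavior on normal traffic.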

Discussion

The proposed model demonstrates strong performance across all evaluation metrics, particularly in terms of the False Positive Rate (FPR).
The model achieves the lowest FPR compared to related work referenced in this study, highlighting its ability to accurately differentiate between normal and malicious web requests.

Effectiveness of the Ensemble Model

The proposed ensemble model comprises three fundamental sub-models: LSTM, GRU, and Stacked Autoencoder.
- LSTM Autoencoder: Captures long-term dependencies in web request sequences, enhancing the ability to recognize complex request structures.
- GRU Autoencoder: Provides computational efficiency while preserving strong sequential pattern recognition capabilities.
- Stacked Autoencoder: Focuses on dimensionality reduction and latent feature extraction, improving anomaly detection in subtle attack patterns.
By integrating these sub-models, the ensemble model achieves a well-balanced performance across multiple evaluation metrics, effectively mitigating the weaknesses of individual sub-models through an innovative concatenation and feature-compression mechanism.
This structured ensemble method distinctly differs from traditional ensemble strategies and clearly enhances computational efficiency and detection accuracy.

Practical Considerations and Limitations

Although the proposed model demonstrates robust performance on benchmark datasets, its real-world deployment in web security applications requires additional considerations.

Conclusions

Each web request was first segmented into individual words, tokenized using a predefined vocabulary, and mapped to a unique numerical representation, which facilitated its input into the neural network.
The proposed model employs an ensemble approach comprising three sub-models: LSTM, GRU, and Stacked Autoencoders. The outputs are explicitly concatenated into a combined latent feature set, ensuring that the ensemble benefits from the diverse representation capabilities of each sub-model.
A novel structured tokenization method significantly enhances detection performance, and critical metrics, including the False Positive Rate, are explicitly evaluated.
The Mean Absolute Error (MAE) was employed as the primary metric to quantify the difference between the reconstructed and original values of each request.

Future Work

Future research involves enhancing the tokenization and feature extraction process across diverse web attack datasets. This is to be achieved through the application of Generative AI, leveraging Large Language Models (LLMs) and prompt engineering to construct a structured prompt that systematically guides the LLM in preprocessing each dataset sample.
The preprocessing pipeline is to be automated by using LLMs to autonomously generate customized preprocessing scripts from structured prompts that define tokenization rules and feature extraction strategies.
Advanced neural architectures and anomaly detection approaches, such as Bidirectional LSTM, GRU, and Convolutional Neural Networks (CNNs), are to be implemented to develop a more robust ensemble model for web attack detection.
Structured reasoning strategies are to be integrated to detect complex, multi-component web vulnerabilities.
Furthermore, the integration of a voting-based ensemble learning approach will be investigated to improve detection accuracy by combining multiple classifiers.