Notes on ML in Digital Forensics: A Systematic Literature Review (2010-2021)

Abstract

Topic: Machine learning (ML) applications in digital forensics (DF) explored via a systematic literature review (SLR).
Timeframe: January 2010 – December 2021.
Scope: Review of ML techniques used to analyze digital forensic evidence across DF domains.
Key findings:
- Image forensics yields the greatest benefit from ML methods compared to other DF domains.
- Convolutional neural networks (CNNs) are the most important ML method in DF; CNN-based models are increasingly used.
- A comprehensive mind map and visual analysis (based on paper keywords) show thematic relationships and trends.
Deliverables:
- A meta-analysis of ML method usage in DF.
- A mind map linking ML methods to DF domains.
- Guidelines and representations to guide future work.
Authors & affiliations: Ferdowsi University of Mashhad (Iran); University of Guelph (Canada).
Keywords: Digital forensic, Machine learning, Convolutional neural networks, Image forensics, Deep learning, SLR.

Introduction

DF challenges:
- Exponential growth in digital data and cybercrime complexity.
- Large data volumes make investigations time-consuming; difficulty in defining useful big-data sets.
- Data diversity (e.g., IoT with billions of sensors) complicates real-time investigations.
- Need for accuracy and reliability in investigations and results.
Motivation for ML in DF:
- ML can extract knowledge from huge evidence datasets, enable data mining, and support automated, faster, and more capable investigations.
- ML helps identify anomalies, patterns, and relationships to aid investigators in handling large volumes of data.
- DL models (a subset of ML) are used in adversarial image forensics, image tamper detection, computer forensics, and network traffic analysis.
Goals of the study:
- Provide a community-wide view of ML applications in DF.
- Identify ML techniques and DF domains that have received the most attention.
- Highlight research gaps and future directions.

Prior research and background

Previous work overview:
- Quick & Raymond Choo (papers since 2004) on large-data DF challenges and AI/ML solutions.
- Pratama et al. (2014) on DF trends and computational intelligence effects.
- Faye Rona Mitchell (2014) on AI techniques in pattern recognition for cybersecurity/DF.
- Qadir & Varol (2020) on ML for analyzing large, diverse DF datasets to detect criminal behavior.
- Adam & Varol (2020) on literature 2005–2019 using classification and clustering in DF and a framework for DF intelligence.
Limitations of prior works:
- Many studies focused on specific ML techniques or single DF domains.
- No comprehensive systematic literature review covering ML applications to DF across all domains up to 2021.
The current study positions itself as a comprehensive SLR to fill these gaps, including a cross-domain view and a mind map to map ML methods to DF domains.

Research questions and contributions

RQs addressed:
- RQ1: How have ML-for-DF publications trended yearly? (Publication spread over years.)
- RQ2: What is the geographic distribution of ML-for-DF research? (Leading countries.)
- RQ3: Which venues and databases publish ML-for-DF work most? (Publishers, conferences, journals.)
- RQ4: What keywords co-occur, indicating major research topics and relationships?
- RQ5: What ML methods are used in DF, and in which DF domains?
Primary contributions:
- Identification of 608 primary ML-for-DF studies up to December 2021.
- Meta-analysis of ML methods used to improve DF investigations and address DF challenges.
- Visual analysis of 608 papers by authors’ keywords to reveal thematic relevance.
- A comprehensive mind map linking ML methods to DF domains to guide further work.
- Representations and guidelines to help advance research in this area.

Structure and methodology of the SLR

Methodology framework: Kitchenham & Charters guidance for SLRs.
Process stages:
- Planning, conducting, reporting phases; iterative refinement toward evaluation.
Primary studies collection:
- Search platforms: ACM Digital Library, IEEE Xplore, ScienceDirect, SpringerLink.
- Search terms targeted ML and DF crossovers, combined with AND/OR operators, for example: ("machine learning" OR "artificial intelligence" OR "classification") AND "digital forensic" AND ("neural network" OR "convolutional neural network" OR "deep neural network" OR "deep learning") AND ("support vector machine" OR bayesian OR regression OR "decision tree" OR "k-nearest neighbor" OR supervised OR "k-means" OR reinforcement OR "Markov" OR "random forest") AND "digital forensic".
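A minimal sketch of how such Boolean search strings could be assembled. It assumes each group of method terms is paired with "digital forensic" as its own query, which is only one reading of the combined string above. The helper names (or_group, build_query) are hypothetical, and every term is quoted for simplicity.

```python
def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(method_terms, domain_term="digital forensic"):
    """Pair one OR-group of ML terms with the DF domain term via AND."""
    return f'{or_group(method_terms)} AND "{domain_term}"'


# Term groups copied from the notes; the grouping into separate queries is an assumption.
GENERAL = ["machine learning", "artificial intelligence", "classification"]
DEEP = ["neural network", "convolutional neural network",
        "deep neural network", "deep learning"]

for group in (GENERAL, DEEP):
    print(build_query(group))
```

Keeping each group as a separate query makes it easy to see which term family produced which hits before duplicates are removed.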
Data handling: Forward and backward snowballing (Wohlin) used to expand/verify results until saturation.
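Snowballing until saturation can be pictured as repeatedly expanding the included set through reference lists (backward) and citing papers (forward) until a round adds nothing new. The sketch below assumes the citation data is available as two plain dictionaries and that a screening function exists; the names snowball, references, cited_by, and is_relevant are hypothetical and the paper IDs are toy values.

```python
def snowball(seed_ids, references, cited_by, is_relevant):
    """Expand a seed set by backward/forward snowballing until no new paper is added."""
    included = set(seed_ids)
    frontier = set(seed_ids)
    while frontier:  # saturation: stop when a round yields nothing new
        candidates = set()
        for pid in frontier:
            candidates.update(references.get(pid, ()))  # backward: papers it cites
            candidates.update(cited_by.get(pid, ()))    # forward: papers citing it
        new = {c for c in candidates if c not in included and is_relevant(c)}
        included |= new
        frontier = new  # only newly included papers are expanded next round
    return included


# Toy citation data (hypothetical paper IDs).
references = {"A": ["B", "X"], "B": ["C"]}
cited_by = {"A": ["D"], "C": ["E"]}
result = snowball({"A"}, references, cited_by, is_relevant=lambda pid: pid != "X")
print(sorted(result))  # ['A', 'B', 'C', 'D', 'E']
```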
Inclusion criteria (highlights):
- English, peer-reviewed, empirical data on ML in DF, published 2010–2021.
- Papers must discuss ML technologies used to improve DF.
Exclusion criteria (highlights):
- Grey literature (blogs, government docs), non-English works, non-empirical, not peer-reviewed.
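The inclusion and exclusion highlights above amount to a simple predicate over each candidate record. The following sketch assumes a hypothetical Record type whose fields were filled in by a human reviewer; the field names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One candidate paper, with screening attributes judged by a reviewer."""
    title: str
    year: int
    language: str
    peer_reviewed: bool
    empirical: bool
    uses_ml_for_df: bool


def passes_screening(r: Record) -> bool:
    """Apply the inclusion/exclusion highlights listed in the notes."""
    return (
        r.language == "English"
        and r.peer_reviewed        # excludes grey literature such as blogs
        and r.empirical            # excludes non-empirical work
        and r.uses_ml_for_df       # must discuss ML used to improve DF
        and 2010 <= r.year <= 2021
    )


# Hypothetical records for illustration only.
candidates = [
    Record("CNN splicing detector", 2019, "English", True, True, True),
    Record("Vendor blog post", 2020, "English", False, False, True),
    Record("Early SVM study", 2008, "English", True, True, True),
]
kept = [r.title for r in candidates if passes_screening(r)]
print(kept)  # ['CNN splicing detector']
```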
Data extraction: Studies passing quality assessment were categorized into:
- Context data: study goals.
- Qualitative data: findings, conclusions.
- Quantitative data: experimental results.
Documentation: Data stored in a spreadsheet for completeness and accuracy; meta-analysis performed on extracted data.

Data and results: quantitative findings

Search and filtering outcomes:
- Initial keyword search yielded $6781$ studies.
- Removing duplicates left $4521$.
- Screening against the inclusion/exclusion criteria left $605$ papers.
- Full-text review reduced the set to $533$ studies; snowballing then continued until saturation.
- Final set included: $608$ papers.
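As a quick arithmetic check on this funnel, the sketch below derives how many records each stage removed and how many the snowballing step added on net. The counts are taken from the notes; attributing the whole difference between 533 and 608 to snowballing is an assumption implied by, but not stated in, the figures.

```python
# Stage counts copied from the notes above.
funnel = [
    ("initial keyword search", 6781),
    ("after duplicate removal", 4521),
    ("after inclusion/exclusion screening", 605),
    ("after full-text review", 533),
]
final_included = 608


def removed_per_stage(stages):
    """Return (stage name, records removed on reaching that stage)."""
    return [(later[0], earlier[1] - later[1])
            for earlier, later in zip(stages, stages[1:])]


removed = removed_per_stage(funnel)
# Assumption: all growth from 533 to 608 comes from snowballing.
added_by_snowballing = final_included - funnel[-1][1]

for stage, n in removed:
    print(f"{stage}: {n} removed")
print(f"net added by snowballing: {added_by_snowballing}")
# after duplicate removal: 2260 removed
# after inclusion/exclusion screening: 3916 removed
# after full-text review: 72 removed
# net added by snowballing: 75
```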
Yearly trends (RQ1):
- ML-for-DF research grew significantly after 2016, with a sharp rise through 2021.
Geographical distribution (RQ2):
- Leading contributors by country: China, India, and the USA.
- Europe accounted for about 18% of studies, indicating relatively lower engagement.
- A group labeled 'others' contained 21 countries with <= 5 publications each (examples: Greece, Turkey, Norway, Pakistan, Japan, Russia, Austria, Vietnam, Canada, etc.).
Publication venues and databases (RQ3):
- Conference proceedings are more active in DF ML publications than journals.
- Springer database had the highest publication count; Elsevier accounted for ~47% of journal papers.
Keywords and topic mapping (RQ4):
- Keyword network analysis performed with VOSviewer to identify co-occurrence patterns.
- Significant keywords (counts > 20) included: Image, Detection, CNN, Forensic(s), Identification, DL, Forgery, SVM, classification, Video, Compression, JPEG, Camera, Splicing, Audio, Computer, Multimedia.
- Density visualization and color-coding highlight thematic clusters around multimedia forensics, image tampering, and CNN-based detection.
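The counting behind a keyword co-occurrence map can be sketched in a few lines: count how often each keyword appears and how often each pair appears together in one paper's keyword list. This is an illustration of the idea only, not VOSviewer's implementation or its normalization; the function name and the toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists):
    """Count keyword occurrences and pairwise co-occurrences across papers."""
    occurrences = Counter()
    links = Counter()
    for keywords in keyword_lists:
        # Normalize case and drop duplicates within one paper.
        unique = sorted({k.strip().lower() for k in keywords})
        occurrences.update(unique)
        links.update(combinations(unique, 2))  # each unordered pair once per paper
    return occurrences, links


# Toy author-keyword lists (hypothetical, for illustration only).
papers = [
    ["CNN", "Image forensics", "Splicing"],
    ["cnn", "image forensics"],
    ["SVM", "Audio"],
]
occurrences, links = cooccurrence(papers)
print(occurrences["cnn"])                 # 2
print(links[("cnn", "image forensics")])  # 2
```

A threshold such as the "counts > 20" cut-off mentioned above would then be applied to the occurrence counts before drawing the network.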
DF domains and ML techniques (RQ5):
- Domains considered: Data Discovery and Recovery, Fingerprinting, Multimedia Forensics (Image, Video, Audio, Text), Network, and Triage mode.
- A comprehensive mind map (Figure 10) links ML techniques to DF domains based on paper context, not just keywords.
- 62% of works related to image forensics; video ~11%; audio and fingerprinting ~7% each; others include sensor, computational, and location forensics.
- ML methods by frequency (based on keywords): Deep learning (DL) and CNN-based methods are dominant; SVM, tree-based, and NN models are effective across many DF categories; K-means used less often.
- CNN-based methods dominate overall and particularly in image forensics; DL has grown noticeably since 2017.
- In 2021, approximately 53% of papers used CNN-based methods; roughly 50% focused on image forensics.
Summary interpretation:
- CNNs are the prevailing technology in ML-for-DF research, especially for image-related tasks such as manipulation detection, splicing, and forgery localization.
- Traditional ML methods (SVM, RF, DT, KNN) remain relevant across various DF tasks, including camera/source attribution and network security contexts.
- There is an emphasis on evidence acquisition and detection phases; less emphasis on reconstruction/analysis stages.

Mind map and visual analyses

Mind map (Figure 10) links ML methods to DF domains, capturing context beyond simple keyword associations.
Key insights from the mind map:
- CNNs connect strongly to image forensics and manipulation detection tasks.
- SVMs and tree-based methods appear across multiple DF domains (e.g., camera/source attribution, printer/source attribution, social media analysis).
- Data types driving ML usage include image, video, audio, and text, with image data dominating.
- Emerging trends include social media source identification and author attribution in textual/image data.

Detailed findings by forensic domain and ML technique (illustrative highlights)

Multimedia forensics (image, video, audio, text):
- Image manipulation detection: CNNs excel at automatic feature extraction and classification; DL models struggle with unseen or differently distributed data, which is mitigated by transfer learning, CNN patch strategies, and frequency-domain analyses.
- Splicing, copy-move, double compression, resampling: CNNs with multi-scale kernels and transfer learning improve performance; SVM and hand-crafted features still used in some studies.
- Video forensics: CNNs + temporal models (e.g., CNN-LSTM hybrids) help distinguish fake vs. genuine videos; frame-level anomaly detection and compression-parameter estimation are common tasks.
- Audio forensics: CNNs and NN-based approaches robust to noise; some works use stacked autoencoders for waveform feature learning; SVM can outperform DL in certain audio tamper cases.
Camera/mobile source identification (image/video):
- CNNs automatically extract camera fingerprints; SVMs used for classification and pre-processing tasks like removing scene content to reveal fingerprints.
- Source identification without model knowledge: KNN with self-training can outperform binary/multiclass SVM in some settings.
- Social media channel artifacts (e.g., filters and sensor pattern noise, SPN) can complicate detection; CNNs can struggle with post-processing but can be aided by SPN-focused features.
Printer/scanner source identification, fingerprinting, and authorship attribution:
- SVMs, RFs, and KNNs frequently used for printer/source attribution and fingerprinting based on texture and residual features.
- Authorship attribution (text/code): DL-based approaches achieve high accuracy (e.g., near 97% in some language-attribution tasks); syntax trees vs. deep learning show trade-offs in complexity.
- Keystroke dynamics as biometric evidence: radial basis function networks and other NN variants show promise but data availability is a limitation.
Attack and malware detection / network forensics:
- DL approaches (CNNs, MLPs, autoencoders) useful for large, high-dimensional traffic data; one-class SVM/semi-supervised methods help in anomaly detection.
- Tree-based ensembles (boosted trees, RF) often balance performance with interpretability and computational efficiency.
Data discovery, data carving, and file-type identification:
- CNN-based implicit feature extraction improves run-time performance in classification tasks; extreme learning machines (ELMs) and SVMs used for JPEG vs non-JPEG, fragment classification, and file-type detection.
- Data-wiping and evidence recovery: RF and other ensemble methods help identify deleted data with high accuracy; DL approaches handle complex patterns in fragments and encrypted contexts.
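For contrast with CNN-style implicit feature extraction, the sketch below computes two simple hand-crafted features of a file fragment: its normalized byte histogram and its Shannon entropy. These are offered as a generic illustration of explicit fragment features that a classifier such as an SVM could consume; they are an assumption for illustration, not the feature set of any study covered by the review.

```python
import math
from collections import Counter


def byte_histogram(fragment: bytes):
    """Return the relative frequency of each of the 256 byte values."""
    if not fragment:
        raise ValueError("fragment must not be empty")
    counts = Counter(fragment)
    n = len(fragment)
    return [counts.get(b, 0) / n for b in range(256)]


def shannon_entropy(fragment: bytes) -> float:
    """Shannon entropy in bits per byte (0 = constant, 8 = uniform)."""
    return sum(-p * math.log2(p) for p in byte_histogram(fragment) if p > 0)


# Compressed or encrypted fragments tend toward high entropy,
# while plain text and zero-filled regions score much lower.
uniform = bytes(range(256))
zeros = b"\x00" * 64
print(shannon_entropy(uniform) > shannon_entropy(b"plain ascii text"))  # True
```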

Methodological notes and limitations

Data and training challenges:
- Adequate and representative training data is essential; lack of large, diverse DF datasets can hinder generalization.
- Feature selection and the high dimensionality of features pose challenges; optimal feature sets remain domain-specific.
- Large numbers of parameters in deep models require substantial data and careful tuning to avoid overfitting.
Practical limitations:
- Adversarial attacks pose a significant risk to CNN-based methods, underscoring the need for robust and explainable ML in DF.
- DL models often require substantial pre-processing and computational resources; edge deployment and real-time DF contexts require efficient architectures and inference.
Research gaps identified:
- Lack of a standard taxonomy/ontology for DF domains and evidence types; inconsistencies in terminology (e.g., image source identification sometimes classified under different DF categories).
- Under-exploration of DL methods in non-image/video DF domains; more DL work could be valuable beyond image/video forensics.
- A need for explainable ML in DF to provide transparent, auditable decisions for legal contexts.

Future research directions (as proposed by the authors)

Taxonomy and ontology:
- Develop a complete taxonomy of DF domains aligned with evidence types and ML applications to reduce term mismatches.
Security and robustness of ML in DF:
- Investigate the security of CNN-based DF methods against adversarial examples and attacks, and develop robust defenses.
Expansion of DL in non-image/video DF domains:
- Increase DL adoption in data discovery, data reconstruction, network forensics, and other DF domains beyond multimedia.
Expanded ML lifecycle in DF processes:
- More ML involvement in evidence reconstruction/analysis phases, not just acquisition and detection.
Explainable ML (XAI):
- Focus on interpretable models to facilitate legal admissibility and investigator trust.

Conclusion and takeaways

ML, especially CNN-based DL, has become central to modern DF research, with image forensics at the core of this growth.
The SLR identifies 608 primary ML-for-DF studies (through December 2021), highlighting trends, geographic distribution, and venue preferences.
The study provides a comprehensive mind map and keyword-based visual analyses to guide future investigations and collaboration across DF domains.
There is a clear need to address security concerns around ML methods, promote explainability, and broaden DL adoption to non-image DF domains to further advance the field.

Key numerical references and data points (for quick recall)

Total primary studies identified: $608$
Publication window: $2010$ to $2021$
Initial search results: $6781$ studies
After duplicates removal: $4521$
After screening: $605$ papers remained
After full-text review: $533$ studies remained
Final included after snowballing: $608$ papers
Leading contributor countries: China, India, USA
Europe share: ~18%
Journal vs conference publication dynamics: Elsevier published ~47% of journal papers; conferences more active overall
Image forensics share of ML-DF works: 62%
CNN-based papers (2021 only): ~53%
CNN dominance in DF: majority of DL/CNN-related work; DL growth since $2017$
Major ML methods in DF (per author keywords): CNN, SVM, DT, RF, KNN, LR, NB, LSTM, CapsNet, etc.

References to methodology and tools mentioned

Data extraction and categorization: context, qualitative, and quantitative data categories.
Visual analysis tool: VOSviewer used for keyword co-occurrence and density visualizations.
Mind map framework: Figure 10 summarizes relationships between ML methods and DF domains.
Data sources: ACM DL, IEEE Xplore, ScienceDirect, SpringerLink.
Snowballing method: Forward and backward snowballing per Wohlin.