ForensicSQL-Llama-3.2-3B: Specialized Text-to-SQL for Digital Forensics
ForSQLiteLM is a fine-tune of Llama 3.2-3B by pawlaszc that translates natural-language requests into SQLite queries for mobile forensic databases. The model is integrated into the open-source forensic analysis tool FQLite.
Key Capabilities & Performance
- High Accuracy: Achieves 93.0% execution accuracy on a 100-example held-out test set, compared with 95.0% for GPT-4o under identical conditions, and improves on the base Llama model by 56 percentage points.
- Local Operation: Runs entirely locally without internet connectivity, crucial for sensitive forensic investigations.
- Broad Forensic Coverage: Generates queries for 191 forensic artifact categories, including WhatsApp, Signal, iMessage, Android SMS, iOS Health, WeChat, Instagram, and blockchain wallets.
- Targeted Fine-tuning: Fully fine-tuned on the SQLiteDS dataset (800 training examples) with Hugging Face Transformers; the resulting model is approximately 6 GB in bf16.
- Performance Breakdown: Matches GPT-4o on 'Easy' (95.1%) and 'Medium' (87.5%) queries; the remaining gap is on 'Hard' queries (88.9%), which involve complex CTEs and window functions.
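
The accuracy figures above are execution accuracy. As a rough illustration of that kind of metric, the sketch below runs a predicted query and a reference query against the same SQLite database and counts a match when both return the same rows. This is an assumption about how such a score can be computed, not the evaluation harness behind the reported numbers; the `messages` table, the helper names, and the choice to ignore row order are all hypothetical.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset (row order ignored), or None on error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, reference_sql) pairs whose results match."""
    pairs = list(pairs)
    correct = 0
    for predicted, reference in pairs:
        expected = run_query(conn, reference)
        actual = run_query(conn, predicted)
        # A failing predicted query (None) never matches a valid reference result.
        if expected is not None and actual == expected:
            correct += 1
    return correct / len(pairs)


# Toy database standing in for a forensic artifact (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, body TEXT);
INSERT INTO messages (sender, body)
VALUES ('alice', 'hi'), ('bob', 'hello'), ('alice', 'bye');
""")

pairs = [
    # Same rows, different formatting and ordering: counted as correct.
    ("SELECT body FROM messages WHERE sender = 'alice'",
     "SELECT body FROM messages WHERE sender='alice' ORDER BY id"),
    # Runs, but returns a different count: counted as wrong.
    ("SELECT COUNT(*) FROM messages",
     "SELECT COUNT(*) FROM messages WHERE sender = 'bob'"),
    # References a column that does not exist: counted as wrong.
    ("SELECT nope FROM messages",
     "SELECT id FROM messages"),
    # Different SQL, same result set: counted as correct.
    ("SELECT sender FROM messages GROUP BY sender",
     "SELECT DISTINCT sender FROM messages"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.1%}")
```

Comparing result sets rather than SQL text means two differently written queries count as equivalent when they return the same rows, which is why the metric suits generated SQL better than exact string matching.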
Intended Use Cases
- Mobile Forensics: Automating SQL query drafting for seized device databases.
- Tool Integration: Designed for integration into forensic tools such as FQLite, Autopsy, and ALEAPP/iLEAPP.
- Research & Education: Valuable for domain-specific Text-to-SQL research and learning forensic database analysis.
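
For tool integration, a host application typically has to hand the model the schema of the database under examination along with the analyst's request. The sketch below shows one way to collect the schema from a SQLite file and assemble a prompt. The prompt template, the function names, and the `sms` table are assumptions for illustration only; the template the model was actually trained with is not described here and may differ.

```python
import sqlite3


def extract_schema(conn):
    """Collect the CREATE TABLE statements stored in sqlite_master."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n".join(row[0].strip() + ";" for row in rows)


def build_prompt(schema, request):
    """Combine schema and request (hypothetical template, not the official one)."""
    return (
        "Database schema:\n"
        f"{schema}\n\n"
        f"Request: {request}\n"
        "SQLite query:"
    )


# Toy database standing in for an extracted device database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sms (_id INTEGER PRIMARY KEY, address TEXT, date INTEGER, body TEXT)"
)

prompt = build_prompt(extract_schema(conn), "List all messages sent after 2024-01-01")
print(prompt)
```

The resulting string would then be passed to whatever local inference runtime hosts the model; that step is omitted because it depends on the deployment.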
Important Considerations
- Drafting Assistant: ForSQLiteLM is a drafting assistant, not a replacement for human SQL expertise. With a 7.0% error rate on the held-out test set, roughly 1 in 14 generated queries may contain errors, so expert review and validation are required for critical work.
- Scope: Specialized for SQLite databases within the forensic domain; not intended for general-purpose SQL generation or other database types.
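
Because drafted queries need review, a tool can run a cheap automated pre-check before a query reaches the analyst. The sketch below is one possible approach, not part of the model or of FQLite: it accepts only a single SELECT/WITH statement and asks SQLite to compile it with `EXPLAIN`, which catches syntax errors and unknown tables or columns without running the query. The function name `check_draft` and the `sms` table are hypothetical, and a query that passes can still be logically wrong.

```python
import sqlite3


def check_draft(conn, sql):
    """Return (ok, message) for a model-drafted query without executing it."""
    statement = sql.strip().rstrip(";").strip()
    # Crude single-statement check: also rejects semicolons inside string literals.
    if ";" in statement:
        return False, "multiple statements are not allowed"
    if not statement.upper().startswith(("SELECT", "WITH")):
        return False, "only SELECT/WITH queries are accepted"
    try:
        # EXPLAIN compiles the statement and returns its bytecode listing
        # instead of running it, so schema and syntax errors surface here.
        conn.execute("EXPLAIN " + statement)
    except sqlite3.Error as exc:
        return False, f"SQLite rejected the query: {exc}"
    return True, "query compiles against this schema"


# Toy database standing in for an evidence copy (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sms (_id INTEGER PRIMARY KEY, address TEXT, date INTEGER)")
# Extra guard: refuse any later write on this connection.
conn.execute("PRAGMA query_only = ON")

good = check_draft(conn, "SELECT address FROM sms ORDER BY date;")
bad_column = check_draft(conn, "SELECT sender FROM sms")
not_select = check_draft(conn, "DROP TABLE sms")
print(good, bad_column, not_select, sep="\n")
```

For a database on disk, opening it with `sqlite3.connect("file:evidence.db?mode=ro", uri=True)` (the file name is a placeholder) adds a further layer of protection against accidental modification.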