Mistral-Nemo-12B-Text-to-SQL Overview
This model, developed by NBAmine, is a 12.2 billion parameter Mistral-Nemo variant fine-tuned for Text-to-SQL generation. Given a database schema as DDL context, it converts natural language questions into executable SQL queries. This release is the merged full-precision (BF16) checkpoint, intended to reflect the model's best performance before any quantization is applied.
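To illustrate what "DDL context" means in practice, here is a minimal sketch of a prompt builder that pairs a schema with a question. The prompt template this model was trained on is not stated here, so the section labels and the `build_prompt` helper below are purely hypothetical and should be replaced by the template documented for the model.

```python
def build_prompt(ddl: str, question: str) -> str:
    """Combine schema DDL and a question into one prompt.

    The "### Schema / ### Question / ### SQL" layout is a hypothetical
    placeholder, not the model's confirmed template.
    """
    return (
        "### Schema\n"
        f"{ddl.strip()}\n\n"
        "### Question\n"
        f"{question.strip()}\n\n"
        "### SQL\n"
    )


# Toy schema and question, invented for illustration only.
ddl = """
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
"""
prompt = build_prompt(ddl, "How many singers are from France?")
print(prompt)
```

The prompt ends at the SQL marker so that the model's completion is the query itself.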
Key Capabilities
- Natural Language to SQL Generation: Translates complex natural language queries into standard SQL.
- DDL Context Understanding: Utilizes database schema (DDL) to generate accurate and contextually relevant SQL.
- Curriculum Learning: Trained using a two-phase strategy, first focusing on SQL syntax and basic schema mapping, then advancing to complex reasoning tasks like multiple JOIN operations and nested subqueries.
- High Accuracy: Achieves 69.5% Execution Accuracy (EX) on the challenging Spider validation set.
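Execution Accuracy scores a prediction by running it, not by comparing SQL text. The sketch below shows the idea on a toy in-memory SQLite database: a prediction counts as correct when it executes and returns the same rows as the reference query. This is a simplified approximation, not the official Spider evaluation script, and the schema, data, and helper names are invented for illustration.

```python
import sqlite3
from collections import Counter


def execute(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn, predicted_sql, gold_sql):
    """True when both queries run and return the same rows (order ignored)."""
    gold = execute(conn, gold_sql)
    pred = execute(conn, predicted_sql)
    return gold is not None and pred == gold


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose results match."""
    hits = sum(execution_match(conn, p, g) for p, g in pairs)
    return hits / len(pairs)


# Toy schema and data (hypothetical, not taken from Spider).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    INSERT INTO singer VALUES (1, 'Ana', 'FR'), (2, 'Bo', 'US'), (3, 'Cy', 'FR');
""")

pairs = [
    # Different SQL text, same rows: counts as correct.
    ("SELECT name FROM singer WHERE country IN ('FR')",
     "SELECT name FROM singer WHERE country = 'FR'"),
    # Wrong filter: different rows.
    ("SELECT name FROM singer WHERE country = 'US'",
     "SELECT name FROM singer WHERE country = 'FR'"),
    # Unknown column: the prediction fails to execute.
    ("SELECT title FROM singer",
     "SELECT name FROM singer"),
]
score = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {score:.3f}")
```

Comparing rows as a multiset ignores ordering, which is a deliberate simplification; queries with ORDER BY would need an order-sensitive comparison.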
Training and Architecture
The model was fine-tuned with QLoRA (rank 16, alpha 32), with the base weights held in 4-bit NF4 quantization during training. It uses the standard Mistral-Nemo 12B architecture, with 40 layers and Grouped Query Attention (GQA) with 8 KV heads. The maximum supported context length is 2048 tokens.
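The figures above translate into a couple of back-of-the-envelope numbers. The sketch below computes the LoRA scaling factor (alpha divided by rank) and a rough KV-cache size for one sequence at the full context length. The per-head dimension of 128 and the 2-byte BF16 cache entries are assumptions not stated in this document, so the memory figure is an estimate only.

```python
# Values stated in the text above.
NUM_LAYERS = 40
NUM_KV_HEADS = 8
MAX_CONTEXT = 2048
LORA_RANK = 16
LORA_ALPHA = 32

# Assumptions not stated in this document.
HEAD_DIM = 128       # assumed per-head dimension
BYTES_PER_VALUE = 2  # assumed BF16 cache entries (2 bytes each)


def kv_cache_bytes(tokens, layers=NUM_LAYERS, kv_heads=NUM_KV_HEADS,
                   head_dim=HEAD_DIM, bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Standard LoRA scales the low-rank update by alpha / rank.
lora_scaling = LORA_ALPHA / LORA_RANK

per_token = kv_cache_bytes(1)
full_context_mib = kv_cache_bytes(MAX_CONTEXT) / 2**20

print(f"LoRA scaling factor: {lora_scaling}")
print(f"KV cache per token: {per_token} bytes")
print(f"KV cache at {MAX_CONTEXT} tokens: {full_context_mib} MiB")
```

Because GQA caches only the 8 KV heads, the cache grows with the KV head count and not with the number of query heads.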
Good For
- Applications requiring robust and accurate conversion of natural language into SQL queries.
- Developers looking for a high-performance Text-to-SQL model as a "Source of Truth" for further optimizations.