NSQL-Llama-2-7B: Specialized Text-to-SQL Model
NSQL-Llama-2-7B is a 7-billion-parameter model from the NSQL family, optimized for generating SQL queries from natural language. Built on Meta's Llama-2 7B, it was pre-trained on 1 million general SQL queries from The Stack and then fine-tuned on a diverse collection of over 20 public text-to-SQL datasets.
Key Capabilities
- Superior Complex SQL Generation: Significantly outperforms GPT-4 on complex SQL queries, achieving +43% relative improvement on Join queries and +54% relative improvement on Nested queries on the Spider benchmark.
- High Matching Accuracy: Achieves 66.3% matching accuracy on Spider, compared to GPT-4's 41.9%, indicating more structurally correct SQL output.
- Efficiency and Local Deployment: Delivers near-parity with GPT-4 in overall execution accuracy (78.1% vs. 76.2%) while being approximately 250 times smaller, enabling efficient local deployment on commodity hardware for enhanced data privacy.
- Targeted Training: Specifically designed for SELECT query generation from provided table schemas and natural language prompts.
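Because the model expects a table schema plus a natural language question, the prompt can be assembled with a small helper. The following is a minimal sketch, not the official prompt format: the function name `build_prompt`, the instruction wording, the example `stadium` table, and the convention of ending the prompt with `SELECT` are all assumptions made for illustration.

```python
def build_prompt(schema_ddl: list[str], question: str) -> str:
    """Assemble a text-to-SQL prompt: schema, instruction, question, then 'SELECT'.

    The layout is an assumed convention, not a documented requirement.
    """
    # Join the CREATE TABLE statements, separated by blank lines.
    schema = "\n\n".join(stmt.strip() for stmt in schema_ddl)
    # Hypothetical instruction line, written as a SQL comment.
    instruction = (
        "-- Using valid SQLite, answer the following questions "
        "for the tables provided above."
    )
    # Ending with SELECT nudges the model to complete a single SELECT statement.
    return f"{schema}\n\n{instruction}\n\n-- {question.strip()}\n\nSELECT"


# Illustrative schema and question (both hypothetical).
prompt = build_prompt(
    ["CREATE TABLE stadium (stadium_id INTEGER PRIMARY KEY, name TEXT, capacity INTEGER)"],
    "What is the maximum capacity of all stadiums?",
)
print(prompt)
```

The resulting string would then be tokenized and passed to whichever inference runtime hosts the model; the text the model generates continues from the trailing `SELECT`.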
Good for
- Enterprise SQL Workloads: Ideal for applications involving complex SQL queries with multiple table joins and nested subqueries.
- Privacy-Preserving Applications: Suitable for scenarios requiring local model deployment to maintain data privacy.
- Developers Needing Specialized SQL Generation: Offers a highly accurate and efficient solution for converting natural language to SQL, particularly for intricate database interactions.
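Since the model targets SELECT queries and can run locally, generated SQL can also be checked locally before it touches real data. The sketch below assumes a SQLite target and uses Python's standard `sqlite3` module; the helper name `is_valid_select` and its deliberately conservative rules (it rejects anything that does not start with `SELECT`, including `WITH ... SELECT`, and anything containing an interior semicolon) are illustrative assumptions, not part of the model or its tooling.

```python
import sqlite3


def is_valid_select(schema_ddl: str, sql: str) -> bool:
    """Return True if `sql` is a single SELECT that SQLite can plan against the schema."""
    stripped = sql.strip().rstrip(";").strip()
    # Conservative guard: only plain SELECT statements, no stacked statements.
    if not stripped.upper().startswith("SELECT") or ";" in stripped:
        return False
    conn = sqlite3.connect(":memory:")
    try:
        # Create empty tables in a throwaway in-memory database.
        conn.executescript(schema_ddl)
        # EXPLAIN QUERY PLAN compiles the query (catching unknown tables
        # or columns) without executing it against any data.
        conn.execute(f"EXPLAIN QUERY PLAN {stripped}")
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


# Hypothetical schema and model outputs for illustration.
schema = "CREATE TABLE stadium (stadium_id INTEGER PRIMARY KEY, name TEXT, capacity INTEGER);"
print(is_valid_select(schema, "SELECT MAX(capacity) FROM stadium;"))
print(is_valid_select(schema, "DROP TABLE stadium"))
```

A check like this only confirms that the query parses and references existing tables and columns; it does not confirm that the query answers the question correctly.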