seeklhy/OmniSQL-32B

Text Generation | Concurrency Cost: 2 | Model Size: 32.8B | Quant: FP8 | Ctx Length: 32k | Published: Mar 6, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

OmniSQL-32B is a 32.8 billion parameter text-to-SQL model developed by seeklhy. It is fine-tuned on SynSQL-2.5M, a synthetic dataset of more than 2.5 million samples, combined with the Spider and BIRD training sets. The model generates SQL queries from natural language questions and targets SQLite databases specifically. It outperforms baseline LLMs and some leading models on several text-to-SQL benchmarks, which makes it suitable for complex database interaction tasks.

OmniSQL-32B: Advanced Text-to-SQL Model

OmniSQL-32B is a 32.8 billion parameter model from the OmniSQL family, developed by seeklhy and designed for accurate text-to-SQL generation. It builds on an automatic, scalable data synthesis framework that produced the SynSQL-2.5M dataset, which comprises over 2.5 million diverse text-to-SQL samples across 16,000+ databases. Training also incorporates high-quality human-labeled data from the Spider and BIRD benchmarks.

Key Capabilities

  • High-Quality SQL Generation: Translates natural language questions into complex SQL queries for SQLite databases.
  • Extensive Training Data: Fine-tuned on the largest and most diverse synthetic text-to-SQL dataset to date, SynSQL-2.5M, which includes chain-of-thought (CoT) solutions.
  • Robust Performance: Outperforms similarly sized LLMs and even leading models like GPT-4o and DeepSeek-V3 on various text-to-SQL benchmarks, including Spider, BIRD, and robustness tests.
  • Diverse Query Support: Handles a wide range of SQL complexity levels, from simple single-table queries to advanced multi-table joins and common table expressions.
  • Flexible Linguistic Styles: Processes natural language questions with varied linguistic styles, including formal, colloquial, imperative, and conversational.
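
To make the input side of these capabilities concrete, here is a minimal sketch of how a question and a SQLite schema might be combined into a single prompt. The template wording and the helper names `schema_ddl` and `build_prompt` are assumptions made for illustration; this is not the prompt format the model was trained with, so consult the model's own documentation for that. Only the Python standard library is used.

```python
import sqlite3


def schema_ddl(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements stored in the SQLite catalog."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL "
        "AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return "\n\n".join(row[0] + ";" for row in rows)


def build_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Combine schema and question into one prompt string.

    The template below is hypothetical, not the official OmniSQL template.
    """
    return (
        "Database engine: SQLite\n\n"
        "Database schema:\n" + schema_ddl(conn) + "\n\n"
        "Question:\n" + question.strip() + "\n\n"
        "Write one SQLite query that answers the question."
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    print(build_prompt(conn, "How many singers are from France?"))
```

Reading the schema from `sqlite_master` keeps the prompt in sync with the actual database, which matters because the model sees only the text it is given.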

Good For

  • Automated Database Interaction: Ideal for applications requiring precise conversion of natural language into SQL queries.
  • Benchmarking and Research: Serves as a strong foundation for further research and fine-tuning in the text-to-SQL domain.
  • SQLite-Specific Applications: Optimized for scenarios involving SQLite databases, given its training on the SQLite dialect.
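
For the automated-interaction use case, generated SQL is usually checked before it runs. The sketch below assumes the model's response may contain reasoning text followed by a fenced block tagged `sql`; that output shape is an assumption, not a documented guarantee. The helper names `extract_sql`, `is_valid_select`, and `run_readonly` are hypothetical. Validation uses SQLite's `EXPLAIN`, which compiles a statement without executing it, and execution is guarded by `PRAGMA query_only`.

```python
import re
import sqlite3

FENCE = "`" * 3  # built at runtime so this listing contains no nested fence
SQL_BLOCK = re.compile(FENCE + r"sql\s*\n(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_sql(response: str) -> str:
    """Return the last fenced sql block, or the whole response if none exists."""
    blocks = SQL_BLOCK.findall(response)
    text = blocks[-1] if blocks else response
    return text.strip().rstrip(";").strip()


def is_valid_select(conn: sqlite3.Connection, sql: str) -> bool:
    """Accept only statements that start with SELECT/WITH and compile in SQLite."""
    words = sql.split(None, 1)
    if not words or words[0].upper() not in {"SELECT", "WITH"}:
        return False
    try:
        conn.execute("EXPLAIN " + sql)  # compiles the query, does not run it
    except (sqlite3.Error, sqlite3.Warning):
        return False
    return True


def run_readonly(conn: sqlite3.Connection, sql: str) -> list:
    """Execute the query with writes disabled on this connection."""
    conn.execute("PRAGMA query_only = ON")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.execute("PRAGMA query_only = OFF")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO singer (name, country) VALUES (?, ?)",
        [("A", "France"), ("B", "France"), ("C", "US")],
    )
    conn.commit()
    # Hand-written stand-in for a model response; no model is called here.
    response = (
        "The question asks for a count.\n"
        + FENCE + "sql\nSELECT COUNT(*) FROM singer WHERE country = 'France';\n" + FENCE
    )
    sql = extract_sql(response)
    if is_valid_select(conn, sql):
        print(run_readonly(conn, sql))
```

The keyword check alone is not sufficient, since SQLite also allows `WITH` to precede a write statement; that is why the sketch adds the `query_only` guard at execution time.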

Limitations

OmniSQL-32B currently focuses on English and the SQLite database engine, which may limit its performance on questions in other languages or on other SQL dialects. However, its underlying data synthesis framework can generate new training data to adapt the model to specific scenarios.