Name: seeklhy/OmniSQL-14B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: seeklhy

OmniSQL-14B: High-Quality Text-to-SQL Model

OmniSQL-14B is a 14.8-billion-parameter model developed by seeklhy for text-to-SQL tasks. It is trained on SynSQL-2.5M, a dataset of over 2.5 million diverse, high-quality text-to-SQL samples spanning more than 16,000 databases, generated by an automatic and scalable data synthesis framework. Fine-tuning also incorporates data from the established Spider and BIRD benchmarks.

Key Capabilities

SQL Generation: Translates natural language questions into valid SQL queries, primarily for the SQLite dialect.
High Accuracy: Achieves strong performance on standard and challenging text-to-SQL benchmarks (e.g., Spider, BIRD, Spider2.0-SQLite, ScienceBenchmark, EHRSQL, Spider-DK, Spider-Syn, Spider-Realistic).
Robustness: Performs consistently on robustness benchmarks such as Spider-DK, Spider-Syn, and Spider-Realistic.
Chain-of-Thought: Benefits from chain-of-thought solutions included in its training data, aiding in complex query generation.
Scalability: Part of a model family (7B, 14B, 32B) built on a large-scale synthetic dataset, allowing for further fine-tuning with custom data.
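
Because the model targets the SQLite dialect, one simple way to check a generated query is to compile it against an empty copy of the schema. The sketch below is a minimal illustration using only Python's standard `sqlite3` module. The helper name `is_valid_sqlite` and the example schema are hypothetical and not part of OmniSQL.

```python
import sqlite3


def is_valid_sqlite(schema_ddl: str, sql: str) -> bool:
    """Return True if `sql` compiles against `schema_ddl` in SQLite.

    Hypothetical helper for illustration; not part of OmniSQL.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Create empty tables so column and table names can be resolved.
        conn.executescript(schema_ddl)
        # EXPLAIN QUERY PLAN prepares the statement without executing it.
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
        return True
    except (sqlite3.Error, sqlite3.Warning):
        # Syntax errors, unknown tables or columns, multiple statements, etc.
        return False
    finally:
        conn.close()


schema = "CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);"
print(is_valid_sqlite(schema, "SELECT name FROM singer WHERE age > 30"))
print(is_valid_sqlite(schema, "SELECT nme FROM singer"))
```

A check like this only confirms that the query is well-formed for the given schema. It says nothing about whether the query answers the question correctly.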

What Makes It Different?

OmniSQL-14B distinguishes itself through its training on the large, synthetically generated SynSQL-2.5M dataset, which covers a wide range of database schemas, SQL complexity levels, and linguistic styles. As a result, it significantly outperforms baseline LLMs of similar scale and, in many cases, surpasses larger models such as GPT-4o and DeepSeek-V3 on text-to-SQL tasks, without additional pipeline components such as schema linking or SQL revision. Its focus on the SQLite dialect makes it well suited to applications built on that database engine.
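
Since no separate schema-linking step is required, a caller would typically pass the full database schema together with the question. The sketch below shows one way to do that with Python's standard `sqlite3` module. The function names and the prompt wording are assumptions for illustration only; they are not the model's official prompt template, which should be taken from the full model card.

```python
import sqlite3


def dump_schema(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements stored by SQLite, ordered by table name."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n\n".join(row[0] + ";" for row in rows)


def build_prompt(schema: str, question: str) -> str:
    """Assemble a plain-text prompt (hypothetical wording, not the official template)."""
    return (
        "Database engine: SQLite\n\n"
        f"Database schema:\n{schema}\n\n"
        f"Question: {question}\n\n"
        "Write a single SQLite query that answers the question."
    )


# Example database, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT);"
    "CREATE TABLE concert (id INTEGER PRIMARY KEY, singer_id INTEGER);"
)
prompt = build_prompt(dump_schema(conn), "How many concerts did each singer give?")
print(prompt)
```

The resulting string would then be sent to whichever inference endpoint serves the model.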

Limitations

OmniSQL-14B currently focuses on English questions and the SQLite database engine, so its performance on other languages or other SQL dialects may be limited. However, the underlying data synthesis framework can generate new training data to adapt the model to different requirements.

Good For

Developers needing to convert natural language into SQLite SQL queries.
Applications requiring high accuracy in text-to-SQL translation.
Researchers exploring synthetic data generation for LLM fine-tuning.
Teams building intelligent database interfaces and query assistants.
