cycloneboy/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct
cycloneboy/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct is a 0.5-billion-parameter small language model (SLM) based on Qwen2.5-Coder-0.5B-Instruct, developed by Lei Sheng and Shuai-Shuai Xu. It is fine-tuned for Text-to-SQL tasks using supervised fine-tuning and reinforcement learning, together with a corrective self-consistency approach. The model translates natural language questions into SQL queries and reaches 56.87% execution accuracy on the BIRD development set, which makes it suitable for Text-to-SQL applications where inference speed and edge deployment are critical.
SLM-SQL: Small Language Models for Text-to-SQL
This model, CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct, is part of the SLM-SQL project by Lei Sheng and Shuai-Shuai Xu, which focuses on improving small language models (SLMs) for Text-to-SQL tasks. SLMs in the 0.5B-1.5B parameter range often struggle with the logical reasoning that SQL generation requires; the project reports significant improvements from post-training with supervised fine-tuning and reinforcement learning.
Key Capabilities and Features
- Text-to-SQL Translation: Optimized to convert natural language questions into accurate SQL queries.
- Base Model: Built upon the Qwen2.5-Coder-0.5B-Instruct architecture, providing a strong foundation for code-related tasks.
- Advanced Training: Utilizes supervised fine-tuning (SFT) and reinforcement learning (GRPO) on specialized datasets like SynSQL-Think-916K and SynSQL-Merge-Think-310K.
- Corrective Self-Consistency: Employs an inference method that enhances SQL generation accuracy by iteratively refining queries.
- Performance: The 0.5B model achieved 56.87% execution accuracy (EX) on the BIRD development set, demonstrating substantial improvement over baseline SLMs.
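To make the execution accuracy and self-consistency ideas above concrete, here is a minimal sketch using only the Python standard library. It is not the project's evaluation or inference code: the helper names (`run_query`, `execution_match`, `vote_by_execution`) and the toy `singer` table are hypothetical, the match check is a simplified stand-in for an official evaluator, and the voting shown is plain majority voting over execution results without any corrective revision step.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Run sql; return its rows as an order-insensitive tuple, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort by repr so rows containing NULLs or mixed types can still be ordered.
    return tuple(sorted(rows, key=repr))


def execution_match(conn, predicted_sql, gold_sql):
    """Simplified per-example check: both queries run and return the same rows."""
    predicted = run_query(conn, predicted_sql)
    return predicted is not None and predicted == run_query(conn, gold_sql)


def vote_by_execution(conn, candidates):
    """Return the first candidate whose execution result is the most frequent."""
    executed = [(sql, run_query(conn, sql)) for sql in candidates]
    valid = [(sql, result) for sql, result in executed if result is not None]
    if not valid:
        return None  # every candidate failed to execute
    counts = Counter(result for _, result in valid)
    winner, _ = counts.most_common(1)[0]
    return next(sql for sql, result in valid if result == winner)


# Toy database (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Ben', 24), (3, 'Cy', 45);
    """
)

# Stand-ins for several sampled model outputs for one question.
candidates = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT name FROM singer WHERE age >= 31",
    "SELECT name FROM singer WHERE age > 40",
    "SELECT nme FROM singer",  # invalid column, so it is ignored by the vote
]
best = vote_by_execution(conn, candidates)
print(best)
```

Averaging `execution_match` over a set of question and gold-query pairs gives an execution accuracy figure in the same spirit as the EX number quoted above, although benchmark-specific evaluators may compare results differently.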
Why This Model is Different
Larger LLMs perform well on Text-to-SQL but demand significant computational resources. SLM-SQL instead aims for competitive performance with a much smaller footprint. This model addresses the weakness of SLMs in logical reasoning for SQL through targeted fine-tuning and a corrective self-consistency approach, which makes it a practical choice for scenarios that require fast inference or deployment on edge devices.
Good for Use Cases Involving
- Efficient Text-to-SQL: Applications where converting natural language to SQL needs to be fast and resource-light.
- Edge Deployment: Scenarios where models need to run on devices with limited computational power.
- Database Interaction: Building interfaces that allow users to query databases using natural language without relying on large, complex models.
- Research in SLM Capabilities: Exploring the potential of small models in complex reasoning tasks.
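For the database interaction use case, a natural language interface typically has to hand the model both the schema and the user's question. The sketch below shows one way to assemble such a prompt from a SQLite database. The prompt layout and the helper names (`schema_ddl`, `build_prompt`) are assumptions made for illustration; the prompt format this model was trained with is not stated here and should be taken from the project's own documentation.

```python
import sqlite3


def schema_ddl(conn):
    """Return the CREATE TABLE statements recorded in SQLite's schema catalog."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return "\n\n".join(row[0] for row in rows)


def build_prompt(conn, question):
    """Assemble a hypothetical Text-to-SQL prompt (layout is an assumption)."""
    return (
        "Database schema:\n"
        f"{schema_ddl(conn)}\n\n"
        f"Question: {question}\n"
        "Answer with a single SQLite query."
    )


# Toy schema (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE concert (id INTEGER PRIMARY KEY, singer_id INTEGER, year INTEGER);
    """
)

prompt = build_prompt(conn, "How many concerts did each singer give in 2020?")
print(prompt)
```

The resulting string would then be passed to the model through whichever inference runtime is in use, and the returned SQL executed against the same database.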