AbacusResearch/haLLAwa2
AbacusResearch/haLLAwa2 is a 7 billion parameter language model merged from OpenPipe/mistral-ft-optimized-1227 and machinists/Mistral-7B-SQL, with a 4096-token context window. The merge targets tasks that require both strong general reasoning and SQL capability. It achieves an average score of 64.44 on the Open LLM Leaderboard, with solid results across benchmarks including MMLU and HellaSwag.
haLLAwa2: A Merged Mistral-Based Model
haLLAwa2 is a 7 billion parameter language model developed by AbacusResearch, created by merging two Mistral-based models: OpenPipe/mistral-ft-optimized-1227 and machinists/Mistral-7B-SQL. The merge was performed with mergekit using the SLERP (spherical linear interpolation) method, combining the strengths of both parents.
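For reference, SLERP interpolates along the great-circle arc between two weight tensors rather than along the straight line between them, which tends to preserve tensor norms better than plain averaging. In its standard form, for interpolation factor $t \in [0, 1]$:

$$
\mathrm{slerp}(w_a, w_b; t) = \frac{\sin\big((1-t)\,\Omega\big)}{\sin\Omega}\, w_a + \frac{\sin(t\,\Omega)}{\sin\Omega}\, w_b,
\qquad
\Omega = \arccos\!\left(\frac{w_a \cdot w_b}{\lVert w_a\rVert\,\lVert w_b\rVert}\right)
$$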
Key Capabilities & Performance
This model is designed to offer enhanced performance across a range of tasks, inheriting SQL specialization from machinists/Mistral-7B-SQL while retaining general reasoning and language understanding, as evidenced by its results on the Open LLM Leaderboard:
- Average Score: 64.44
- AI2 Reasoning Challenge (25-shot): 63.31
- HellaSwag (10-shot): 84.51
- MMLU (5-shot): 63.52
- TruthfulQA (0-shot): 47.38
- Winogrande (5-shot): 75.85
- GSM8k (5-shot): 52.08
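The reported average is simply the arithmetic mean of the six benchmark scores above, which is easy to verify:

```python
# Open LLM Leaderboard scores for haLLAwa2, in the order listed above.
scores = [63.31, 84.51, 63.52, 47.38, 75.85, 52.08]
print(f"{sum(scores) / len(scores):.2f}")  # -> 64.44
```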
Unique Merging Strategy
The merge configuration applies different interpolation ratios to different tensor types: self_attn and mlp layers each receive their own weighting, while all remaining tensors fall back to a default value. This fine-grained control over the merging process aims to preserve and enhance the specialized capabilities of each base model, as sketched below.
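The sketch below illustrates the idea of per-tensor-type SLERP weighting. The actual merge was produced with mergekit, and the specific ratios here (0.4 for self_attn, 0.6 for mlp, 0.5 fallback) are illustrative placeholders, not the model's real configuration:

```python
import math
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    Falls back to linear interpolation when the tensors are nearly
    (anti-)parallel, where the spherical formula is numerically unstable.
    """
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    cos_omega = torch.dot(a_flat, b_flat) / (a_flat.norm() * b_flat.norm() + eps)
    omega = torch.acos(cos_omega.clamp(-1.0, 1.0)).item()
    if abs(math.sin(omega)) < eps:
        return (1.0 - t) * a + t * b  # linear fallback
    return (math.sin((1.0 - t) * omega) * a + math.sin(t * omega) * b) / math.sin(omega)

# Hypothetical per-tensor ratios mirroring the card's description:
# distinct weights for self_attn and mlp, plus a fallback for other tensors.
RATIOS = {"self_attn": 0.4, "mlp": 0.6}
FALLBACK_T = 0.5

def merge_state_dicts(sd_a: dict, sd_b: dict) -> dict:
    """Merge two state dicts, choosing the ratio by tensor name."""
    merged = {}
    for name, tensor_a in sd_a.items():
        t = next((r for key, r in RATIOS.items() if key in name), FALLBACK_T)
        merged[name] = slerp(t, tensor_a, sd_b[name])
    return merged
```

In practice, mergekit also supports ramping the ratio across layer depth rather than using a single scalar per tensor type; the fixed scalars above keep the example short.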
Good For
- Applications requiring a balance of general reasoning and specialized SQL understanding.
- Tasks benefiting from a 7B parameter model with a 4096-token context window.
- Developers looking for a model with a strong foundation in the Mistral architecture, enhanced through strategic merging.
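A minimal usage sketch with Hugging Face transformers is shown below. The prompt format is an assumption, since no chat template is specified here:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AbacusResearch/haLLAwa2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Example prompt playing to the model's SQL specialization.
prompt = "Write a SQL query that returns the ten most recent orders per customer."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```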