Metabird-7B: A Math-Optimized Mistral-Derived Model
ConvexAI/Metabird-7B is a 7-billion-parameter language model built on the Mistral architecture and fine-tuned from leveldevai/TurdusBeagle-7B. Its primary differentiation is its optimization for mathematical reasoning, achieved by training on the shuyuej/metamath_gsm8k dataset.
Key Capabilities & Performance
This model posts competitive scores on standard reasoning and commonsense benchmarks. On the Open LLM Leaderboard, Metabird-7B achieves an average score of 71.03.
- AI2 Reasoning Challenge (25-Shot): 69.54
- MMLU (5-Shot): 65.27
- GSM8k (5-Shot): 62.85 (a strong indicator of mathematical problem-solving ability)
- HellaSwag (10-Shot): 87.54
- Winogrande (5-Shot): 83.03
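The leaderboard arithmetic can be sanity-checked with a short sketch. Note that the Open LLM Leaderboard average is (assuming the usual unweighted mean) computed over six benchmarks, including a TruthfulQA score not reproduced in the list above, which is why the mean of the five listed scores does not equal 71.03:

```python
# Benchmark scores listed above (the leaderboard's TruthfulQA score is omitted here).
scores = {
    "ARC (25-Shot)": 69.54,
    "MMLU (5-Shot)": 65.27,
    "GSM8k (5-Shot)": 62.85,
    "HellaSwag (10-Shot)": 87.54,
    "Winogrande (5-Shot)": 83.03,
}

# Unweighted mean of the five listed scores.
listed_mean = sum(scores.values()) / len(scores)
print(f"Mean of listed scores: {listed_mean:.2f}")  # 73.65

# Assuming the reported 71.03 average is an unweighted mean over six
# benchmarks, the unlisted sixth score can be back-solved (illustrative only):
implied_sixth = 71.03 * 6 - sum(scores.values())
print(f"Implied sixth score: {implied_sixth:.2f}")
```

The gap between the listed-score mean and the reported average is expected whenever one benchmark in the set scores well below the others.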
Training Details
The model was trained using axolotl with a sequence length of 8192 tokens and a learning rate of 5e-06 over 1 epoch. The training process involved a total batch size of 8 with gradient accumulation steps of 4, utilizing bf16 precision and Flash Attention for efficiency.
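The batch arithmetic above can be made concrete. A minimal sketch, assuming a single-GPU run so that the per-device micro-batch size can be inferred from the stated totals (variable names here are illustrative, not taken from the actual axolotl config):

```python
# Stated training hyperparameters (from the text above).
total_batch_size = 8   # effective examples per optimizer step
grad_accum_steps = 4   # gradient accumulation steps
num_devices = 1        # assumption: single GPU
seq_len = 8192         # stated sequence length in tokens

# Effective batch = micro_batch * grad_accum * devices,
# so the per-device micro-batch size must be:
micro_batch_size = total_batch_size // (grad_accum_steps * num_devices)
print(micro_batch_size)  # 2

# Tokens processed per optimizer step at full sequence length.
tokens_per_step = total_batch_size * seq_len
print(tokens_per_step)  # 65536
```

In other words, the stated totals imply a micro-batch of 2 sequences per forward pass, with gradients accumulated over 4 passes before each optimizer update.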
Good for
- Applications requiring strong mathematical reasoning.
- Tasks involving logical problem-solving and quantitative analysis.
- Use cases where a 7B parameter model with enhanced reasoning capabilities is beneficial.