nbeerbower/bophades-mistral-math-DPO-7B
nbeerbower/bophades-mistral-math-DPO-7B is a 7-billion-parameter causal language model, fine-tuned from the bophades-v2-mistral-7B base model using Direct Preference Optimization (DPO). It is optimized for mathematical reasoning through training on the kyujinpy/orca_math_dpo dataset, with the goal of improving performance on math problems and the step-by-step logical reasoning they require.
Overview
nbeerbower/bophades-mistral-math-DPO-7B is a 7-billion-parameter language model derived from the bophades-v2-mistral-7B base model. It was fine-tuned with Direct Preference Optimization (DPO) on the kyujinpy/orca_math_dpo dataset. This targeted training aims to improve the model's mathematical reasoning and problem-solving.
Key Capabilities
- Enhanced Mathematical Reasoning: Trained on a math-focused DPO dataset, the model is intended to perform better on tasks that require numerical and logical computation.
- DPO Fine-tuning: Uses Direct Preference Optimization, which trains directly on pairs of preferred and rejected responses without a separate reward model. This can lead to more accurate and helpful responses in the model's specialized domain.
- Mistral Architecture: Built on the Mistral-7B architecture, which provides a foundation for general language understanding while the fine-tuning specializes the model for math.
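To make the DPO bullet above concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is an illustration of the published objective, not the training code used for this model. The function name `dpo_loss`, its argument names, and the value `beta=0.1` are assumptions chosen for the example; the model card does not state the beta that was used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an assumed value for illustration only.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # computed in a numerically stable way for either sign of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A policy identical to the reference has zero margin, so the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Raising the chosen response's likelihood relative to the reference lowers the loss.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

The loss falls as the policy assigns relatively more probability to the preferred answer than the reference model does, which is how a dataset of preferred and rejected math solutions steers the model.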
Training Details
The model was fine-tuned on an A100 GPU in Google Colab. Training used a LoRA configuration with r=16 and lora_alpha=16, targeting key attention and feed-forward modules. Training arguments included a learning rate of 2e-5, a maximum of 420 steps, and gradient_checkpointing enabled. The DPO trainer was configured with a max_prompt_length of 1024 and a max_length of 1536.
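The settings above imply a few derived quantities that are easy to check by hand. The sketch below collects the stated hyperparameters in a small dataclass and computes them. The class name `DpoRunSettings` and its helper methods are hypothetical, introduced only for this illustration. It also assumes the usual conventions: that LoRA scales its update by lora_alpha / r, and that max_length counts prompt and response tokens together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DpoRunSettings:
    """Hyperparameters quoted in the Training Details paragraph."""
    lora_r: int = 16
    lora_alpha: int = 16
    learning_rate: float = 2e-5
    max_steps: int = 420
    gradient_checkpointing: bool = True
    max_prompt_length: int = 1024
    max_length: int = 1536

    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by lora_alpha / r.
        return self.lora_alpha / self.lora_r

    def min_response_budget(self) -> int:
        # Assuming max_length covers prompt plus response, a prompt that
        # fills max_prompt_length leaves this many tokens for the response.
        return self.max_length - self.max_prompt_length

    def lora_params_for(self, in_features: int, out_features: int) -> int:
        # One adapted linear layer adds A (r x in) and B (out x r).
        return self.lora_r * (in_features + out_features)


settings = DpoRunSettings()
scaling = settings.lora_scaling()
budget = settings.min_response_budget()
# 4096 is the hidden size of Mistral-7B; a square projection is used here
# purely as an example of one adapted layer.
square_layer_params = settings.lora_params_for(4096, 4096)
```

With r equal to lora_alpha the adapter update is applied at a scale of 1.0, and a prompt that uses the full 1024-token limit still leaves 512 tokens for the response.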
Good For
- Applications requiring accurate mathematical problem-solving.
- Tasks involving numerical reasoning and logical deduction.
- Developers looking for a Mistral-based model with specialized math capabilities.