Name: nbeerbower/bophades-mistral-math-DPO-7B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: nbeerbower

Overview

nbeerbower/bophades-mistral-math-DPO-7B is a 7-billion-parameter language model derived from the bophades-v2-mistral-7B base model. It was fine-tuned with Direct Preference Optimization (DPO) on the kyujinpy/orca_math_dpo dataset, with the aim of improving mathematical reasoning and problem-solving.

Key Capabilities

Enhanced Mathematical Reasoning: Fine-tuned on a math-focused DPO dataset, the model is intended to perform better on tasks that require numerical and logical computation.
DPO Fine-tuning: Uses Direct Preference Optimization, a method that aligns a model with human preferences by training on pairs of preferred and rejected responses, which can lead to more accurate and helpful responses in its specialized domain.
Mistral Architecture: Built on the Mistral-7B architecture, which provides a foundation for general language understanding, while the fine-tuning targets the narrower domain of math.
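
To make the DPO objective mentioned above more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is an illustration of the general technique, not the author's training code: the function name dpo_loss, the example log-probabilities, and beta=0.1 are assumptions, since the source does not state the beta value used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    beta=0.1 is an assumed value; the source does not give it.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # written in a numerically stable form for both signs of margin.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities: the policy prefers the chosen answer
# slightly more than the reference does, so the loss drops below log(2).
loss_improved = dpo_loss(-10.0, -12.0, -11.0, -11.0)
loss_neutral = dpo_loss(-11.0, -11.0, -11.0, -11.0)
print(loss_improved < loss_neutral)
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log(2); training pushes the margin up so that the preferred response becomes relatively more likely.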

Training Details

The model was fine-tuned on an A100 GPU in Google Colab. Training used a LoRA configuration with r=16 and lora_alpha=16, targeting key attention and feed-forward modules. Training arguments included a learning rate of 2e-5, a maximum of 420 steps, and gradient_checkpointing enabled; the DPO trainer used a max_prompt_length of 1024 and a max_length of 1536.
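
The hyperparameters above imply two derived quantities that are easy to overlook. The sketch below collects the stated values in plain dictionaries (the dictionary layout and variable names are illustrative assumptions, not the author's script) and computes the LoRA scaling factor, using the conventional lora_alpha / r definition, and the token budget left for a completion when a prompt uses its full allowance, assuming max_length counts prompt and completion together.

```python
# Values as stated in the training details; the dict structure is illustrative.
lora_config = {"r": 16, "lora_alpha": 16}
dpo_args = {
    "learning_rate": 2e-5,
    "max_steps": 420,
    "gradient_checkpointing": True,
    "max_prompt_length": 1024,
    "max_length": 1536,
}

# Conventional LoRA scaling: the low-rank update is multiplied by lora_alpha / r.
lora_scaling = lora_config["lora_alpha"] / lora_config["r"]

# Assuming max_length covers prompt plus completion, this is the room left
# for the completion when the prompt reaches max_prompt_length.
completion_budget = dpo_args["max_length"] - dpo_args["max_prompt_length"]

print(lora_scaling)       # 1.0
print(completion_budget)  # 512
```

With r and lora_alpha equal, the adapter update is applied at unit scale, and a full-length prompt leaves 512 tokens for the response.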

Good For

Applications requiring accurate mathematical problem-solving.
Tasks involving numerical reasoning and logical deduction.
Developers looking for a Mistral-based model with specialized math capabilities.
