nbeerbower/bophades-mistral-math-DPO-7B

Text generation | Concurrency cost: 1 | Model size: 7B | Quant: FP8 | Context length: 8k | License: apache-2.0 | Architecture: Transformer | Open weights

nbeerbower/bophades-mistral-math-DPO-7B is a 7-billion-parameter causal language model, fine-tuned from the bophades-v2-mistral-7B base model using Direct Preference Optimization (DPO). It is tuned for mathematical reasoning on the kyujinpy/orca_math_dpo dataset, with the aim of improving performance on mathematical problems and related logical reasoning.


Overview

nbeerbower/bophades-mistral-math-DPO-7B is a 7-billion-parameter language model derived from the bophades-v2-mistral-7B base model. It was fine-tuned with Direct Preference Optimization (DPO) on the kyujinpy/orca_math_dpo dataset, a targeted training step intended to improve its mathematical reasoning and problem-solving.

Key Capabilities

  • Enhanced Mathematical Reasoning: Trained on a math-focused DPO dataset, the model is designed to perform better on tasks that require numerical and logical computation.
  • DPO Fine-tuning: Uses Direct Preference Optimization, which trains the model directly on pairs of preferred and rejected responses, without a separately trained reward model. This can lead to more accurate and helpful responses in its specialized domain.
  • Mistral Architecture: Built on the Mistral-7B architecture, which provides a strong foundation for general language understanding, and fine-tuned for a specialized application.
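DPO trains on triples of a prompt x, a preferred response y_w and a rejected response y_l. For each pair it minimizes -log σ(β[(log πθ(y_w|x) - log π_ref(y_w|x)) - (log πθ(y_l|x) - log π_ref(y_l|x))]), where π_ref is the frozen starting model. The minimal sketch below computes this loss for a single pair from precomputed log-probabilities. The function name dpo_loss is hypothetical, and beta = 0.1 is an assumption: it is a common default, but the card does not state the value used.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the trained policy or the frozen reference.
    beta = 0.1 is an assumed value; the model card does not list it.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically stable way.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# No shift relative to the reference model: the loss equals log 2.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy raises the chosen answer's likelihood and lowers the rejected one's: the loss drops.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

Because only log-ratios against the reference model enter the loss, the policy is rewarded for widening the gap between chosen and rejected answers rather than for raising raw likelihoods.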

Training Details

The model was fine-tuned on an A100 GPU in Google Colab. Training used LoRA with r=16 and lora_alpha=16, targeting the main attention and feed-forward modules. The training arguments included a learning rate of 2e-5, a maximum of 420 steps, and gradient checkpointing; the DPO trainer used a max_prompt_length of 1024 and a max_length of 1536 tokens.
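To make the stated hyperparameters concrete, the following minimal pure-Python sketch records them and derives two quantities: the LoRA scaling factor (lora_alpha / r, as in the standard LoRA formulation) and the number of tokens left for a response when a prompt uses its full 1024-token allowance. The class DPORunConfig and the helper lora_params_for_linear are hypothetical, and the 4096 x 4096 layer shape in the example is an assumption (it matches Mistral-7B's hidden size); the card does not list the exact target modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DPORunConfig:
    # Values stated on the model card; the class itself is a hypothetical container.
    lora_r: int = 16
    lora_alpha: int = 16
    learning_rate: float = 2e-5
    max_steps: int = 420
    gradient_checkpointing: bool = True
    max_prompt_length: int = 1024
    max_length: int = 1536

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.lora_alpha / self.lora_r

    @property
    def response_budget_at_full_prompt(self) -> int:
        # Tokens left for the chosen/rejected answer when the prompt hits its cap.
        return self.max_length - self.max_prompt_length


def lora_params_for_linear(d_in: int, d_out: int, r: int) -> int:
    # A is r x d_in and B is d_out x r; the frozen weight (d_out x d_in) is not trained.
    return r * (d_in + d_out)


cfg = DPORunConfig()
print(cfg.lora_scaling)  # 1.0
print(cfg.response_budget_at_full_prompt)  # 512
# Assumed shape: a 4096 x 4096 projection such as Mistral-7B's q_proj.
print(lora_params_for_linear(4096, 4096, cfg.lora_r))  # 131072
```

With r equal to lora_alpha, the scaling factor is 1.0, so the low-rank update is added to the frozen weight without extra amplification or damping.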

Good For

  • Applications requiring accurate mathematical problem-solving.
  • Tasks involving numerical reasoning and logical deduction.
  • Developers looking for a Mistral-based model with specialized math capabilities.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model.

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
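The parameters listed above are standard decoding controls. The sketch below illustrates, under simplifying assumptions, how they reshape a next-token distribution: penalties adjust logits of tokens already seen, temperature rescales logits before the softmax, and top_k, top_p and min_p prune unlikely tokens before renormalizing. All function names are hypothetical. As a simplification, the three filters are evaluated on the same input distribution and their survivors intersected; real inference engines apply them in their own order, so results can differ at the margins. The values in the usage lines are illustrative, not the Featherless configurations.

```python
import math
from typing import Dict, List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def apply_penalties(logits: List[float], counts: Dict[int, int],
                    repetition_penalty: float = 1.0,
                    frequency_penalty: float = 0.0,
                    presence_penalty: float = 0.0) -> List[float]:
    """Penalize tokens already in the context; counts maps token id -> occurrences."""
    out = list(logits)
    for tok, count in counts.items():
        if count <= 0:
            continue
        # Multiplicative repetition penalty (CTRL / Hugging Face style):
        # shrink positive logits, push negative logits further down.
        if out[tok] > 0:
            out[tok] /= repetition_penalty
        else:
            out[tok] *= repetition_penalty
        # Additive penalties (OpenAI style): frequency scales with count, presence is flat.
        out[tok] -= frequency_penalty * count + presence_penalty
    return out


def filter_probs(probs: List[float], top_k: int = 0, top_p: float = 1.0,
                 min_p: float = 0.0) -> List[float]:
    """Zero out tokens rejected by top-k, top-p or min-p (0 <= min_p <= 1), then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)
    if top_k > 0:
        keep &= set(order[:top_k])  # the k most likely tokens
    if top_p < 1.0:
        nucleus, cumulative = set(), 0.0
        for i in order:  # smallest prefix whose mass reaches top_p
            nucleus.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        keep &= nucleus
    if min_p > 0.0:
        # Keep tokens at least min_p times as likely as the top token.
        threshold = min_p * probs[order[0]]
        keep &= {i for i, p in enumerate(probs) if p >= threshold}
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


# Illustrative values only: token 0 has already appeared once.
logits = apply_penalties([2.0, 1.0, 0.0, -1.0], {0: 1}, repetition_penalty=1.1)
print(filter_probs(softmax(logits, temperature=0.7), top_k=3, top_p=0.9, min_p=0.05))
```

Setting top_k=0, top_p=1.0, min_p=0.0 and all penalties to their neutral values leaves the softmax output unchanged, which is a useful baseline when comparing configurations.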