Taichi11/sft_v7_dpo_v2_merged

Hugging Face
Text generation · 4B parameters · BF16 · 32k context length · Published: Feb 22, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Taichi11/sft_v7_dpo_v2_merged is a 4 billion parameter language model fine-tuned by Taichi11 using Direct Preference Optimization (DPO) on the Taichi11/LLM_main_v7_merged base model. Optimized for improved Chain-of-Thought reasoning and higher-quality structured responses, this model is designed for applications requiring precise and well-organized outputs. It offers a 32768-token context length and ships as fully merged 16-bit weights for direct use without adapter loading.


Overview

Taichi11/sft_v7_dpo_v2_merged is a 4 billion parameter language model developed by Taichi11, built upon the Taichi11/LLM_main_v7_merged base model. It was fine-tuned using Direct Preference Optimization (DPO) via the Unsloth library, and its fully merged 16-bit weights are published directly, eliminating the need for adapter loading.

Key Capabilities

  • Enhanced Reasoning: Optimized to improve Chain-of-Thought reasoning abilities.
  • Structured Output Quality: Specifically aligned to produce higher quality structured responses based on preference datasets.
  • Direct Use: Provided as a fully merged model, ready for immediate deployment with transformers.
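Because the checkpoint is fully merged, it can be loaded with the standard transformers API alone, with no PEFT or adapter step. The sketch below assumes the model defines a chat template; the prompt text and generation settings are illustrative, not taken from the card.

```python
# Minimal loading sketch for the merged BF16 checkpoint, assuming the
# standard transformers AutoModel/AutoTokenizer API and a chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Taichi11/sft_v7_dpo_v2_merged"


def load_model_and_tokenizer():
    """Load the fully merged 16-bit weights directly (no adapter loading)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # the card lists BF16 weights
        device_map="auto",
    )
    return model, tokenizer


def generate(model, tokenizer, user_message, max_new_tokens=256):
    """Single-turn chat generation using the model's chat template."""
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": user_message}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True)


# Usage (downloads the ~4B-parameter weights):
# model, tokenizer = load_model_and_tokenizer()
# print(generate(model, tokenizer, "Explain step by step why 17 is prime."))
```

The 32768-token context applies at inference time; the 1024-token limit mentioned under Training Details was only the maximum sequence length during DPO training.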

Good For

  • Applications requiring models with improved logical reasoning steps.
  • Use cases where generating well-structured and precise outputs is critical.
  • Developers seeking a DPO-optimized model for better response alignment without complex setup.

Training Details

The model underwent DPO training for 1 epoch with a learning rate of 1e-07 and a beta value of 0.1, using a maximum sequence length of 1024. Training used LoRA adapters (r=8, alpha=16) that were subsequently merged into the base model weights. The training data was Taichi11/dpo_dataset_v1.
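As a rough configuration sketch, the hyperparameters above map onto TRL's `DPOConfig` and PEFT's `LoraConfig` as shown below. This is an assumption about the recipe, not the author's actual script: the card says training used Unsloth (which wraps TRL), and the batch size and output path here are illustrative placeholders.

```python
# Hedged sketch of the DPO recipe described on the card, expressed with
# TRL's DPOConfig and PEFT's LoraConfig. Values marked "from the card"
# come from the Training Details section; everything else is illustrative.
from peft import LoraConfig
from trl import DPOConfig

dpo_args = DPOConfig(
    output_dir="dpo_out",            # illustrative path, not from the card
    beta=0.1,                        # DPO beta, from the card
    learning_rate=1e-7,              # from the card
    num_train_epochs=1,              # from the card
    max_length=1024,                 # max sequence length, from the card
    per_device_train_batch_size=2,   # assumption, not stated on the card
    report_to="none",
)

lora_cfg = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")  # from the card
```

These objects would then be passed to a `DPOTrainer` along with the base model and the Taichi11/dpo_dataset_v1 preference dataset, and the trained LoRA adapter merged back into the base weights to produce the published checkpoint.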