Model Overview
tomoofusa/exp034-toml-upsample-dpo-merged is a 4-billion-parameter language model developed by tomofusa. It ships with full 16-bit weights, so no adapter loading is required, which simplifies deployment and usage. The model was developed through a two-stage training process designed to improve performance and alignment with human preferences.
Training Pipeline
The model's training consisted of two distinct phases:
- Supervised Fine-Tuning (SFT): The initial phase used the tomoofusa/exp034-blend-h-toml-up-lora model as its base, establishing a strong foundation for language understanding and generation.
- Direct Preference Optimization (DPO): Following SFT, the model underwent DPO on the u-10bei/dpo-dataset-qwen-cot dataset. This phase aligned the model's outputs with human preferences, using the following configuration:
  - Learning rate: 5e-07
  - Beta: 0.1
  - Loss type: ipo
  - LoRA parameters: r=64, alpha=128
  - Max length: 1024
  - Training duration: 1 epoch
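The DPO hyperparameters above can be collected into a plain configuration object. This is a sketch only: the key names mirror the conventions of TRL's DPOConfig and PEFT's LoraConfig, but the author's actual training script is not published, so the field names are an assumption.

```python
# Hypothetical reconstruction of the DPO settings listed above.
# Key names follow TRL/PEFT conventions (an assumption, not the author's code).
dpo_config = {
    "learning_rate": 5e-07,
    "beta": 0.1,             # regularization strength of the preference loss
    "loss_type": "ipo",      # IPO variant of the DPO objective
    "max_length": 1024,      # maximum sequence length per example
    "num_train_epochs": 1,
}

# LoRA settings used during the DPO phase.
lora_config = {"r": 64, "lora_alpha": 128}

# Effective LoRA scaling factor is alpha / r.
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # → 2.0
```

Note that alpha = 2 × r, a common choice that scales the LoRA update by a factor of 2 relative to the base weights.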
Key Capabilities
Thanks to its DPO training, the model generates high-quality, preference-aligned text. Its 32768-token context length supports processing and generating longer, more coherent responses, and the merged full 16-bit weights deliver full performance without the overhead of adapter management.
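Because the weights are merged, the model should load directly without a PEFT adapter step. The snippet below is an illustrative sketch assuming the Hugging Face transformers library and that the checkpoint is hosted on the Hub under the repo id above; it requires transformers and torch to be installed and downloads the weights on first use.

```python
# Sketch only: assumes `transformers`/`torch` are installed and the merged
# checkpoint is available on the Hugging Face Hub under this repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tomoofusa/exp034-toml-upsample-dpo-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # full 16-bit weights; no adapter loading needed
    device_map="auto",
)

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With no adapter to attach, this is a single `from_pretrained` call rather than a base-model load followed by a PEFT merge.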
Good For
- Applications requiring models that adhere closely to human preferences.
- Conversational AI and chatbot development where response quality and alignment are critical.
- Instruction-following tasks where nuanced understanding and generation are beneficial.