jackf857/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.4
The jackf857/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.4 model is an 8-billion-parameter language model fine-tuned by jackf857. It is a DPO-tuned version of a Qwen3-8B base model, optimized on the HuggingFaceH4/ultrafeedback_binarized dataset. The model shows improved metrics on its evaluation set, making it suitable for tasks requiring refined conversational ability and alignment.
Model Overview
This model, jackf857/qwen3-8b-base-new-dpo-ultrafeedback-4xh200-batch-128-q_t-0.45-s_star-0.4, is an 8 billion parameter language model developed by jackf857. It is a fine-tuned variant of the jackf857/qwen3-8b-base-sft-ultrachat-4xh200-batch-128 base model, specifically enhanced through Direct Preference Optimization (DPO).
Key Capabilities
- DPO Fine-tuning: Optimized using the HuggingFaceH4/ultrafeedback_binarized dataset, which typically improves alignment with human preferences and response quality.
- Performance Metrics: Achieved a validation loss of 0.5766 on the evaluation set, with DPO-specific metrics including a margin mean of 47.1411 at a beta of 0.0072.
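The beta and margin metrics above come from the DPO objective, under which the model is trained to prefer the chosen response over the rejected one relative to a frozen reference model. The sketch below, assuming the standard DPO formulation (the log-probability values in the last line are made up for illustration), shows how the reported beta of 0.0072 enters the per-pair loss and how the margin is derived:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.0072):
    """Standard DPO loss for a single preference pair.

    beta=0.0072 matches the value reported on this card's evaluation set;
    the log-probability arguments are sequence-level log-likelihoods.
    """
    # Implicit rewards: beta-scaled log-ratios against the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward  # the "margin" metric above
    # -log(sigmoid(margin)), written in the numerically stable softplus form.
    loss = math.log1p(math.exp(-margin))
    return loss, margin

# Hypothetical log-probabilities for one (chosen, rejected) pair:
loss, margin = dpo_loss(-40.0, -95.0, -50.0, -60.0)
```

A small beta such as 0.0072 keeps the implicit rewards weakly scaled, so the policy is only gently pushed away from the reference model.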
Training Details
The model was trained for 1 epoch with a learning rate of 5e-07, utilizing a total batch size of 128 across 4 GPUs. The training employed an AdamW optimizer with a cosine learning rate scheduler and a warmup ratio of 0.1. Frameworks used include Transformers 4.51.0, PyTorch 2.3.1+cu121, Datasets 2.21.0, and Tokenizers 0.21.4.
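The schedule described above can be sketched as follows: linear warmup over the first 10% of steps to the peak learning rate of 5e-07, then cosine decay to zero. This is a minimal self-contained sketch; `total_steps` is hypothetical, since the actual step count depends on the dataset size and the batch size of 128:

```python
import math

def lr_at_step(step, total_steps, peak_lr=5e-07, warmup_ratio=0.1):
    """Cosine schedule with linear warmup.

    peak_lr and warmup_ratio match the hyperparameters reported above;
    total_steps is a placeholder for the actual number of optimizer steps.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr over the first 10% of steps.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at_step(s, total_steps=1000) for s in range(1001)]
```

With 1000 steps, warmup ends at step 100 (the peak), and the rate falls back to zero by the final step.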
Potential Use Cases
This model is likely well-suited for applications requiring high-quality, aligned text generation, such as advanced chatbots, content creation, and interactive AI systems where human-like responses are crucial.