W-61/qwen3-8b-base-cpo-ultrafeedback-4xh200-batch-128-20260422-131855

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 23, 2026 · Architecture: Transformer

W-61/qwen3-8b-base-cpo-ultrafeedback-4xh200-batch-128-20260422-131855 is an 8-billion-parameter language model fine-tuned from W-61/qwen3-8b-base-sft-ultrachat-4xh200-batch-128. It was trained with CPO (Contrastive Preference Optimization) on the HuggingFaceH4/ultrafeedback_binarized dataset, reaching a rewards accuracy of 0.5280. With a context length of 32768 tokens, it is suited to tasks requiring nuanced understanding of preferences and alignment with human feedback.


Overview

This model, qwen3-8b-base-cpo-ultrafeedback-4xh200-batch-128-20260422-131855, is an 8-billion-parameter language model developed by W-61. It is a fine-tuned variant of W-61/qwen3-8b-base-sft-ultrachat-4xh200-batch-128, trained with Contrastive Preference Optimization (CPO).
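For reference, assuming CPO here denotes Contrastive Preference Optimization (Xu et al., 2024), as implemented in trl's CPOTrainer, the objective combines a sigmoid preference loss with a negative log-likelihood term on the preferred response:

$$
\mathcal{L}_{\text{CPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\big(\beta \log \pi_\theta(y_w \mid x) - \beta \log \pi_\theta(y_l \mid x)\big) + \log \pi_\theta(y_w \mid x)\Big]
$$

where $x$ is the prompt, $y_w$ and $y_l$ are the preferred and rejected responses, $\beta$ is a temperature hyperparameter, and $\sigma$ is the logistic function. Unlike DPO, CPO needs no frozen reference model, which roughly halves memory use during training.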

Training Details

The model was fine-tuned on the HuggingFaceH4/ultrafeedback_binarized dataset. Key hyperparameters: a learning rate of 5e-07, an effective batch size of 128 (per-device train batch size of 4 with 8 gradient accumulation steps across 4 GPUs), and a cosine learning rate schedule with a 0.1 warmup ratio. Training ran for 1 epoch.
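A hypothetical reproduction sketch, assuming the model was trained with trl's CPOTrainer using the hyperparameters reported above (the actual training script is not published with the card); beta and sequence-length settings are left at trl defaults:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

# SFT checkpoint named on the card as the starting point.
base = "W-61/qwen3-8b-base-sft-ultrachat-4xh200-batch-128"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs; train_prefs is this dataset's standard preference split.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

# Hyperparameters as reported on the card. With 4 GPUs (per the 4xh200 run name),
# 4 per-device x 8 accumulation steps x 4 devices = 128 effective batch size.
args = CPOConfig(
    output_dir="qwen3-8b-base-cpo-ultrafeedback",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
)

trainer = CPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older trl releases
)
trainer.train()
```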

Evaluation Results

On the evaluation set, the model achieved a loss of 2.0046, a rewards accuracy of 0.5280, and a rewards margin of -0.1083. The accuracy is the fraction of evaluation pairs in which the chosen response receives a higher implicit reward than the rejected one (here only slightly above the 0.5 chance level), while the negative margin indicates that the mean chosen-minus-rejected reward gap is below zero, i.e. preference separation is weak.
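For reference, a sketch of how these two metrics are conventionally computed in preference-optimization trainers such as trl (the card does not publish its evaluation code); `chosen_rewards` and `rejected_rewards` are the per-example implicit rewards, i.e. the beta-scaled policy log-probabilities of each response:

```python
import torch

def reward_metrics(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor):
    # rewards accuracy: fraction of pairs where the chosen response
    # out-scores the rejected one (0.5 corresponds to chance).
    accuracy = (chosen_rewards > rejected_rewards).float().mean()
    # rewards margin: mean chosen-minus-rejected reward gap; a negative
    # value means rejected responses score higher on average.
    margin = (chosen_rewards - rejected_rewards).mean()
    return accuracy.item(), margin.item()
```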

Intended Use

Specific intended uses and limitations are not documented. However, CPO fine-tuning on a human-feedback dataset suggests suitability for tasks where aligning with human preferences and generating high-quality, preferred responses is critical. The 32768-token context length supports processing longer inputs. A minimal usage sketch follows, assuming the model loads through the standard transformers API (the card does not ship an official inference example):
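```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "W-61/qwen3-8b-base-cpo-ultrafeedback-4xh200-batch-128-20260422-131855"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative prompt; any text-generation input works.
prompt = "Explain the difference between supervised fine-tuning and preference optimization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```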