jackf857/qwen3-8b-base-epsilon-dpo-hh-harmless-4xh200-batch-64-20260424-040415

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 24, 2026 · Architecture: Transformer

The jackf857/qwen3-8b-base-epsilon-dpo-hh-harmless-4xh200-batch-64-20260424-040415 model is an 8 billion parameter language model, fine-tuned from jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452 using Direct Preference Optimization (DPO) on the Anthropic/hh-rlhf dataset. The model is optimized for harmlessness and alignment with human preferences, reaching a rewards accuracy of 0.7328 on its evaluation set. It is designed for applications requiring safe, preference-aligned text generation within a 32768-token context window.


Overview

This model, jackf857/qwen3-8b-base-epsilon-dpo-hh-harmless-4xh200-batch-64-20260424-040415, is an 8 billion parameter language model developed by jackf857. It is a fine-tuned variant of jackf857/qwen3-8b-base-sft-hh-harmless-4xh200-batch-64-20260417-214452, specifically enhanced through Direct Preference Optimization (DPO).
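Below is a minimal generation sketch using Hugging Face transformers. It assumes the checkpoint is hosted on the Hub under the repo id above and that prompts follow the hh-rlhf Human/Assistant dialogue format, which the SFT stage presumably saw; neither detail is confirmed by this card, so adjust as needed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id, copied from the model name above; adjust if the
# checkpoint lives elsewhere.
model_id = "jackf857/qwen3-8b-base-epsilon-dpo-hh-harmless-4xh200-batch-64-20260424-040415"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# hh-rlhf-style dialogue format; assumed to match the SFT training format.
prompt = "\n\nHuman: Explain briefly why seatbelts matter.\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Prompt plus completion must fit within the 32768-token context window.
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```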

Key Capabilities

  • Preference Alignment: Fine-tuned on the Anthropic/hh-rlhf dataset, indicating a strong focus on aligning with human preferences and generating harmless outputs.
  • DPO Training: Utilizes Direct Preference Optimization, which fine-tunes directly on preference pairs and avoids the separate reward model and reinforcement-learning loop of classic RLHF (a minimal loss sketch follows this list).
  • Performance Metrics: Achieved a rewards accuracy of 0.7328 and a rewards margin of 0.3976 on its evaluation set, indicating it reliably assigns higher implicit reward to preferred responses.
  • Context Window: Supports a context length of 32768 tokens, suitable for processing and generating longer sequences of text.
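For reference, here is a compact sketch of the DPO objective and how the two reported metrics fall out of it. The metric names match those logged by common DPO trainers; the function below is illustrative, and beta is an assumed hyperparameter (the actual training value is not stated on this card).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of (chosen, rejected) completion pairs.

    Each argument is a 1-D tensor of summed per-token log-probabilities
    for a completion under the policy or the frozen reference (SFT) model.
    beta=0.1 is a common default, assumed here for illustration.
    """
    # Implicit rewards: beta-scaled log-ratio of policy to reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # DPO loss: -log sigmoid(reward margin), averaged over the batch.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Reported metrics: rewards accuracy is the fraction of pairs where
    # the chosen completion's implicit reward wins; rewards margin is the
    # mean gap between chosen and rejected rewards.
    accuracy = (chosen_rewards > rejected_rewards).float().mean()
    margin = (chosen_rewards - rejected_rewards).mean()
    return loss, accuracy, margin
```

On this model's evaluation set, the accuracy term above came out to 0.7328 and the margin to 0.3976.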

Good For

  • Applications requiring models that prioritize harmlessness and safety in their responses.
  • Use cases where human preference alignment is critical, such as chatbots, content moderation, or interactive AI systems.
  • Developers looking for a model fine-tuned with DPO for improved behavioral characteristics.