Hi-Satoh/adv_MoE_ALF_sft3_merged
Hi-Satoh/adv_MoE_ALF_sft3_merged is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Instruct-2507. Fine-tuned with Direct Preference Optimization (DPO) via Unsloth, it is optimized to strengthen reasoning, particularly Chain-of-Thought, and to improve the quality of structured responses. It is designed for applications requiring outputs aligned with preferred response patterns.
Overview
This model, Hi-Satoh/adv_MoE_ALF_sft3_merged, is a 4-billion-parameter language model derived from Qwen/Qwen3-4B-Instruct-2507. It was fine-tuned using Direct Preference Optimization (DPO), implemented with the Unsloth library, to align its outputs with preferred response patterns.
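As a rough illustration of what DPO optimizes (a minimal, framework-free sketch, not the actual Unsloth training code used for this model), the per-preference-pair loss pushes the policy to assign a larger log-probability margin to the chosen response than to the rejected one, relative to a frozen reference model:

```python
import math

BETA = 0.05  # the beta value reported for this model's training


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = BETA) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin))."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -math.log(sigmoid(chosen_reward - rejected_reward))


# When the policy matches the reference exactly, both margins are zero
# and the loss is -log(0.5) = ln 2:
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # ≈ 0.6931
```

The small beta (0.05) keeps the implied reward scale gentle, so the fine-tuned policy stays close to the reference model while still learning the preference ordering.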
Key Capabilities
- Enhanced Reasoning: Optimized to improve Chain-of-Thought (CoT) reasoning, making it suitable for tasks requiring logical progression and structured thinking.
- Improved Response Quality: Focuses on generating higher-quality, more aligned structured responses based on preference datasets.
- Full-Merged Weights: Provided as full-merged 16-bit weights, eliminating the need for adapter loading.
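Because the weights are fully merged, the model can be loaded through the standard Hugging Face `transformers` path with no PEFT/adapter step. A minimal sketch (assumes `transformers` and `torch` are installed; the generation settings are illustrative, not prescribed by this card):

```python
# Loading the merged 16-bit weights directly -- no adapter attach step needed.
MODEL_ID = "Hi-Satoh/adv_MoE_ALF_sft3_merged"


def load_model(model_id: str = MODEL_ID):
    """Return (tokenizer, model) for the merged checkpoint."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # merged weights ship in 16-bit precision
        device_map="auto",
    )
    return tokenizer, model


# Usage (downloads the checkpoint on first call):
#   tokenizer, model = load_model()
#   inputs = tokenizer.apply_chat_template(
#       [{"role": "user", "content": "Explain chain-of-thought reasoning."}],
#       add_generation_prompt=True, return_tensors="pt").to(model.device)
#   print(tokenizer.decode(model.generate(inputs, max_new_tokens=256)[0]))
```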
Training Details
The model was trained for 2 epochs with a learning rate of 1e-6 and a DPO beta of 0.05, using a maximum sequence length of 4096 tokens. The LoRA adapter (r=8, alpha=16) was merged into the base model, so the released checkpoint is a standalone set of full weights.
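The merge step folds the low-rank update into the frozen base weights. A toy NumPy sketch of that arithmetic (the matrix shapes and values here are illustrative, not taken from the model) with the card's r=8, alpha=16, giving a scaling factor of alpha/r = 2:

```python
import numpy as np

r, alpha = 8, 16        # LoRA rank and alpha from the training details
scaling = alpha / r     # = 2.0, applied to the low-rank update

rng = np.random.default_rng(0)
d_out, d_in = 6, 4                          # toy dimensions for one weight matrix
W0 = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in))          # LoRA down-projection
B = np.zeros((d_out, r))                    # LoRA up-projection (zero-initialized)
B[:, 0] = 0.1                               # pretend training nudged B slightly

# Merging folds the adapter into the base weight, so inference needs
# only W_merged and no separate adapter files:
W_merged = W0 + scaling * (B @ A)
```

After this merge, the forward pass `W_merged @ x` is identical to the adapter-attached pass `W0 @ x + scaling * (B @ (A @ x))`, which is why the released checkpoint needs no adapter loading.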
Good For
- Applications requiring models with improved reasoning abilities.
- Scenarios where structured and aligned responses are critical.
- Developers looking for a Qwen3-4B variant with DPO-enhanced performance.