Name: FinaPolat/RAISED_QWEN_8B_DPO_1Krandom API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: FinaPolat

Model Overview

FinaPolat/RAISED_QWEN_8B_DPO_1Krandom is an 8 billion parameter language model developed by FinaPolat. It is a Qwen3-based model, specifically fine-tuned from the FinaPolat/RAISED_QWEN_8B_SFT checkpoint. This model leverages Direct Preference Optimization (DPO) with 1K random samples, indicating a focus on aligning model outputs with human preferences.

Training Details

A notable aspect of this model is its training methodology. It was fine-tuned using Unsloth, a library designed to accelerate the training of large language models, achieving a 2x speed improvement. The fine-tuning process also incorporated Huggingface's TRL (Transformer Reinforcement Learning) library, which is commonly used for alignment techniques like DPO.

Key Characteristics

Architecture: Qwen3
Parameter Count: 8 billion
Context Length: 32768 tokens
Fine-tuning Method: Direct Preference Optimization (DPO)
Training Efficiency: Utilizes Unsloth for faster training.

Potential Use Cases

Given its DPO fine-tuning and Qwen3 base, this model is suitable for a variety of general-purpose language generation and understanding tasks where alignment with human preferences is beneficial. Its efficient training process suggests a focus on practical deployment and iterative improvement.

Overview

Model Overview

Training Details

Key Characteristics

Potential Use Cases

Full Model Card (README)