QVikhr-2.5-1.5B-Instruct-SMPO Overview
QVikhr-2.5-1.5B-Instruct-SMPO is a 1.5-billion-parameter instruction-following language model from Vikhrmodels, built on Qwen-2.5-1.5B-Instruct. Its main distinction is its alignment via Simple Margin Preference Optimization (SMPO), a method designed to improve the stability and control of preference training, particularly in combination with rejection sampling.
Key Capabilities & Training:
- Bilingual Support: Optimized for Russian (RU) language tasks, while also supporting English (EN).
- Advanced Alignment: Fine-tuned with SMPO, a preference-optimization technique developed by Vikhrmodels to improve response quality (see the loss sketch below this list).
- Training Data: Aligned on a high-quality, deduplicated subset of the GrandMaster-PRO-MAX Russian dataset (approximately 10k dialogues).
- Reward Model: Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 scored candidate responses during alignment.
- Rejection Sampling: Preference pairs were built by rejection sampling over 7 hypotheses generated from the Vikhr-Qwen-2.5-1.5B-Instruct SFT checkpoint (see the pair-construction sketch just below).
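
To make the data-creation step concrete, below is a minimal sketch of how such preference pairs can be built from the two models named above. The repository paths are assumptions inferred from the names in this card, and the sampling settings (temperature, response length) are illustrative rather than the exact Vikhrmodels configuration.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Repo paths assumed from the model names in this card.
SFT_MODEL = "Vikhrmodels/Vikhr-Qwen-2.5-1.5B-Instruct"
REWARD_MODEL = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"

gen_tok = AutoTokenizer.from_pretrained(SFT_MODEL)
gen_model = AutoModelForCausalLM.from_pretrained(
    SFT_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
rm_tok = AutoTokenizer.from_pretrained(REWARD_MODEL)
rm = AutoModelForSequenceClassification.from_pretrained(
    REWARD_MODEL, torch_dtype=torch.bfloat16, device_map="auto", num_labels=1
)

def build_preference_pair(prompt: str, n_hypotheses: int = 7) -> dict:
    """Sample n hypotheses from the SFT model, score each with the reward
    model, and keep the best/worst as a (chosen, rejected) pair."""
    messages = [{"role": "user", "content": prompt}]
    inputs = gen_tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(gen_model.device)
    outputs = gen_model.generate(
        inputs,
        do_sample=True,
        temperature=0.8,          # illustrative sampling settings
        max_new_tokens=512,
        num_return_sequences=n_hypotheses,
    )
    completions = [
        gen_tok.decode(o[inputs.shape[-1]:], skip_special_tokens=True)
        for o in outputs
    ]

    scores = []
    for completion in completions:
        convo = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]
        # The Skywork reward model emits a single scalar logit for a
        # chat-templated conversation.
        text = rm_tok.apply_chat_template(convo, tokenize=False)
        rm_inputs = rm_tok(text, return_tensors="pt").to(rm.device)
        with torch.no_grad():
            scores.append(rm(**rm_inputs).logits[0][0].item())

    ranked = sorted(zip(scores, completions), key=lambda x: x[0])
    return {"prompt": prompt, "chosen": ranked[-1][1], "rejected": ranked[0][1]}
```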
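
The exact SMPO objective lives in Vikhrmodels' own implementation; as a rough illustration of the margin idea it is named after, the sketch below implements a generic hinge-margin preference loss over length-normalized sequence log-probabilities. The function name, the normalization, and the default margin value are assumptions for illustration, not the published SMPO formulation.

```python
import torch
import torch.nn.functional as F

def margin_preference_loss(
    logp_chosen: torch.Tensor,    # summed token log-probs of chosen responses, shape (batch,)
    logp_rejected: torch.Tensor,  # summed token log-probs of rejected responses, shape (batch,)
    len_chosen: torch.Tensor,     # token counts of chosen responses
    len_rejected: torch.Tensor,   # token counts of rejected responses
    margin: float = 1.0,          # illustrative default, not a published SMPO value
) -> torch.Tensor:
    # Length-normalize so longer responses are not preferred merely for length.
    reward_chosen = logp_chosen / len_chosen
    reward_rejected = logp_rejected / len_rejected
    # Hinge: the loss is zero once the chosen response beats the rejected one
    # by at least `margin`, so well-separated pairs stop contributing gradient.
    return F.relu(margin - (reward_chosen - reward_rejected)).mean()

# Toy usage with dummy log-probabilities and lengths:
loss = margin_preference_loss(
    logp_chosen=torch.tensor([-40.0, -55.0]),
    logp_rejected=torch.tensor([-60.0, -58.0]),
    len_chosen=torch.tensor([20.0, 25.0]),
    len_rejected=torch.tensor([22.0, 24.0]),
)
print(loss)
```

Capping the loss at the margin bounds the update pressure any single pair can exert, which is one common route to the training stability that margin-based preference methods aim for.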
Good For:
- Applications requiring a compact (1.5B) yet capable model for Russian language generation and instruction following (a minimal usage sketch follows this list).
- Use cases where response quality and alignment stability are critical and SMPO's margin-based training is a good fit.
- Developers interested in exploring models fine-tuned with advanced preference optimization techniques for bilingual (RU/EN) contexts.
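
For quick experimentation, the model can be loaded with the standard transformers chat workflow, as in the minimal sketch below. The repository id is assumed from the model name in this card, and the generation parameters are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Vikhrmodels/QVikhr-2.5-1.5B-Instruct-SMPO"  # assumed HF repo path

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Russian instruction-following example; the model also supports English.
# ("Explain in simple terms what preference optimization is.")
messages = [{"role": "user", "content": "Объясни простыми словами, что такое преференс-оптимизация."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```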