Vikhrmodels/QVikhr-2.5-1.5B-Instruct-SMPO

Source: Hugging Face

Task: Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Jan 31, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

QVikhr-2.5-1.5B-Instruct-SMPO is a 1.5 billion parameter instruction-tuned causal language model developed by Vikhrmodels, based on Qwen-2.5-1.5B-Instruct. It is specialized for Russian language tasks while supporting bilingual RU/EN interactions, and has been aligned using Simple Margin Preference Optimization (SMPO) on the GrandMaster-PRO-MAX dataset to enhance response quality.


QVikhr-2.5-1.5B-Instruct-SMPO Overview

QVikhr-2.5-1.5B-Instruct-SMPO is a 1.5 billion parameter instruction-following language model from Vikhrmodels, built upon the Qwen-2.5-1.5B-Instruct architecture. Its primary distinction lies in its specialized alignment process using Simple Margin Preference Optimization (SMPO), a method designed to improve the stability and control of preference training, particularly in conjunction with Rejection Sampling.
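Since this is a standard causal chat model on the Hugging Face Hub, it can be queried with the `transformers` library. The sketch below is a minimal, hedged example (the model card may recommend specific generation parameters; none are assumed here beyond defaults):

```python
# Minimal sketch: chat inference with QVikhr-2.5-1.5B-Instruct-SMPO via transformers.
# Assumes transformers and torch are installed; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Vikhrmodels/QVikhr-2.5-1.5B-Instruct-SMPO"

def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-message format expected by the chat template."""
    return [{"role": "user", "content": user_prompt}]

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the published quantization of the weights.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    input_ids = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    )
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    # The model is RU-optimized, so a Russian prompt is a natural smoke test.
    print(generate("Расскажи о себе."))
```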

Key Capabilities & Training:

  • Bilingual Support: Optimized for Russian (RU) language tasks, while also supporting English (EN).
  • Advanced Alignment: Utilizes SMPO for fine-tuning, a technique developed by Vikhrmodels to enhance response quality through preference optimization.
  • Training Data: Aligned on a high-quality, deduplicated subset of the GrandMaster-PRO-MAX Russian dataset (approximately 10k dialogues).
  • Reward Model: Leveraged Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 as the reward model during the alignment process.
  • Rejection Sampling: Employed rejection sampling with 7 hypotheses generated from the Vikhr-Qwen-2.5-1.5B-Instruct SFT checkpoint to create the preference dataset.
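The rejection-sampling step above can be sketched in a few lines: sample several hypotheses per prompt from the SFT checkpoint, score them with a reward model, and keep the highest- and lowest-scoring completions as a chosen/rejected pair for SMPO. `sample_from_sft` and `reward_score` are hypothetical stand-ins for the real SFT model and the Skywork reward model:

```python
# Illustrative sketch of preference-pair construction via rejection sampling.
# sample_from_sft and reward_score are placeholders, not real APIs.
import random

def sample_from_sft(prompt: str, n: int = 7) -> list[str]:
    # Stand-in for sampling n completions from the Vikhr-Qwen-2.5-1.5B-Instruct
    # SFT checkpoint (the card reports 7 hypotheses per prompt).
    return [f"{prompt} -> hypothesis {i}" for i in range(n)]

def reward_score(prompt: str, completion: str) -> float:
    # Stand-in for scoring with Skywork/Skywork-Reward-Llama-3.1-8B-v0.2.
    return random.random()

def build_preference_pair(prompt: str, n: int = 7) -> dict:
    """Return the best- and worst-scored hypotheses as a chosen/rejected pair."""
    hypotheses = sample_from_sft(prompt, n)
    ranked = sorted(hypotheses, key=lambda h: reward_score(prompt, h))
    return {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}

pair = build_preference_pair("Объясни, что такое SMPO.")
```

Pairs built this way form the dataset on which a margin-based preference objective like SMPO is then trained.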

Good For:

  • Applications requiring a compact (1.5B) yet capable model for Russian language generation and instruction following.
  • Use cases where improved response quality and alignment stability are critical, benefiting from the SMPO methodology.
  • Developers interested in exploring models fine-tuned with advanced preference optimization techniques for bilingual (RU/EN) contexts.