arcee-ai/Virtuoso-Small-v2

Parameters: 14.8B
Precision: FP8
Context length: 131,072 tokens
Released: Jan 30, 2025
License: apache-2.0
Overview

Virtuoso-Small-v2: Deepseek-v3 Distillation

Virtuoso-Small-v2 is a 14.8-billion-parameter language model developed by arcee-ai on the Qwen-2.5-14B architecture. It is distinguished by its distillation from Deepseek-v3, drawing on logits captured for over 5 billion tokens. Because Deepseek-v3 and Qwen-2.5 use different tokenizers, the pipeline applies "tokenizer surgery" for cross-architecture compatibility, followed by a proprietary "fusion merging" step; the aim is precise knowledge transfer rather than standard supervised fine-tuning.
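
The pipeline itself is proprietary, but the core idea of logit-level distillation is standard: instead of training on hard next-token labels, the student learns to match the teacher's full output distribution. Below is a minimal sketch in PyTorch, illustrative only and not arcee-ai's actual training code; it assumes the teacher logits have already been mapped onto the student's vocabulary (the tokenizer-surgery step).

```python
# Minimal sketch of logit-level distillation (illustrative; not the
# arcee-ai pipeline).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Both tensors: (batch, seq_len, vocab). Assumes teacher logits were
    # already aligned to the student's vocabulary ("tokenizer surgery").
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 as in standard distillation.
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature**2
```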

Key Capabilities

  • Advanced Reasoning: Excels in technical and scientific queries.
  • Complex Code Generation: Optimized for generating intricate code.
  • Mathematical Problem-Solving: Demonstrates strong performance in mathematical tasks.
  • Extended Context: Supports a context length of 128k (131,072) tokens; a usage sketch follows this list.
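
The model can be exercised with standard Hugging Face transformers tooling. The sketch below assumes the repository ships a chat template, as is typical for Qwen-2.5-based models; verify the prompt format against the model card.

```python
# Hypothetical usage sketch with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/Virtuoso-Small-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [
    {"role": "user",
     "content": "Prove that the sum of two even integers is even."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```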

Training Highlights

  • Logit-Level Distillation: Trained on approximately 1.1 billion tokens of Deepseek-v3 logits (the overview's 5B+ figure refers to the total logit capture).
  • Fusion Merging: Employs a specialized merging technique to maximize fidelity to the teacher model.
  • Alignment: Includes a DPO (Direct Preference Optimization) stage to enhance alignment and reduce hallucinations; a sketch of the objective follows this list.
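
DPO is a published technique (Rafailov et al., 2023), so its objective can be sketched directly; the code below is the generic formulation, not arcee-ai's implementation.

```python
# Generic DPO loss (Rafailov et al., 2023); illustrative only.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Each argument: (batch,) summed log-probs of the chosen/rejected
    # responses under the trained policy and a frozen reference model.
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the policy to prefer chosen over rejected responses more
    # strongly than the reference model does.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```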

This model is released under the Apache-2.0 License, allowing for broad commercial and non-commercial use.