arcee-ai/Virtuoso-Medium-v2

Text generation · Concurrency cost: 2 · Model size: 32.8B · Quant: FP8 · Context length: 32k · Published: Jan 27, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

arcee-ai/Virtuoso-Medium-v2 is a 32.8-billion-parameter language model distilled from Deepseek-v3, leveraging an expanded dataset of over 5 billion tokens' worth of logits. Built upon the Qwen-2.5-32B architecture, it uses a logit-level distillation and fusion-merging approach for precise knowledge transfer. The model excels at advanced reasoning, complex code generation, and mathematical problem-solving, often surpassing other 30B+ and some 70B+ models on specific tasks. It is intended for advanced chatbots, enterprise data analysis, research simulations, and educational tools in STEM fields.


Virtuoso-Medium-v2: Deepseek-v3 Distilled Language Model

Virtuoso-Medium-v2 is a 32-billion-parameter language model developed by arcee-ai, building on the original Virtuoso-Medium. It is distilled from Deepseek-v3 using an expanded dataset of over 5 billion tokens' worth of logits and is based on the Qwen-2.5-32B architecture. Rather than standard supervised fine-tuning, the model employs logit-level distillation combined with a "fusion merging" approach to ensure precise knowledge transfer from the teacher model.
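
Arcee's training code is not published in this card, so the following is only a minimal sketch of what logit-level distillation generally looks like: the student is trained to match the teacher's full next-token distribution (via a KL-divergence loss over logits) rather than just its sampled text. All tensor names, shapes, and the temperature value are illustrative assumptions.

```python
# Minimal sketch of logit-level distillation: KL divergence between the
# teacher's and student's next-token distributions. Illustrative only;
# not Arcee's actual training code.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student), averaged over all token positions.

    Both tensors have shape (batch, seq_len, vocab_size) and are assumed to
    share one vocabulary (cross-tokenizer alignment is sketched further below).
    """
    vocab = student_logits.size(-1)
    t_probs = F.softmax(teacher_logits / temperature, dim=-1).reshape(-1, vocab)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1).reshape(-1, vocab)
    # "batchmean" divides by the number of rows, i.e. batch * seq_len positions;
    # the T^2 factor follows standard distillation practice.
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2

# Random tensors stand in for real model outputs (vocab size is illustrative).
student_logits = torch.randn(2, 16, 32000, requires_grad=True)
teacher_logits = torch.randn(2, 16, 32000)
distillation_loss(student_logits, teacher_logits).backward()
```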

Key Capabilities

  • Advanced Reasoning: Excels in technical, scientific, and mathematical problem-solving.
  • Complex Code Generation: Optimized for generating intricate code.
  • High Performance: Achieves strong benchmark scores, often surpassing other 30B+ and some 70B+ models in specific tasks.
  • Unique Distillation: Leverages Deepseek-v3's expertise through logit-level replication and specialized tokenizer surgery for cross-architecture compatibility.
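
The "tokenizer surgery" mentioned above addresses the fact that Deepseek-v3 and Qwen-2.5 use different tokenizers, so teacher logits must be mapped onto the student's vocabulary before distillation. The exact procedure is not documented in this card; the sketch below shows one common approach, matching token strings across the two vocabularies, with the checkpoint names and the helper function being illustrative assumptions rather than Arcee's published method.

```python
# Illustrative sketch of aligning teacher logits (Deepseek-v3 vocabulary) to a
# student vocabulary (Qwen-2.5) by matching token strings. Not Arcee's
# published procedure; checkpoint names are assumptions.
import torch
from transformers import AutoTokenizer

teacher_tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3")
student_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B")

teacher_vocab = teacher_tok.get_vocab()   # token string -> teacher id
student_vocab = student_tok.get_vocab()   # token string -> student id

# Pairs of (teacher_id, student_id) for tokens whose surface form matches exactly.
shared = [(t_id, student_vocab[tok]) for tok, t_id in teacher_vocab.items()
          if tok in student_vocab]
teacher_ids = torch.tensor([t for t, _ in shared])
student_ids = torch.tensor([s for _, s in shared])

def project_teacher_logits(teacher_logits: torch.Tensor,
                           student_vocab_size: int) -> torch.Tensor:
    """Scatter teacher logits for shared tokens into student-vocab positions;
    student tokens without a match get a very low logit (~zero probability)."""
    out = torch.full(teacher_logits.shape[:-1] + (student_vocab_size,),
                     -1e9, dtype=teacher_logits.dtype)
    out[..., student_ids] = teacher_logits[..., teacher_ids]
    return out
```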

Intended Use Cases

  • Advanced Chatbots & Virtual Assistants
  • Enterprise Data Analysis & Workflow Automation
  • Research Simulations & Natural Language Understanding
  • Educational Tools for STEM Fields
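
For the chatbot and assistant scenarios above, the public checkpoint can also be run locally with the Hugging Face transformers library. The sketch below is a minimal example; the prompt and generation settings are chosen purely for illustration, and the hosted FP8 endpoint described at the top of this card is a separate deployment.

```python
# Minimal local-inference sketch using Hugging Face transformers.
# The checkpoint name comes from this card; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/Virtuoso-Medium-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful STEM tutor."},
    {"role": "user", "content": "Explain why the harmonic series diverges."},
]
# The Qwen-2.5-based tokenizer ships a chat template, so apply_chat_template
# formats the conversation into the prompt layout the model expects.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```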

Limitations

  • Context Length: The base model supports up to 128k tokens, though the deployment listed above is served with a 32k context window.
  • Knowledge Cut-off: Training data reflects information up to June 2024, potentially lacking more recent developments.