arcee-ai/Virtuoso-Medium-v2
Virtuoso-Medium-v2: Deepseek-v3 Distilled Language Model
Virtuoso-Medium-v2 is a 32.8-billion-parameter language model developed by arcee-ai, building on the original Virtuoso-Medium. It is distilled from Deepseek-v3 using an expanded dataset of over 5 billion tokens' worth of logits, and is built on the Qwen-2.5-32B architecture. Rather than standard supervised fine-tuning, it employs logit-level distillation combined with a "fusion merging" step to ensure precise knowledge transfer from the teacher model.
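The core idea behind logit-level distillation is that the student is trained to match the teacher's full output distribution rather than one-hot labels. A minimal sketch of such an objective in plain Python follows; the temperature value and toy logits are illustrative assumptions, not the actual training configuration:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Minimizing this pushes the student's distribution toward the
    teacher's, token by token. The temperature**2 factor keeps
    gradient magnitudes comparable to a hard-label loss (a standard
    convention in distillation).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0
    )

# Toy example: teacher and student logits over a 4-token vocabulary.
teacher = [3.0, 1.0, 0.2, -1.0]
student = [2.5, 1.2, 0.1, -0.8]
loss = distillation_loss(teacher, student)  # small positive value
```

The loss is zero exactly when the student reproduces the teacher's distribution, which is what "precise knowledge transfer" means at the logit level.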
Key Capabilities
- Advanced Reasoning: Excels in technical, scientific, and mathematical problem-solving.
- Complex Code Generation: Optimized for generating intricate code.
- High Performance: Achieves strong benchmark scores, often surpassing other 30B+ and some 70B+ models in specific tasks.
- Unique Distillation: Leverages Deepseek-v3's expertise through logit-level replication and specialized tokenizer surgery for cross-architecture compatibility.
Intended Use Cases
- Advanced Chatbots & Virtual Assistants
- Enterprise Data Analysis & Workflow Automation
- Research Simulations & Natural Language Understanding
- Educational Tools for STEM Fields
Limitations
- Context Length: Limited to a maximum of 128k tokens.
- Knowledge Cut-off: Training data reflects information up to June 2024, potentially lacking more recent developments.