anthracite-org/magnum-v2-72b

Status: Warm
Visibility: Public
Parameters: 72.7B
Quantization: FP8
Context length: 131,072 tokens
License: tongyi-qianwen
Source: Hugging Face
Overview

Magnum-v2-72b: Claude 3 Prose Quality

Magnum-v2-72b is a 72.7 billion parameter language model developed by anthracite-org, fine-tuned from Qwen2 72B Instruct. Its primary objective is to replicate the prose quality of the Claude 3 models (Sonnet and Opus).

Key Capabilities & Features

  • Prose Generation: Optimized for generating high-quality, nuanced, and human-like text, mirroring the style of advanced commercial LLMs.
  • Large Context Window: Features a 131,072-token context length, enabling the processing and generation of extensive and complex interactions.
  • Instruction-Tuned: Uses the ChatML prompt format for instruction-following dialogue (see the sketch after this list), supporting responsive and coherent conversation.
  • Robust Training: Fine-tuned for two epochs on 8x AMD Instinct™ MI300X accelerators, with a weight decay of 0.01 to mitigate overfitting and a peak learning rate of 4e-6.
  • Sample Packing: Uses 16k-token sample packing during training, an increase over previous runs, to improve training efficiency.
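
As a concrete illustration of the ChatML formatting noted above, the minimal sketch below loads the model with Hugging Face transformers and lets the tokenizer's built-in chat template insert the <|im_start|>/<|im_end|> turn markers. Only the model ID comes from this card; the prompt, sampling settings, and hardware assumptions are illustrative.

```python
# Minimal inference sketch (assumes the transformers and torch packages and
# enough GPU memory for a 72B checkpoint; adjust device_map or quantization to fit).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anthracite-org/magnum-v2-72b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# ChatML wraps each turn in <|im_start|>role ... <|im_end|>; the tokenizer's
# chat template applies that formatting automatically.
messages = [
    {"role": "system", "content": "You are a skilled fiction writer."},
    {"role": "user", "content": "Write the opening paragraph of a noir short story."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling values here are illustrative, not tuned recommendations.
output = model.generate(input_ids, max_new_tokens=300, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because the chat template already encodes the ChatML markers, downstream applications only need to supply role-tagged messages rather than hand-building the prompt string.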

Performance Highlights

Evaluations on the Open LLM Leaderboard show an average score of 41.15. Notable results include 75.60 on IFEval (0-shot) and 57.85 on BBH (3-shot), indicating strong instruction-following and reasoning capabilities.

Ideal Use Cases

  • Creative Writing: Generating stories, articles, or other long-form content requiring high prose quality.
  • Advanced Chatbots: Developing conversational agents that produce sophisticated and natural-sounding responses.
  • Content Creation: Assisting with tasks that demand nuanced language and stylistic consistency.