Infermatic/magnum-v4-72b-FP8-Dynamic

Text Generation · Concurrency Cost: 4 · Model Size: 72.7B · Quant: FP8 · Ctx Length: 32k · Published: Oct 21, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Infermatic/magnum-v4-72b-FP8-Dynamic is a 72.7 billion parameter language model, dynamically quantized to FP8 from anthracite-org's magnum-v4-72b. The underlying model is fine-tuned on top of Qwen2.5-72B-Instruct with the goal of replicating the prose quality of the Claude 3 models (Sonnet and Opus). It is optimized for generating high-quality, nuanced text, making it suitable for advanced conversational AI and creative writing applications.

Infermatic/magnum-v4-72b-FP8-Dynamic Overview

This model is a 72.7 billion parameter language model, dynamically quantized to FP8 using AutoFP8, derived from anthracite-org/magnum-v4-72b. The underlying model is fine-tuned on top of Qwen2.5-72B-Instruct with the primary objective of replicating the prose quality of the Claude 3 models, specifically Sonnet and Opus.
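
The exact quantization recipe for this checkpoint is not published here, but a dynamic FP8 pass with the AutoFP8 library typically looks like the following sketch; the output directory name and arguments are illustrative assumptions.

```python
# Sketch of dynamic FP8 quantization with AutoFP8 (github.com/neuralmagic/AutoFP8).
# The actual recipe for this checkpoint is not documented here; the output
# path and arguments below are assumptions for illustration.
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "anthracite-org/magnum-v4-72b"  # FP16 source weights
quantized_model_dir = "magnum-v4-72b-FP8-Dynamic"      # assumed output directory

# "dynamic" computes activation scales at runtime, so no calibration data
# is required (unlike the "static" scheme).
quantize_config = BaseQuantizeConfig(
    quant_method="fp8",
    activation_scheme="dynamic",
)

model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize([])  # empty calibration list: the dynamic scheme needs no samples
model.save_quantized(quantized_model_dir)
```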

Key Capabilities & Features

  • Claude 3 Prose Quality: Specifically designed and fine-tuned to emulate the high-quality, nuanced prose style of Claude 3 Sonnet and Opus.
  • Dynamic FP8 Quantization: Utilizes dynamic FP8 quantization for efficient inference while maintaining performance.
  • Base Model: Built upon the robust Qwen2.5-72B-Instruct architecture.
  • Extensive Training Data: Fine-tuned using a diverse set of datasets, including anthracite-org/c2_logs_32k_llama3_qwen2_v1.2, anthracite-org/kalo-opus-instruct-22k-no-refusal, and others, focusing on conversational and instructional data.
  • ChatML Prompting: Supports the ChatML format for structured conversations, including system, user, and assistant roles (see the example after this list).
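
For reference, a ChatML prompt can be built with the Hugging Face transformers chat template, assuming the model repository ships one (standard for Qwen2.5 derivatives); the system prompt below is only an example.

```python
# Building a ChatML prompt via the tokenizer's chat template.
# The system prompt is an illustrative placeholder, not a recommended default.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Infermatic/magnum-v4-72b-FP8-Dynamic")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write the opening paragraph of a mystery novel."},
]

# add_generation_prompt appends the "<|im_start|>assistant" header so the
# model continues in the assistant role.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# Expected shape of the rendered prompt (ChatML):
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# Write the opening paragraph of a mystery novel.<|im_end|>
# <|im_start|>assistant
```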

Ideal Use Cases

  • Advanced Conversational AI: Excellent for chatbots and virtual assistants requiring sophisticated and human-like dialogue generation.
  • Creative Writing & Roleplay: Well-suited for applications demanding high-quality prose, storytelling, and character-driven interactions.
  • Prose Generation: Any task where generating text with a refined and nuanced style is critical.
  • Resource-Efficient Deployment: The FP8 quantization makes it a strong candidate for deployment scenarios where memory and computational efficiency are important for a 72B model (a serving sketch follows this list).
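
As a sketch of such a deployment, an FP8-aware engine such as vLLM can typically load the FP8-Dynamic checkpoint directly; the tensor-parallel degree and sampling settings below are assumptions to adapt to your hardware.

```python
# Minimal vLLM serving sketch for the FP8-Dynamic checkpoint.
# tensor_parallel_size and sampling settings are assumptions; size them to
# your GPUs (a 72B FP8 model needs roughly 72+ GB for weights alone).
from vllm import LLM, SamplingParams

llm = LLM(
    model="Infermatic/magnum-v4-72b-FP8-Dynamic",
    tensor_parallel_size=2,  # e.g. 2x 80 GB GPUs; adjust for your setup
    max_model_len=32768,     # matches the 32k context length listed above
)

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=512)

# LLM.chat applies the model's ChatML template before generating.
outputs = llm.chat(
    [{"role": "user", "content": "Describe a rainy city street at dusk."}],
    params,
)
print(outputs[0].outputs[0].text)
```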