Delta-Vector/Rei-V2-12B

Overview

Rei-V2-12B: Claude 3-like Prose Quality

Rei-V2-12B is a 12 billion parameter language model developed by Delta-Vector, built on the Mistral-Nemo-Instruct (ChatML'ified) base. It began as an experiment in gradient clipping, but performed well enough to warrant an official release. The model was fine-tuned on a prototype Magnum V5 datamix with the explicit goal of replicating the sophisticated prose quality of the Claude 3 models, particularly Sonnet and Opus.

Key Characteristics

  • Base Model: Fine-tuned from Mistral-Nemo-Instruct.
  • Prose Quality: Optimized for generating high-quality, nuanced text, aiming for a style similar to Claude 3 models.
  • Context Length: Supports a substantial context window of 32768 tokens.
  • Training Innovation: Experimented with gradient clipping (max_grad_norm) during training; a clip value of 0.001 proved the best balance, avoiding both overfitting and underfitting.
  • Prompt Format: Utilizes the ChatML format for structured conversations.
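
Since the model expects ChatML-formatted input, a minimal sketch of how a conversation is rendered into that format may be useful. The `to_chatml` helper and the example messages are illustrative, not part of the model's tooling; the `<|im_start|>`/`<|im_end|>` tokens follow the standard ChatML convention.

```python
# Minimal sketch of the ChatML prompt format Rei-V2-12B expects.
# The helper function and example messages are illustrative.

def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Leave the assistant turn open so the model completes it.
    prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful writing assistant."},
    {"role": "user", "content": "Write a short scene set in a lighthouse."},
]
print(to_chatml(messages))
```

In practice, a tokenizer with a ChatML chat template applied (e.g. via a library's chat-templating utilities) produces the same structure automatically.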

Ideal Use Cases

  • Creative Writing: Generating detailed narratives, stories, and descriptive text.
  • Roleplay: Engaging in character-driven interactions and maintaining consistent personas.
  • Prose Generation: Applications requiring sophisticated and high-quality textual output.

The model was trained for 2 epochs on 8x NVIDIA H200 GPUs, leveraging the Axolotl framework.
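
To make the gradient-clipping experiment concrete, here is a minimal sketch of global-norm clipping, the operation that max_grad_norm controls. Plain Python lists stand in for gradient tensors; the function name and example values are illustrative, not taken from the training code.

```python
import math

# Sketch of max-norm (global L2 norm) gradient clipping, the technique
# behind the max_grad_norm = 0.001 setting described above.
# Gradients whose combined norm exceeds max_norm are rescaled so the
# norm equals max_norm; direction is preserved, only magnitude shrinks.

def clip_grad_norm(grads, max_norm):
    """Return (clipped_grads, original_norm) for a flat list of gradients."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

clipped, norm = clip_grad_norm([0.3, -0.4], max_norm=0.001)
# The original norm is 0.5; the clipped gradient has norm 0.001.
```

A clip value as small as 0.001 forces very conservative update steps, which is what the experiment found helpful for this fine-tune.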