anthracite-org/magnum-v2.5-12b-kto

Status: Warm
Visibility: Public
Parameters: 12B
Quantization: FP8
Context length: 32768
Released: Aug 12, 2024
License: apache-2.0
Hugging Face

Magnum v2.5-12b-kto Overview

Magnum v2.5-12b-kto is an experimental 12-billion-parameter language model from Anthracite, built on the anthracite-org/magnum-12b-v2 base. Its primary goal is to emulate the prose quality of the Claude 3 models, Sonnet and Opus.

Key Capabilities & Features

  • Hybrid Reinforcement Learning: Employs a hybrid KTO + DPOP strategy, pairing rejected responses sampled from the original model with chosen responses drawn from the finetuning dataset, to improve instruction following (a sketch of the DPOP side of the objective follows this list).
  • Prose Quality Focus: Specifically fine-tuned to replicate the sophisticated writing style found in Claude 3 models.
  • Instruction Following: Strengthened by the experimental reinforcement learning stage, which was applied chiefly to instruction-following data.
  • Context Length: Supports a substantial context window of 32768 tokens.
  • ChatML Formatting: Designed for instruction-tuned interactions using the ChatML prompt format (see the inference example below).
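
The card does not spell out the exact hybrid objective, but the DPOP half is well documented in the DPO-Positive literature: on top of the usual DPO preference term, it penalizes the policy whenever its log-probability of the chosen response falls below the reference model's. The snippet below is a minimal PyTorch sketch of that standard DPOP loss, not Anthracite's training code; the function name and the beta/lam values are illustrative, and the KTO side (unpaired desirable/undesirable examples) is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def dpop_loss(policy_chosen_logps: torch.Tensor,
              policy_rejected_logps: torch.Tensor,
              ref_chosen_logps: torch.Tensor,
              ref_rejected_logps: torch.Tensor,
              beta: float = 0.1,
              lam: float = 5.0) -> torch.Tensor:
    """Standard DPO-Positive (DPOP) loss over paired chosen/rejected log-probs."""
    # Implicit rewards: log-prob ratios of the policy against the frozen reference.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # DPOP penalty: positive only when the policy assigns the chosen response
    # less probability than the reference model does.
    penalty = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0.0)
    logits = beta * (chosen_ratio - rejected_ratio - lam * penalty)
    return -F.logsigmoid(logits).mean()
```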

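Since the model expects ChatML-formatted conversations, the standard transformers chat-template path is the natural way to prompt it. This is a minimal sketch, assuming the repository's tokenizer ships a ChatML chat template as the card's prompt-format note implies; the prompt text, sampling settings, and token budget are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anthracite-org/magnum-v2.5-12b-kto"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" needs the accelerate package installed.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [
    {"role": "system", "content": "You are a helpful writing assistant."},
    {"role": "user", "content": "Write a short scene set in a rain-soaked harbor town."},
]

# Renders the turns as ChatML (<|im_start|>role ... <|im_end|>) and appends the
# assistant header so generation begins in the assistant turn.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
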
Training & Datasets

The model's finetuning leveraged a curated selection of datasets, listed below; a short loading sketch follows the list:

  • Filtered Stheno dataset
  • kalomaze/Opus_Instruct_25k
  • Nopm/Opus_WritingStruct
  • A subset of Gryphe/Sonnet3.5-SlimOrcaDedupCleaned (~16k rows)
  • kalomaze/Opus_Instruct_3k
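
These datasets are published on the Hugging Face Hub under the names above. The snippet below is a minimal sketch for inspecting one of them, assuming it is still available there and exposes a train split; column layouts vary per dataset, so check each dataset card before relying on them.

```python
from datasets import load_dataset

# Illustrative only: peek at one of the instruction datasets named in the list above.
ds = load_dataset("kalomaze/Opus_Instruct_25k", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # a single example
```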

Good For

This model suits developers and researchers exploring advanced instruction following and high-quality prose generation, particularly for tasks that call for a style similar to the Claude 3 models.