Magnum v2.5-12b-kto Overview
Magnum v2.5-12b-kto is an experimental 12 billion parameter language model from Anthracite, built on the anthracite-org/magnum-12b-v2 base. Its primary goal is to emulate the prose quality of the Claude 3 Sonnet and Opus models.
Key Capabilities & Features
- Hybrid Reinforcement Learning: Employs a novel KTO + DPOP strategy, using rejected data sampled from the original model and chosen data from the finetuning dataset to improve instruction following.
- Prose Quality Focus: Specifically fine-tuned to replicate the sophisticated writing style found in Claude 3 models.
- Instruction Following: Enhanced through the experimental reinforcement learning approach, particularly on instruction-following data.
- Context Length: Supports a substantial context window of 32768 tokens.
- ChatML Formatting: Designed for instruction-tuned interactions using the ChatML prompt format.
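Since the model expects ChatML-formatted input, a minimal sketch of assembling a prompt by hand may help (the system and user strings here are placeholders; in practice a tokenizer's chat template would typically handle this):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt.

    ChatML wraps each turn in <|im_start|>role ... <|im_end|> markers;
    the trailing assistant header cues the model to generate its reply.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Write a short scene set in a lighthouse.",
)
```

The generated text is then read up to the next `<|im_end|>` marker.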
Training & Datasets
The model's finetuning leveraged a curated selection of datasets, including:
- Filtered Stheno dataset
- kalomaze/Opus_Instruct_25k
- Nopm/Opus_WritingStruct
- A subset of Gryphe/Sonnet3.5-SlimOrcaDedupCleaned (~16k rows)
- kalomaze/Opus_Instruct_3k
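For intuition on the hybrid preference objective described above, here is a sketch of a DPOP-style pairwise loss on per-sequence log-probabilities. This is illustrative only: the function name, the `beta` and `lam` values, and the exact combination with KTO are assumptions, not the model's actual training configuration.

```python
import math

def dpop_loss(logp_chosen: float, logp_rejected: float,
              ref_logp_chosen: float, ref_logp_rejected: float,
              beta: float = 0.1, lam: float = 5.0) -> float:
    """Sketch of a DPO-Positive (DPOP) loss for one preference pair.

    The margin rewards the policy for preferring the chosen response
    over the rejected one relative to the reference model; the penalty
    term discourages the policy's chosen log-prob from falling below
    the reference's, which plain DPO permits.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    penalty = max(0.0, ref_logp_chosen - logp_chosen)  # fires only on a drop
    z = beta * (margin - lam * penalty)
    return -math.log(1.0 / (1.0 + math.exp(-z)))  # -log sigmoid(z)
```

Here the rejected sample would come from the original model's own outputs and the chosen sample from the finetuning dataset, per the strategy described above.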
Good For
This model is suitable for developers and researchers interested in exploring advanced instruction-following capabilities and generating high-quality prose, particularly for tasks requiring a style similar to the Claude 3 models.