Magnum v2.5-12b-kto Overview
Magnum v2.5-12b-kto is an experimental 12 billion parameter language model from Anthracite, built on the anthracite-org/magnum-12b-v2 base. Its primary goal is to emulate the prose quality of the Claude 3 Sonnet and Opus models.
Key Capabilities & Features
- Hybrid Reinforcement Learning: Employs a novel KTO + DPOP strategy, using rejected data sampled from the original model and chosen data from the finetuning dataset to improve instruction following.
- Prose Quality Focus: Specifically fine-tuned to replicate the sophisticated writing style found in Claude 3 models.
- Instruction Following: Enhanced through the experimental reinforcement learning approach, particularly on instruction-following data.
- Context Length: Supports a substantial context window of 32768 tokens.
- ChatML Formatting: Designed for instruction-tuned interactions using the ChatML prompt format.
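Since the model expects ChatML-formatted input, a minimal sketch of assembling a prompt by hand may help (the system and user strings here are placeholders; in practice a tokenizer's chat template would typically handle this):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt.

    ChatML wraps each turn in <|im_start|>role ... <|im_end|> markers;
    the trailing assistant header cues the model to generate its reply.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Write a short scene set in a lighthouse.",
)
```

The generated text is then read up to the next `<|im_end|>` marker.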
Training & Datasets
The model's finetuning leveraged a curated selection of datasets, including:
- Filtered Stheno dataset
- kalomaze/Opus_Instruct_25k
- Nopm/Opus_WritingStruct
- A subset of Gryphe/Sonnet3.5-SlimOrcaDedupCleaned (~16k rows)
- kalomaze/Opus_Instruct_3k
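For intuition on the hybrid preference objective described above, here is a sketch of a DPOP-style pairwise loss on per-sequence log-probabilities. This is illustrative only: the function name, the `beta` and `lam` values, and the exact combination with KTO are assumptions, not the model's actual training configuration.

```python
import math

def dpop_loss(logp_chosen: float, logp_rejected: float,
              ref_logp_chosen: float, ref_logp_rejected: float,
              beta: float = 0.1, lam: float = 5.0) -> float:
    """Sketch of a DPO-Positive (DPOP) loss for one preference pair.

    The margin rewards the policy for preferring the chosen response
    over the rejected one relative to the reference model; the penalty
    term discourages the policy's chosen log-prob from falling below
    the reference's, which plain DPO permits.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    penalty = max(0.0, ref_logp_chosen - logp_chosen)  # fires only on a drop
    z = beta * (margin - lam * penalty)
    return -math.log(1.0 / (1.0 + math.exp(-z)))  # -log sigmoid(z)
```

Here the rejected sample would come from the original model's own outputs and the chosen sample from the finetuning dataset, per the strategy described above.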
Good For
This model is suitable for developers and researchers interested in exploring advanced instruction-following capabilities and generating high-quality prose, particularly for tasks requiring a style similar to the Claude 3 models.