anthracite-org/magnum-v2.5-12b-kto

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:12BQuant:FP8Ctx Length:32kPublished:Aug 12, 2024License:apache-2.0Architecture:Transformer0.1K Open Weights Warm

anthracite-org/magnum-v2.5-12b-kto is a 12 billion parameter experimental language model developed by Anthracite, fine-tuned on magnum-12b-v2. It utilizes a hybrid KTO + DPOP reinforcement learning strategy to enhance instruction following, aiming to replicate the prose quality of Claude 3 models. This model is optimized for generating high-quality, instruction-tuned text with a 32768 token context length.

Loading preview...

Magnum v2.5-12b-kto Overview

Magnum v2.5-12b-kto is an experimental 12 billion parameter language model from Anthracite, building upon the anthracite-org/magnum-12b-v2 base. Its primary goal is to emulate the prose quality of Claude 3 Sonnet and Opus models.

Key Capabilities & Features

  • Hybrid Reinforcement Learning: Employs a novel KTO + DPOP strategy, using rejected data sampled from the original model and chosen data from the finetuning dataset to improve instruction following.
  • Prose Quality Focus: Specifically fine-tuned to replicate the sophisticated writing style found in Claude 3 models.
  • Instruction Following: Enhanced through the experimental reinforcement learning approach, particularly on instruction-following data.
  • Context Length: Supports a substantial context window of 32768 tokens.
  • ChatML Formatting: Designed for instruction-tuned interactions using the ChatML prompt format.

Training & Datasets

The model's finetuning leveraged a curated selection of datasets, including:

  • Filtered Stheno dataset
  • kalomaze/Opus_Instruct_25k
  • Nopm/Opus_WritingStruct
  • A subset of Gryphe/Sonnet3.5-SlimOrcaDedupCleaned (~16k rows)
  • kalomaze/Opus_Instruct_3k

Good For

This model is suitable for developers and researchers interested in exploring advanced instruction-following capabilities and generating text with a high prose quality, particularly for tasks requiring a style similar to Claude 3 models.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p