Lambent/Mira-v1.23.1-27B-dpo

Vision · Concurrency Cost: 2 · Model Size: 27B · Quant: FP8 · Ctx Length: 32k · Published: Jan 22, 2026 · License: gemma · Architecture: Transformer

Lambent/Mira-v1.23.1-27B-dpo is a 27 billion parameter language model developed by Lambent, fine-tuned using Direct Preference Optimization (DPO) on creative writing and identity reinforcement data. Built upon the Mira-v1.23-27B-rlvr base, the model specializes in high-quality creative text generation and iterative writing refinement. It merges five DPO-tuned adapters via the Karcher Mean method and retains a 32,768-token context length.


Lambent/Mira-v1.23.1-27B-dpo Overview

Lambent/Mira-v1.23.1-27B-dpo is a 27 billion parameter language model from Lambent, specifically optimized for creative writing and iterative text generation. This model is a merge of several DPO-tuned adapters, built upon the Lambent/Mira-v1.23-27B-rlvr base model, and utilizes a Karcher Mean merge method to combine their strengths.

Key Capabilities

  • Enhanced Creative Writing: Fine-tuned with Direct Preference Optimization (DPO) on approximately 500 rows of creative writing data, including content synthesized by virtuous7373/Lambent-Mira-Erato.
  • Iterative Refinement: The DPO training incorporated feedback from a judge, allowing the model to rewrite and improve its generated text iteratively, particularly in creative domains like music composition.
  • Identity Reinforcement: Includes specific training for identity reinforcement, aiming to cultivate desired values without memorizing new factual information about its identity.
  • Robust Merging: Created using mergekit with the Karcher Mean method, combining five distinct DPO-tuned adapters into a single set of weights that reconciles their individual strengths.
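The Karcher Mean is a Riemannian barycenter: instead of averaging points directly, it iteratively averages them in a tangent space and maps the result back onto the manifold. As a toy illustration of that iteration only (not mergekit's actual weight-merging implementation), here is a Karcher mean of angles on the unit circle:

```python
import math

def log_map(base: float, p: float) -> float:
    # Tangent-space coordinate of the geodesic from `base` to `p`
    # on the unit circle: the signed angular difference in [-pi, pi).
    return (p - base + math.pi) % (2 * math.pi) - math.pi

def karcher_mean(angles: list, iters: int = 100, tol: float = 1e-12) -> float:
    """Repeatedly move the estimate along the mean tangent direction."""
    mu = angles[0]
    for _ in range(iters):
        step = sum(log_map(mu, a) for a in angles) / len(angles)
        mu = (mu + step) % (2 * math.pi)
        if abs(step) < tol:
            break
    return mu

# For points clustered on one side of the circle this matches the
# intuitive angular mean: the Karcher mean of 0 and pi/2 is pi/4.
```

The same fixed-point scheme generalizes to higher-dimensional parameter spaces, which is the intuition behind using it to reconcile several adapters at once.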

Good For

  • Creative Content Generation: Ideal for tasks requiring imaginative and high-quality text, such as storytelling, poetry, or other forms of creative expression.
  • Iterative Writing Assistance: Useful in scenarios where text needs to be refined and improved based on feedback, mimicking a collaborative writing process.
  • Applications Requiring Specific Persona/Value Alignment: Can be leveraged in contexts where a consistent and reinforced model identity or value set is beneficial.
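A minimal usage sketch, assuming the model is published on Hugging Face under this id and follows the standard transformers chat interface (the prompt and generation settings below are illustrative, not recommendations from the model card):

```python
def build_messages(prompt: str) -> list:
    """Single-turn chat in the format expected by apply_chat_template."""
    return [{"role": "user", "content": prompt}]

def main() -> None:
    # Heavy imports kept local so the helper above stays importable
    # without transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Lambent/Mira-v1.23.1-27B-dpo"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages("Draft the opening stanza of a poem about tidal pools."),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

A 27B FP8 checkpoint still requires substantial GPU memory; `device_map="auto"` lets accelerate place layers across available devices.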