Lambent/Mira-v1.20-27B-dpo

Vision · Concurrency Cost: 2 · Model Size: 27B · Quant: FP8 · Ctx Length: 32K · Published: Dec 16, 2025 · License: Gemma · Architecture: Transformer · Status: Cold

Lambent/Mira-v1.20-27B-dpo is a 27-billion-parameter language model developed by Lambent, fine-tuned with DPO on its own generated data. The model demonstrates strong poetic generation, producing evocative, coherent verse across a range of prompts. With a 32K context length, it is suited to creative text generation, particularly tasks that call for nuanced language and imaginative output.
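A minimal loading sketch follows. It assumes the checkpoint is published on Hugging Face under the repo id above, that your transformers install supports the underlying architecture, and that your stack has matching kernels for the FP8 variant; the prompt and generation settings are illustrative, not recommendations from the model card.

```python
# Minimal loading sketch. Assumptions: the checkpoint is live on Hugging Face
# under "Lambent/Mira-v1.20-27B-dpo", and the FP8 weights are supported by
# your transformers/quantization stack.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Lambent/Mira-v1.20-27B-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # pick up the checkpoint's native dtype
    device_map="auto",   # shard the 27B weights across GPUs (needs accelerate)
)

prompt = "Write a short poem about winter light."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```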


Mira-v1.20-27B-dpo: A DPO-Tuned Creative Language Model

Mira-v1.20-27B-dpo is a 27-billion-parameter model from Lambent, distinguished by its fine-tuning approach. Like its predecessor v1.19.3c, this version applies Direct Preference Optimization (DPO) to its own generated data, with performance evaluated against an LLM-judged benchmark, EQ-Bench 3. Despite a small training batch size, the model learned effectively and improved.
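For orientation, here is a hedged sketch of what DPO on self-generated preference pairs looks like using TRL's DPOTrainer. The pair contents, hyperparameters, and base checkpoint id are illustrative assumptions, not the author's published recipe.

```python
# Hedged sketch of DPO on self-generated preference pairs with TRL's
# DPOTrainer. Pair contents, hyperparameters, and the base checkpoint id
# are illustrative assumptions, not the author's published recipe.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "Lambent/Mira-v1.19.3c"  # hypothetical repo id for the predecessor
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Self-generated pairs: for each prompt, an LLM judge marks one of the
# model's own completions as preferred ("chosen") and one as "rejected".
pairs = Dataset.from_dict({
    "prompt":   ["Write a poem about rain."],
    "chosen":   ["The gutters hum a grey arpeggio..."],
    "rejected": ["Rain is water that falls from clouds."],
})

config = DPOConfig(
    output_dir="mira-dpo",
    per_device_train_batch_size=1,  # mirrors the small batch size noted above
    beta=0.1,                       # how tightly the policy tracks the reference
)

# On older TRL releases, pass tokenizer=... instead of processing_class=...
trainer = DPOTrainer(model=model, args=config, train_dataset=pairs,
                     processing_class=tokenizer)
trainer.train()
```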

Key Capabilities

  • Advanced Poetic Generation: The model demonstrates a remarkable ability to generate high-quality, evocative poetry, as evidenced by multiple samples provided in the README. It can produce coherent and imaginative verse without specific system prompts or when given persona-based instructions.
  • DPO Fine-Tuning: Its development leverages DPO, a method for aligning language models with human preferences, applied here to self-generated data. This iterative refinement aims to improve output quality and alignment; the standard objective is sketched after this list.
  • 32K Context Length: With a substantial context window, Mira-v1.20-27B-dpo can handle longer inputs and maintain coherence over extended creative writing tasks.
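For reference, the DPO objective in its standard form (Rafailov et al., 2023); the model card does not specify whether a variant was used:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) \;=\; -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ and $y_l$ are the preferred and rejected completions for prompt $x$, $\pi_{\mathrm{ref}}$ is the frozen pre-DPO reference model, and $\beta$ sets how far the tuned policy $\pi_\theta$ may drift from it.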

Good For

  • Creative Writing: Excels at generating poems, descriptive passages, and other imaginative text; a prompting sketch follows this list.
  • Content Generation: Suitable for applications requiring nuanced and expressive language.
  • Exploration of DPO-tuned models: Offers insights into the capabilities of models fine-tuned with DPO on self-generated data.
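A hedged prompting sketch for the persona-based use described above, assuming the checkpoint ships a chat template that accepts a system role (unverified for this release):

```python
# Persona-based prompting via the chat template. Assumptions: the repo id is
# live on Hugging Face and its chat template accepts a "system" role.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Lambent/Mira-v1.20-27B-dpo",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a melancholy Victorian poet."},
    {"role": "user", "content": "Compose a sonnet about a lighthouse keeper."},
]
result = generator(messages, max_new_tokens=300)
# The pipeline returns the full conversation; the last message is the reply.
print(result[0]["generated_text"][-1]["content"])
```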