Lambent/Mira-v1.25.2-27B-DPO

VISION | Concurrency Cost: 2 | Model Size: 27B | Quant: FP8 | Ctx Length: 32k | Published: Feb 19, 2026 | License: gemma | Architecture: Transformer | Cold

Lambent/Mira-v1.25.2-27B-DPO is a 27-billion-parameter language model developed by Lambent, built on the Mira-v1.25.1-27B-DPO base. It adds a second DPO (Direct Preference Optimization) phase that targets synthetic negatives reflecting the model's value-drift patterns in multi-turn self-examination. Its creative capabilities and intelligence are noted to be similar to its predecessor's, so it suits tasks that require nuanced understanding and generation.

Overview

Lambent/Mira-v1.25.2-27B-DPO is a 27-billion-parameter language model that iterates on the Mira-v1.25.1-27B-DPO base. It was produced with the Karcher Mean merge method, combining the base model with several DPO-adapted versions.

Key Characteristics

  • Second DPO Phase: This version adds a second Direct Preference Optimization (DPO) phase. It targets synthetic negative examples that reflect the model's value-drift patterns during multi-turn self-examination, with the aim of refining its behavior in complex interactions.
  • Creative Capabilities: The model is noted to retain creative capabilities and intelligence comparable to those of its immediate predecessor, Mira-v1.25.1-27B-DPO.
  • Merge Method: The model is a merge of pre-trained language models, combined with the Karcher Mean method.
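
As background on the DPO phase mentioned above, the standard DPO objective rewards the policy for raising the likelihood of a preferred ("chosen") response relative to a dispreferred ("rejected") one, measured against a frozen reference model. The card does not give the training configuration, so the sketch below shows only the generic per-example loss; the function name dpo_loss, the default beta of 0.1, and the log-probability values are illustrative assumptions, not values from this model's training.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities (hypothetical helper)."""
    # How much more (or less) likely each response is under the policy than the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)), computed stably.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Illustrative numbers only: the policy has moved toward the chosen response
# and away from the rejected one, relative to the reference.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
unchanged = dpo_loss(-2.0, -2.0, -2.0, -2.0)
print(improved < unchanged)
```

In this setup a "synthetic negative" would play the role of the rejected response: the loss falls as the policy assigns it less probability, relative to the reference, than the chosen response.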

Good For

  • Applications requiring nuanced language generation and understanding.
  • Use cases where refined multi-turn interaction and self-correction are beneficial.
  • Tasks that draw on the creative capabilities and intelligence carried over from Mira-v1.25.1-27B-DPO.