Lambent/Mira-v1.25.2-27B-DPO
VISION | Concurrency Cost: 2 | Model Size: 27B | Quant: FP8 | Ctx Length: 32k | Published: Feb 19, 2026 | License: gemma | Architecture: Transformer | Cold
Lambent/Mira-v1.25.2-27B-DPO is a 27 billion parameter language model developed by Lambent and built on Mira-v1.25.1-27B-DPO. It adds a second DPO (Direct Preference Optimization) phase that targets synthetic negatives reflecting the model's value drift patterns in multi-turn self-examination. Its creative capabilities and intelligence are described as similar to its predecessor's, which suits it to tasks that need nuanced understanding and generation.
Overview
Lambent/Mira-v1.25.2-27B-DPO is a 27 billion parameter language model that iterates on Mira-v1.25.1-27B-DPO. It was created with the Karcher Mean merge method, which combines the base model with several DPO-adapted versions of it.
Key Characteristics
- Second DPO Phase: This version adds a second Direct Preference Optimization (DPO) phase. Its rejected examples are synthetic negatives that reflect the model's own value drift patterns during multi-turn self-examination, with the aim of refining behavior in complex interactions.
- Creative Capabilities: The model is noted to retain creative capabilities and intelligence comparable to its immediate predecessor, Mira-v1.25.1-27B-DPO.
- Merge Method: The model is a merge of pre-trained language models, utilizing the Karcher Mean method for integration.
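As background for the DPO phase described above, the standard DPO objective for a single preference pair can be sketched as follows. In this setting the "rejected" response would correspond to a synthetic negative. The function name `dpo_loss`, the `beta=0.1` default, and the example log-probabilities are assumptions for illustration; the training hyperparameters actually used for this model are not stated here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each full response under the
    policy being trained and under a frozen reference model.
    beta is an assumed value, not one reported for this model.
    """
    # How much more the policy prefers "chosen" over "rejected",
    # relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin

    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Policy identical to the reference: loss equals log(2).
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy has shifted toward the chosen response: loss drops.
    print(dpo_loss(-10.0, -20.0, -12.0, -15.0))
```

The loss falls as the policy raises the likelihood of the chosen response and lowers that of the rejected one relative to the reference, so negatives built from specific failure patterns push the model away from those patterns.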
Good For
- Applications requiring nuanced language generation and understanding.
- Use cases where refined multi-turn interaction and self-correction are beneficial.
- Tasks that leverage the creative and intelligent capabilities established in the Mira-v1.25.1-27B-DPO series.