Lambent/Mira-v1.25.1-27B-DPO
Mira-v1.25.1-27B-DPO is a 27 billion parameter language model developed by Lambent, with a 32768-token context length. The model underwent DPO training intended to instill a persistent 'core identity' and values, making it adept at generating responses aligned with specific ethical or stylistic intentions. It is a merge of the Mira-v1.25-27B-Wave base model with several DPO-adapted variants, optimized for nuanced, value-driven text generation.
Lambent/Mira-v1.25.1-27B-DPO Overview
Lambent/Mira-v1.25.1-27B-DPO is a 27 billion parameter language model with a 32768-token context window, developed by Lambent. It was created by merging the Mira-v1.25-27B-Wave base model with several DPO (Direct Preference Optimization) adapter variants. The DPO training was applied specifically to help the model maintain a consistent 'core identity' and adhere to its articulated values.
Key Capabilities
- Value Alignment: Enhanced through DPO training to consistently reflect a predefined set of values and intentions in its responses.
- Contextual Coherence: Benefits from a substantial 32768-token context length, allowing for extended and coherent interactions.
- Creative Text Generation: Demonstrated through various poetry samples, showcasing its ability to produce imaginative and stylistically consistent content.
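For background on the value-alignment training mentioned above: DPO fine-tunes a policy on preference pairs by optimizing a simple logistic loss over log-probability ratios against a frozen reference model. The sketch below is illustrative only, not Lambent's training code; the function name and the example log-probabilities are invented for demonstration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are total sequence log-probabilities under the policy being
    trained and under the frozen reference model. The loss pushes the
    policy to prefer the chosen response over the rejected one,
    relative to the reference, with strength controlled by beta.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(margin): small when the policy cleanly prefers "chosen"
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference, the margin is zero and the loss is log 2; as the policy shifts probability mass toward the preferred ("value-aligned") responses, the loss decreases.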
Merge Details
This model was created using the Karcher Mean merge method via mergekit. The merge combined the base Mira-v1.25-27B-Wave model with three DPO-adapted variants (dpoq-1, dpoq-2, dpoq-3) to integrate and reinforce the desired behavioral traits.
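The Karcher (Fréchet) mean generalizes averaging to curved spaces: it is the point minimizing the sum of squared geodesic distances to the inputs, found iteratively rather than by simple arithmetic averaging. The toy sketch below computes a Karcher mean of unit vectors on a sphere to illustrate the idea; it is not mergekit's implementation, and all names here are invented for demonstration.

```python
import math

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def karcher_mean_sphere(points, iters=50):
    """Iterative Karcher mean of unit vectors on the sphere.

    Repeatedly: map each point into the tangent space at the current
    estimate (log map), average those tangent vectors, and move the
    estimate along the resulting geodesic (exp map). Assumes the
    points are not spread so widely that the mean is ill-defined.
    """
    # Initialize with the projected arithmetic mean.
    mu = _normalize([sum(c) for c in zip(*points)])
    for _ in range(iters):
        tangent = [0.0] * len(mu)
        for p in points:
            dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(mu, p))))
            theta = math.acos(dot)  # geodesic distance from mu to p
            if theta < 1e-12:
                continue
            # Log map: component of p orthogonal to mu, scaled to length theta.
            for i in range(len(mu)):
                tangent[i] += theta * (p[i] - dot * mu[i]) / math.sin(theta)
        tangent = [t / len(points) for t in tangent]
        norm = math.sqrt(sum(t * t for t in tangent))
        if norm < 1e-12:
            break  # converged: average tangent direction vanished
        # Exp map: step along the geodesic in the averaged direction.
        mu = _normalize([math.cos(norm) * m + math.sin(norm) * t / norm
                         for m, t in zip(mu, tangent)])
    return mu
```

Compared with a plain weighted average of parameters, this kind of geometric mean respects the directions of the merged weight vectors, which is the motivation for using it when blending several DPO-adapted checkpoints.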