Lambent/Mira-v1.23.1-27B-dpo
Lambent/Mira-v1.23.1-27B-dpo is a 27-billion-parameter language model developed by Lambent, fine-tuned with Direct Preference Optimization (DPO) on creative writing and identity-reinforcement data. Built on the Mira-v1.23-27B-rlvr base, it specializes in generating high-quality creative text and iteratively refining its writing. It combines five DPO-tuned adapters via a Karcher Mean merge and retains a 32,768-token context length.
Key Capabilities
- Enhanced Creative Writing: Fine-tuned with Direct Preference Optimization (DPO) on approximately 500 rows of creative writing data, including content synthesized by virtuous7373/Lambent-Mira-Erato.
- Iterative Refinement: The DPO training incorporated feedback from a judge, allowing the model to iteratively rewrite and improve its generated text, particularly in creative domains such as music composition.
- Identity Reinforcement: Includes specific training for identity reinforcement, aiming to cultivate desired values without memorizing new factual information about its identity.
- Robust Merging: Created with mergekit using the Karcher Mean method, combining five distinct DPO-tuned adapters to achieve a broad and reconciled performance landscape.
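A merge of this shape can be described to mergekit with a short YAML configuration. The sketch below is illustrative only: the adapter repository names are placeholders (the actual five adapters are not named in this card), and the available options for the `karcher` method should be checked against the mergekit documentation.

```yaml
# Hypothetical mergekit config sketch for a Karcher Mean merge.
# The model names under `models:` are placeholders, not the real adapters.
merge_method: karcher
base_model: Lambent/Mira-v1.23-27B-rlvr
models:
  - model: example-org/dpo-adapter-1   # placeholder
  - model: example-org/dpo-adapter-2   # placeholder
  - model: example-org/dpo-adapter-3   # placeholder
  - model: example-org/dpo-adapter-4   # placeholder
  - model: example-org/dpo-adapter-5   # placeholder
dtype: bfloat16
```

Running `mergekit-yaml config.yaml ./output-dir` with a config like this would produce the merged checkpoint; the Karcher Mean averages the adapters on the manifold rather than linearly, which is what "reconciled performance landscape" refers to above.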
Good For
- Creative Content Generation: Ideal for tasks requiring imaginative and high-quality text, such as storytelling, poetry, or other forms of creative expression.
- Iterative Writing Assistance: Useful where text must be refined and improved in response to feedback, mimicking a collaborative writing process.
- Applications Requiring Specific Persona/Value Alignment: Can be leveraged in contexts where a consistent and reinforced model identity or value set is beneficial.