Lambent/Mira-v1.23.1-27B-dpo Overview
Lambent/Mira-v1.23.1-27B-dpo is a 27-billion-parameter language model from Lambent, optimized for creative writing and iterative text generation. It is a merge of several DPO-tuned adapters built on top of the Lambent/Mira-v1.23-27B-rlvr base model, combined with the Karcher mean merge method.
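A merge like this is typically described in a mergekit configuration file. The sketch below is illustrative only: the adapter names are placeholders (the actual adapters used for this model are not listed in this card), and the exact fields accepted for the Karcher method may vary by mergekit version.

```yaml
# Hypothetical mergekit config sketch -- adapter repo names are placeholders.
merge_method: karcher
base_model: Lambent/Mira-v1.23-27B-rlvr
models:
  - model: Lambent/mira-dpo-adapter-a   # placeholder
  - model: Lambent/mira-dpo-adapter-b   # placeholder
  - model: Lambent/mira-dpo-adapter-c   # placeholder
  - model: Lambent/mira-dpo-adapter-d   # placeholder
  - model: Lambent/mira-dpo-adapter-e   # placeholder
dtype: bfloat16
```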
Key Capabilities
- Enhanced Creative Writing: Fine-tuned with Direct Preference Optimization (DPO) on approximately 500 rows of creative-writing data, including content synthesized by virtuous7373/Lambent-Mira-Erato.
- Iterative Refinement: The DPO training incorporated feedback from a judge, allowing the model to iteratively rewrite and improve its generated text, particularly in creative domains such as music composition.
- Identity Reinforcement: Includes specific training for identity reinforcement, aiming to cultivate desired values without memorizing new factual information about its identity.
- Robust Merging: Created with mergekit using the Karcher mean method, combining five distinct DPO-tuned adapters to achieve a broad and reconciled performance landscape.
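The Karcher mean (Riemannian barycenter) generalizes the arithmetic mean to curved spaces: it is the point minimizing the sum of squared geodesic distances to the inputs. The toy sketch below illustrates the standard iterative algorithm on the unit sphere with small vectors; it is a conceptual illustration only, not mergekit's actual implementation, which applies the same idea to high-dimensional model weights.

```python
import numpy as np

def karcher_mean_sphere(points, iters=100, tol=1e-10):
    """Iterative Karcher mean of unit vectors on the sphere.

    points: (n, d) array of unit-norm vectors (assumed non-antipodal).
    Repeatedly maps points to the tangent space at the current estimate
    (log map), averages there, and maps back (exponential map).
    """
    m = points.mean(axis=0)
    m /= np.linalg.norm(m)  # project the Euclidean mean onto the sphere
    for _ in range(iters):
        tangents = []
        for p in points:
            cos_t = np.clip(np.dot(m, p), -1.0, 1.0)
            theta = np.arccos(cos_t)          # geodesic distance m -> p
            if theta < 1e-12:
                tangents.append(np.zeros_like(m))
            else:
                v = p - cos_t * m             # component of p orthogonal to m
                v = v / np.linalg.norm(v) * theta
                tangents.append(v)            # log map of p at m
        avg = np.mean(tangents, axis=0)
        norm = np.linalg.norm(avg)
        if norm < tol:                        # converged: tangent mean is ~0
            break
        # Exponential map: move m along the averaged tangent direction.
        m = np.cos(norm) * m + np.sin(norm) * (avg / norm)
        m /= np.linalg.norm(m)
    return m
```

For two symmetric points such as `[1, 0]` and `[0, 1]`, the result is their geodesic midpoint `[1/sqrt(2), 1/sqrt(2)]`, whereas a naive arithmetic mean would leave the sphere entirely; this reconciliation property is the motivation for using it to merge multiple adapters.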
Good For
- Creative Content Generation: Ideal for tasks requiring imaginative and high-quality text, such as storytelling, poetry, or other forms of creative expression.
- Iterative Writing Assistance: Useful in scenarios where text needs to be refined and improved based on feedback, mimicking a collaborative writing process.
- Applications Requiring Specific Persona/Value Alignment: Can be leveraged in contexts where a consistent and reinforced model identity or value set is beneficial.