Mira-v1.20-27B-dpo: A DPO-Tuned Creative Language Model
Mira-v1.20-27B-dpo is a 27-billion-parameter model from Lambent, distinguished by its fine-tuning approach. Like its predecessor v1.19.3c, this version applies Direct Preference Optimization (DPO) to the model's own generated data, with performance evaluated against an LLM-judged benchmark, EQ-Bench3. Despite a small training batch size, the model learned effectively and improved on that benchmark.
Key Capabilities
- Advanced Poetic Generation: The model demonstrates a remarkable ability to generate high-quality, evocative poetry, as evidenced by multiple samples provided in the README. It produces coherent, imaginative verse both without a system prompt and when given persona-based instructions.
- DPO Fine-Tuning: Its development leverages DPO, a method for aligning language models with human preferences, applied here to the model's own generated outputs. This iterative refinement process aims to improve the quality and alignment of the model's writing.
- 32K Context Length: A 32K-token context window lets Mira-v1.20-27B-dpo handle longer inputs and maintain coherence over extended creative-writing tasks.
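For readers unfamiliar with DPO, the core idea can be sketched in a few lines. This is a minimal, illustrative implementation of the per-pair DPO objective, not Lambent's actual training code: the function name, argument names, and `beta` value are assumptions for the sketch. Given log-probabilities of a preferred ("chosen") and dispreferred ("rejected") completion under both the trained policy and a frozen reference model, the loss pushes the policy to widen its preference margin while `beta` limits drift from the reference.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the total log-probability of the chosen or
    rejected completion under the trained policy or the frozen
    reference model; beta controls drift from the reference.
    """
    # Implicit reward margin: how much more strongly the policy
    # prefers the chosen completion, relative to the reference.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log(sigmoid(margin)), computed as a stable softplus.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# A pair the policy already ranks correctly yields a loss below ln 2;
# minimizing this loss over many pairs is one DPO update direction.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

In the self-generated setup described above, both completions in each pair would come from the model itself, with the LLM judge (rather than human annotators) deciding which one counts as "chosen".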
Good For
- Creative Writing: Excels in generating poems, descriptive passages, and other forms of imaginative text.
- Content Generation: Suitable for applications requiring nuanced and expressive language.
- Exploration of DPO-Tuned Models: Offers insights into the capabilities of models fine-tuned with DPO on self-generated data.