Lambent/Mira-v1.23.1-27B-dpo Overview
Lambent/Mira-v1.23.1-27B-dpo is a 27-billion-parameter language model from Lambent, optimized for creative writing and iterative text generation. It is a merge of several DPO-tuned adapters built on top of the Lambent/Mira-v1.23-27B-rlvr base model, combined with the Karcher mean merge method.
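A merge like this is typically described in a mergekit configuration file. The sketch below is illustrative only: the adapter names are placeholders (the actual adapters used for this model are not listed in this card), and the exact fields accepted for the Karcher method may vary by mergekit version.

```yaml
# Hypothetical mergekit config sketch -- adapter repo names are placeholders.
merge_method: karcher
base_model: Lambent/Mira-v1.23-27B-rlvr
models:
  - model: Lambent/mira-dpo-adapter-a   # placeholder
  - model: Lambent/mira-dpo-adapter-b   # placeholder
  - model: Lambent/mira-dpo-adapter-c   # placeholder
  - model: Lambent/mira-dpo-adapter-d   # placeholder
  - model: Lambent/mira-dpo-adapter-e   # placeholder
dtype: bfloat16
```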
Key Capabilities
- Enhanced Creative Writing: Fine-tuned with Direct Preference Optimization (DPO) on approximately 500 rows of creative-writing data, including content synthesized by virtuous7373/Lambent-Mira-Erato.
- Iterative Refinement: The DPO training incorporated feedback from a judge, allowing the model to iteratively rewrite and improve its generated text, particularly in creative domains such as music composition.
- Identity Reinforcement: Includes specific training for identity reinforcement, aiming to cultivate desired values without memorizing new factual information about its identity.
- Robust Merging: Created with mergekit using the Karcher mean method, combining five distinct DPO-tuned adapters to achieve a broad and reconciled performance landscape.
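The Karcher mean (Riemannian barycenter) generalizes the arithmetic mean to curved spaces: it is the point minimizing the sum of squared geodesic distances to the inputs. The toy sketch below illustrates the standard iterative algorithm on the unit sphere with small vectors; it is a conceptual illustration only, not mergekit's actual implementation, which applies the same idea to high-dimensional model weights.

```python
import numpy as np

def karcher_mean_sphere(points, iters=100, tol=1e-10):
    """Iterative Karcher mean of unit vectors on the sphere.

    points: (n, d) array of unit-norm vectors (assumed non-antipodal).
    Repeatedly maps points to the tangent space at the current estimate
    (log map), averages there, and maps back (exponential map).
    """
    m = points.mean(axis=0)
    m /= np.linalg.norm(m)  # project the Euclidean mean onto the sphere
    for _ in range(iters):
        tangents = []
        for p in points:
            cos_t = np.clip(np.dot(m, p), -1.0, 1.0)
            theta = np.arccos(cos_t)          # geodesic distance m -> p
            if theta < 1e-12:
                tangents.append(np.zeros_like(m))
            else:
                v = p - cos_t * m             # component of p orthogonal to m
                v = v / np.linalg.norm(v) * theta
                tangents.append(v)            # log map of p at m
        avg = np.mean(tangents, axis=0)
        norm = np.linalg.norm(avg)
        if norm < tol:                        # converged: tangent mean is ~0
            break
        # Exponential map: move m along the averaged tangent direction.
        m = np.cos(norm) * m + np.sin(norm) * (avg / norm)
        m /= np.linalg.norm(m)
    return m
```

For two symmetric points such as `[1, 0]` and `[0, 1]`, the result is their geodesic midpoint `[1/sqrt(2), 1/sqrt(2)]`, whereas a naive arithmetic mean would leave the sphere entirely; this reconciliation property is the motivation for using it to merge multiple adapters.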
Good For
- Creative Content Generation: Ideal for tasks requiring imaginative and high-quality text, such as storytelling, poetry, or other forms of creative expression.
- Iterative Writing Assistance: Useful in scenarios where text needs to be refined and improved based on feedback, mimicking a collaborative writing process.
- Applications Requiring Specific Persona/Value Alignment: Can be leveraged in contexts where a consistent and reinforced model identity or value set is beneficial.