Mira-v1.21-27B-rlvr: A Roleplaying Specialist
Mira-v1.21-27B-rlvr is a 27-billion-parameter language model from Lambent. It was trained for 21 hours with Group Relative Policy Optimization (GRPO) on an A100 GPU, inside a self-conceptualized RL environment. Training targeted one-shot roleplaying scenarios ranging from humorous to serious, with an LLM judge assigning rewards against a rubric that prioritized voice, humor, function, and cleverness.
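As a rough illustration of the training signal described above, the sketch below collapses per-criterion judge scores into a single reward and then applies the group-relative normalization that GRPO uses to compare several responses sampled for the same prompt. The criterion weights, the 0-to-10 score scale, and all function names are assumptions made for illustration; the source names the four rubric criteria but gives neither their weighting nor the judge's output format.

```python
from statistics import mean, pstdev

# Hypothetical weights: the source lists the criteria but not how they are weighted.
RUBRIC_WEIGHTS = {"voice": 0.4, "humor": 0.2, "function": 0.2, "cleverness": 0.2}


def rubric_reward(scores: dict[str, float]) -> float:
    """Collapse per-criterion judge scores (assumed 0-10) into one reward in [0, 1]."""
    return sum(RUBRIC_WEIGHTS[name] * scores[name] / 10.0 for name in RUBRIC_WEIGHTS)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of responses sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled responses to one roleplay prompt, scored by a hypothetical judge.
judge_scores = [
    {"voice": 9, "humor": 7, "function": 8, "cleverness": 6},
    {"voice": 5, "humor": 4, "function": 7, "cleverness": 3},
    {"voice": 7, "humor": 8, "function": 6, "cleverness": 8},
]
rewards = [rubric_reward(s) for s in judge_scores]
advantages = group_relative_advantages(rewards)
```

Responses scoring above the group mean receive positive advantages and are reinforced; those below the mean receive negative ones. Because the comparison is made within the group, no separate value model is needed to provide a baseline.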
Key Capabilities
- Advanced Roleplaying: Handles diverse one-shot roleplaying scenarios, from humorous to serious.
- Nuanced Communication: Demonstrates strong capabilities in maintaining a distinct voice and incorporating humor.
- Creative Text Generation: Capable of generating creative content, as evidenced by its poetry samples.
- Context Handling: Supports a context length of 32,768 tokens, allowing longer and more coherent interactions.
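One practical consequence of a fixed 32,768-token window is that long roleplay sessions eventually need their history trimmed. The following is a minimal sketch, assuming a crude whitespace-based token estimate in place of the model's real tokenizer; `estimate_tokens`, `trim_history`, and the `reserve_for_reply` default are hypothetical names and values, not part of any published API for this model.

```python
CONTEXT_LIMIT = 32768  # context length stated for the model


def estimate_tokens(text: str) -> int:
    """Rough stand-in for a tokenizer: counts whitespace-separated words.

    This usually undercounts real tokens, so a real deployment should
    count with the model's own tokenizer instead.
    """
    return len(text.split())


def trim_history(system_prompt, turns, reserve_for_reply=1024, limit=CONTEXT_LIMIT):
    """Keep the system prompt plus the most recent turns that fit in the budget."""
    budget = limit - reserve_for_reply - estimate_tokens(system_prompt)
    kept = []
    for turn in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(turn)
        if cost > budget:
            break  # stop at the first turn that no longer fits
        kept.append(turn)
        budget -= cost
    return list(reversed(kept))  # restore chronological order
```

Dropping the oldest turns first keeps the character's system prompt and the most recent exchange intact, which tends to matter most for maintaining a consistent voice.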
Good For
- Interactive Storytelling: Ideal for applications requiring dynamic and creative narrative generation.
- Character Simulation: Suitable for developing AI characters with distinct personalities and conversational styles.
- Creative Writing Assistance: Can be used as a tool for generating poetry, prose, or other creative text formats.
- Experimental AI Development: Serves as a starting point for exploring reinforcement learning with LLM-judged rubric rewards in language models.