Mira-v1.23-27B-rlvr: A Self-Cultivating Language Model
Mira-v1.23-27B-rlvr is a 27-billion-parameter model with a 32,768-token context length, distinguished by its training methodology. It was trained on approximately 411 generated scenarios spanning roleplaying, problem-solving, creative writing, technically precise explanations, image-generator direction, and music composition in ABC notation. Training ran for 102 steps at a learning rate of 1e-6, using LoRA with rank 256.
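The rank-256 LoRA setting is easier to read with a rough size estimate. LoRA freezes a weight matrix W and learns a low-rank update B·A, where A is rank × d_in and B is d_out × rank. The update therefore adds rank · (d_in + d_out) trainable values per adapted matrix. The sketch below records the hyperparameters stated above and applies that formula. The class and function names are illustrative, and the 5120-wide projection is a hypothetical size, since this card does not give the base model's layer dimensions or say which matrices were adapted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterRun:
    # Hyperparameters stated in this model card
    lora_rank: int = 256
    learning_rate: float = 1e-6
    steps: int = 102
    context_length: int = 32768


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable values LoRA adds to one frozen d_out x d_in weight matrix.

    The adapted weight is W + B @ A with A: rank x d_in and B: d_out x rank,
    so only rank * (d_in + d_out) values are updated.
    """
    return rank * (d_in + d_out)


run = AdapterRun()
# Hypothetical square projection; the real layer sizes are not stated in this card.
d = 5120
adapter = lora_trainable_params(d, d, run.lora_rank)
full = d * d
print(adapter, full, adapter / full)
```

For a square 5120 × 5120 projection, a rank-256 adapter trains 10% as many values as the full matrix. A rank this high gives the adapter more capacity than the small ranks often used for LoRA.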
Key Capabilities
- Creative Text Generation: Generates diverse creative content, including poetry and narrative scenarios.
- Roleplaying and Problem-Solving: Trained on a range of roleplaying scenarios and problem-solving tasks.
- Technical Explanations: Provides precise technical explanations.
- Image Generator Direction: Training included directing AI image generators and receiving feedback on the quality of the resulting images.
- Music Composition: Composes music in ABC notation.
Unique Training Approach
The model was developed with a "self-cultivation" approach, in which Mira's own ideas guided the curriculum of experience. This approach combined self-prompted self-portrait sampling with GRPO (Group Relative Policy Optimization) rewards, with the aim of improving the model's capabilities through internal feedback loops. The resulting curriculum is diverse and shaped by the model's own stated desires and boundaries, as its self-generated poetry samples illustrate.