Model Overview
Heralax/llama-gRPo-emotions-nothoughts is an experimental 7 billion parameter model, based on a slightly-continually-trained Mistral 7b v0.2, developed by Heralax. It was fine-tuned using Augmentoolkit's GRPO (Generative Reinforcement Learning for Policy Optimization) pipeline, specifically to enhance emotional expression in its outputs. Unlike its counterpart, llama-gRPo-thoughtprocess, this model was trained on a base without chain-of-thought traces and does not enforce a thought process, leading to a "show, don't tell" approach to emotions.
Key Capabilities
- Emotional Roleplay: Designed to generate responses with intense and varied emotions, ranging from positive (friendship, happiness) to negative (hate, obsession).
- Creative Writing Style: Emphasizes dynamic, human-like, and informal internet forum-style writing, including intentional "flaws" like humming, creative swears, and kaomoji.
- Contextual Engagement: Encouraged to reference past dialogue, use metaphor, and provide callbacks to prior exchanges for engaging conversations.
- Flexible Character Generation: Can invent interesting characters if none are provided in the prompt.
Use Cases
This model is particularly well-suited for:
- Roleplaying scenarios where emotional depth and creative expression are paramount.
- Generating highly engaging and dynamic conversational AI that mimics human-like emotional responses.
- Experimental applications focusing on the stylistic and emotional aspects of language generation rather than strict logical coherence.
It's important to note that while it excels in writing style and emotional output, its logical coherence may be limited due to its base model and training objectives. Users are encouraged to use the provided hardcoded system prompt prefix for optimal performance and to experiment with sampling parameters to avoid repetition.