allura-org/remnant-qwen3-8b
The allura-org/remnant-qwen3-8b is an 8 billion parameter language model, fine-tuned from Qwen/Qwen3-8B-Base, developed by Allura. This model is specifically optimized for SFW and NSFW roleplaying and conversational tasks. It utilizes a sequence length of 8192 tokens and is trained using the Axolotl framework, making it suitable for engaging and extended interactive text generation.
Loading preview...
Remnant Qwen3 8b: Roleplaying and Conversation Model
The allura-org/remnant-qwen3-8b is a specialized large language model (LLM) from the Remnant series, developed by Allura. It is fine-tuned from the Qwen/Qwen3-8B-Base architecture, focusing on enhancing capabilities for interactive and engaging text generation.
Key Capabilities & Features
- Roleplaying and Conversation: Primarily designed for both SFW (Safe For Work) and NSFW (Not Safe For Work) roleplaying scenarios and general conversational tasks.
- Base Model: Built upon the robust Qwen3-8B-Base model, providing a strong foundation for language understanding and generation.
- Training Framework: Leverages the Axolotl training framework, indicating a structured and efficient fine-tuning process.
- Context Length: Supports a substantial sequence length of 8192 tokens, allowing for extended and coherent dialogues.
- Chat Template: Recommended to be used with the ChatML chat template, with Llama 3 format also noted as potentially compatible.
Recommended Use Cases
- Interactive Storytelling: Ideal for applications requiring dynamic and character-driven narratives.
- Chatbots: Suitable for creating engaging conversational agents, particularly those designed for roleplay or extended dialogue.
- Creative Writing: Can be used as a tool for generating creative text in a conversational style.
Technical Details
The model was fine-tuned for 2 epochs with a micro batch size of 32 and a learning rate of 2e-5, utilizing an Apollo-mini optimizer. It incorporates optimizations like gradient checkpointing with Unsloth and Flash Attention for efficient processing.