Sao10K/L3-8B-Stheno-v3.3-32K is an 8-billion-parameter language model developed by Sao10K and trained with support from Backyard.ai. It is optimized for creative writing and roleplaying, and its context length was expanded from an initial 8K to 32K through PoSE training. The model generates coherent, engaging narrative content, particularly in roleplay scenarios, though it has some limitations in long-context reasoning.
Sao10K/L3-8B-Stheno-v3.3-32K Overview
This model, developed by Sao10K with compute from Backyard.ai, is an 8-billion-parameter language model primarily focused on creative writing and roleplaying. It was initially trained at an 8K context length and subsequently expanded to a 32K context using PoSE (Positional Skip-wise) training, allowing for longer narrative generation.
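To make the context expansion more concrete, here is a minimal sketch of the idea behind PoSE-style training: the model still sees only an 8K-token window, but the position ids assigned to that window are split into chunks with a random gap between them, so that ids up to the 32K target appear during training. This is a simplified two-chunk illustration, not the author's actual training configuration; the function name `pose_position_ids` and the chunking scheme are assumptions made for the example.

```python
import random

def pose_position_ids(train_len, target_len, rng):
    """Return train_len position ids spread over [0, target_len) in two chunks.

    Hypothetical helper for illustration only; real PoSE setups may use
    more chunks or different sampling.
    """
    # Split the training window into two contiguous, non-empty chunks.
    split = rng.randint(1, train_len - 1)
    # Shift the second chunk by a random skip, capped so that the largest
    # position id stays below target_len.
    skip = rng.randint(0, target_len - train_len)
    first = list(range(0, split))
    second = list(range(split + skip, train_len + skip))
    return first + second

rng = random.Random(0)
ids = pose_position_ids(train_len=8192, target_len=32768, rng=rng)
print(len(ids), ids[0], ids[-1])
```

The sequence length stays at 8192, so training cost is that of an 8K run, while the position ids the model observes can reach anywhere inside the 32K range.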
Key Training & Data Enhancements
- Refined Roleplaying Samples: Extensive cleanup and quality checks were performed on roleplaying datasets.
- Improved Creative Writing: The training data includes twice as many creative writing samples as previous versions.
- Detailed Instruct Data: Instruction-following data was remade and refined to enhance response quality.
- PoSE Training: Uses a RoPE theta of 2M for context expansion, which shows better coherence at extended context lengths than standard RoPE scaling.
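The effect of a larger RoPE theta can be illustrated with a short calculation. Each pair of dimensions in a rotary embedding rotates at frequency `theta ** (-2i / head_dim)`, so raising theta stretches the longest wavelengths and lets distant positions remain distinguishable. The sketch below compares the 2M value mentioned above with an assumed baseline of 500,000 and an assumed head dimension of 128 (typical of Llama 3 8B configurations); neither value is stated in this card.

```python
import math

def rope_wavelengths(theta, head_dim=128):
    """Wavelength, in tokens, of each rotary frequency for a given theta."""
    # One rotation frequency per pair of dimensions.
    inv_freq = [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]
    # Tokens needed for a full 2*pi rotation at each frequency.
    return [2 * math.pi / f for f in inv_freq]

# 500_000 is an assumed baseline; 2_000_000 is the theta named in this card.
for theta in (500_000.0, 2_000_000.0):
    longest = rope_wavelengths(theta)[-1]
    print(f"theta={theta:,.0f}: longest wavelength ~ {longest:,.0f} tokens")
```

A longer maximum wavelength means the slowest-rotating dimensions do not wrap around within the 32K window, which is one common explanation for why a larger theta helps at extended context lengths.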
Capabilities & Considerations
- Strong Roleplay Performance: The model is noted for its ability to generate engaging and coherent roleplay scenarios.
- Expanded Context: The model is not natively 32K; PoSE training gives it a functional 32K context, though it may show some issues with very long-context understanding and reasoning.
- Training Stability: The training run was less aggressive than prior Stheno versions, contributing to its stability.
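Because long-context reasoning may degrade near the limit, an application might prefer to keep prompts comfortably inside the 32K window. The following is a minimal sketch of one way to do that, dropping the oldest turns first. The helper name `trim_history` is hypothetical, and the whitespace-based token count is only a crude stand-in; a real application would count tokens with the model's own tokenizer.

```python
def trim_history(turns, max_tokens=32768, count_tokens=lambda s: len(s.split())):
    """Keep the most recent turns whose combined token count fits the budget."""
    kept, used = [], 0
    # Walk backwards so the newest turns are considered first.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    # Restore chronological order before returning.
    kept.reverse()
    return kept

history = ["The tavern door creaks open.", "Who goes there?", "A friend."]
print(trim_history(history, max_tokens=6))
```

Setting `max_tokens` below the full 32K leaves room for the system prompt, character card, and the generated reply.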
Ideal Use Cases
- Creative Story Generation: Excellent for generating narratives, dialogues, and descriptive text.
- Interactive Roleplaying: Suited for applications requiring dynamic and engaging character interactions.
- Content Creation: Can assist in drafting creative content where a longer context window is beneficial for maintaining narrative flow.