Model Overview
allura-org/TQ2.5-14B-Sugarquill-v1 is a 14-billion-parameter model by Auri, built on the SuperNova-Medius base. It has undergone continued pretraining on a diverse dataset of short stories to improve its prose quality and narrative range. It retains a substantial 32,768-token context length, allowing it to handle creative writing tasks longer than a typical short story.
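As context, here is a minimal loading sketch using Hugging Face transformers. The model ID comes from this card; the dtype and device placement are assumptions for illustration, not recommendations from the card itself.

```python
# Minimal sketch: loading the model with Hugging Face transformers.
# Assumes a GPU with enough VRAM for a 14B model in bf16 (~28 GB);
# quantized loading (e.g. via bitsandbytes) is an option on smaller cards.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allura-org/TQ2.5-14B-Sugarquill-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```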
Key Capabilities
- Advanced Story Generation: Trained on assorted short-story data, it produces nuanced and varied prose.
- Extended Context Handling: Supports a 32,768-token context, suited to longer narratives and to maintaining coherence over extended interactions.
- Dual-Mode Functionality: Effective for both roleplay (RP) and general storywriting, working well in chat-based co-writing as well as raw text completion (see the sketch after this list).
- Robust Instruction Following: Despite its creative focus, the model retains strong instruction adherence, making it adaptable to specific user prompts.
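As a rough illustration of the two modes, the sketch below reuses the tokenizer from the loading example above. The prompts and system message are invented, and the ChatML chat template is assumed to ship with the tokenizer (as it does for the Qwen 2.5 family):

```python
# Illustrative sketch of the two usage modes described above.

# Chat-based co-writing / RP: format messages with the ChatML template.
messages = [
    {"role": "system", "content": "You are a co-writer helping draft a short story."},
    {"role": "user", "content": "Continue the scene: the lighthouse keeper hears a knock."},
]
chat_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Raw completion (no template): just continue a story opening.
raw_prompt = "The lighthouse keeper heard a knock just after midnight."
```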
Training Details
The model was trained for 2 epochs on 10,000 rows (approximately 18.7 million tokens) drawn from the Erebus-87k and r_shortstories_24k datasets. The data was cleaned by normalizing punctuation to ASCII and standardizing whitespace. Training used rsLoRA with an effective batch size of 40 and the paged_ademamix_8bit optimizer, running on a 5x3090Ti workstation.
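The cleaning script itself is not published here, so the following is only an illustrative sketch of the kind of normalization described; the punctuation mapping is an assumption, not the pipeline's actual table:

```python
# Rough sketch of the cleaning described above: mapping common Unicode
# punctuation to ASCII equivalents and collapsing irregular whitespace.
# The mapping below is illustrative, not the actual training script.
import re

PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en/em dashes
    "\u2026": "...",                # ellipsis
})

def normalize(text: str) -> str:
    text = text.translate(PUNCT_MAP)
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # cap consecutive blank lines
    return text.strip()
```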
Recommended Usage
Use ChatML instruct formatting, consistent with the base model. Recommended sampling parameters are Temperature 0.8, Min-P 0.05, Top-A 0.3, and Repetition Penalty 1.03 for stable yet creative output.
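Applying those numbers in code, here is a hedged sketch using transformers' generate(), continuing from the chat_prompt built earlier. Note that min_p requires a recent transformers release, and Top-A is not implemented in plain transformers at all; set it in a backend that supports it, such as SillyTavern or text-generation-webui.

```python
# Sketch: applying the recommended samplers with transformers' generate().
# Top-A (0.3) cannot be set here; configure it in a backend that offers it.
inputs = tokenizer(chat_prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,
    min_p=0.05,
    repetition_penalty=1.03,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
))
```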