Kraken-12B-v0: A 12B Parameter Merge for Creativity and Roleplay
Kraken-12B-v0 is a large language model developed by EldritchLabs, built on the Mistral Nemo architecture. It is the largest Mistral Nemo merge to date, integrating 100 individual Nemo finetunes using the DELLA merge method. Its primary design goal is to excel at creativity and roleplay, with enhanced generative capabilities for those use cases.
Key Characteristics
- Architecture: Mistral Nemo 12B base.
- Merge Method: Utilizes the `della` method from mergekit to combine 100 distinct Nemo finetunes.
- Primary Focus: Optimized for creative writing and roleplay.
- Tokenizer Stability: Incorporates the `enable_fix_mistral_regex_true.md` patch for improved tokenizer performance.
- Merge Efficiency: Achieved in 9 hours with 8 GB of VRAM and a 900 GB pagefile, aided by the `graph_v18.py` patch.
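For intuition, the core DELLA idea (Drop-and-rEscaLe: stochastically drop task deltas with magnitude-aware probabilities, rescale the survivors, then combine) can be sketched in a few lines of NumPy. This is a toy illustration only, with made-up function and parameter names; it is not mergekit's actual `della` implementation, which operates per-tensor over full model checkpoints with additional density and epsilon controls.

```python
import numpy as np

def della_merge_sketch(base, finetunes, drop_frac=0.5, eps=1e-8, seed=0):
    """Toy DELLA-style merge over flat parameter vectors (illustrative only).

    Each finetune contributes a delta (task vector) against the base.
    Deltas are dropped stochastically, with keep-probability scaled by
    magnitude (larger deltas survive more often), survivors are rescaled
    to preserve the expected delta, and the results are averaged.
    """
    rng = np.random.default_rng(seed)
    merged_delta = np.zeros_like(base, dtype=np.float64)
    for ft in finetunes:
        delta = ft - base                       # task vector vs. the base model
        mag = np.abs(delta)
        # Keep-probabilities proportional to relative magnitude.
        ranks = mag / (mag.sum() + eps)
        keep_p = np.clip((1.0 - drop_frac) * ranks * delta.size, 0.0, 1.0)
        mask = rng.random(delta.shape) < keep_p
        # Rescale survivors so the expected contribution is unbiased.
        merged_delta += np.where(mask, delta / np.maximum(keep_p, eps), 0.0)
    return base + merged_delta / len(finetunes)
```

With `drop_frac=0.0` and uniform-magnitude deltas, every delta is kept and the sketch reduces to a plain average of the finetunes, which is a useful sanity check.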
Important Considerations (v0)
- Early Terminations: This version (v0) is known to have early-termination issues, which can be mitigated but not entirely eliminated by using the ChatML chat template.
- Refusals: The model may exhibit refusals, though it can be uncensored using jailbreaks or ablations.
- Future Improvements: A version 1 is planned to address and patch the early termination bugs, potentially requiring the removal of some donor models.
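To apply the ChatML mitigation mentioned above, wrap each conversation turn in `<|im_start|>role ... <|im_end|>` markers and end the prompt with an open assistant turn. The helper below is a minimal hand-rolled sketch (the function name is my own); in practice most inference stacks apply the equivalent via the tokenizer's built-in chat template.

```python
def to_chatml(messages):
    """Format a list of {'role', 'content'} dicts as a ChatML prompt.

    Each turn becomes: <|im_start|>{role}\n{content}<|im_end|>
    A trailing open assistant turn cues the model to generate a reply.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a creative storyteller."},
    {"role": "user", "content": "Describe a kraken rising from the deep."},
])
```

Keeping the `<|im_end|>` token configured as a stop sequence is what gives the model a clean termination point, which is why this template helps with the early-termination behavior.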
Good for
- Creative Writing: Generating imaginative narratives, stories, and descriptive text.
- Roleplay Scenarios: Engaging in dynamic and detailed character-based interactions.
- Experimental Merging: Users interested in exploring large-scale model merges and their emergent properties.