Retreatcost/Evertide-RX-12B
Evertide-RX-12B by Retreatcost is a 12-billion-parameter generalist language model with reasoning capabilities and support for English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, and Japanese. It is trained with an FFT (full fine-tuning) method on a custom dataset and is built as a merge of a 'local attention' variant and a 'global attention' variant. The model targets general conversation, co-writing, brainstorming, and short roleplay, with an 8K context window and recommended inference settings for consistent output.
Evertide-RX-12B: A Generalist Multilingual Model
Evertide-RX-12B, developed by Retreatcost, is a 12-billion-parameter generalist model. It is built by merging two separately trained variants: 'Evertide-LA-12B' (Local Attention, optimized for short context) and 'Evertide-GA-12B' (Global Attention, better at multi-turn generalization). The merge uses a passthrough method in a 4:1 pattern and aims to combine the strengths of both variants, similar to techniques used in Gemma 4 models.
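To make the 4:1 pattern concrete, here is a minimal sketch of how such an interleave could be laid out per layer. Several things are assumptions for illustration and are not stated above: that the ratio means four Local Attention layers per Global Attention layer, that the Global Attention layer comes last in each group of five, and the layer count used in the example. The function names are hypothetical and do not belong to any merge tool.

```python
def interleave_pattern(num_layers: int, local_per_global: int = 4) -> list[str]:
    """Label each layer with its assumed source variant.

    'LA' = Evertide-LA-12B, 'GA' = Evertide-GA-12B.
    Assumption: every (local_per_global + 1)-th layer comes from GA.
    """
    period = local_per_global + 1
    return ["GA" if (i + 1) % period == 0 else "LA" for i in range(num_layers)]


def to_slices(pattern: list[str]) -> list[tuple[str, int, int]]:
    """Group consecutive layers from the same source into
    (source, start, end) tuples with half-open [start, end) ranges."""
    slices = []
    start = 0
    for i in range(1, len(pattern) + 1):
        if i == len(pattern) or pattern[i] != pattern[start]:
            slices.append((pattern[start], start, i))
            start = i
    return slices


# Illustrative layer count only; the real model's depth is not given here.
pattern = interleave_pattern(10)
print(pattern)
print(to_slices(pattern))
```

Each tuple describes a contiguous range of layers copied unchanged from one variant, which is the general idea behind a passthrough merge: layers are taken as-is rather than averaged.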
Key Capabilities & Features
- Multilingual Support: Proficient in English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, and Japanese.
- Reasoning: Includes built-in reasoning capabilities, which can be enhanced by prefilling responses with a "< think >\n" tag.
- Context Management: Designed for an 8K context window. It may tolerate longer contexts (e.g., 16K, 24K, 32K), though its behavior may change.
- Training: Trained on 451 manually crafted and refined samples, focusing on specific constraints to enforce causality between thinking blocks and answers.
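The context and prefill points above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's tooling: "8K" is taken to mean 8192 tokens, the 2048-token output limit comes from the inference recommendations on this page, the prefill string is copied as written above and should be checked against the model's actual chat template, and the whitespace token counter is only a stand-in for a real tokenizer. All function names are hypothetical.

```python
from typing import Callable

CONTEXT_WINDOW = 8192          # assumption: "8K" means 8192 tokens
MAX_OUTPUT_TOKENS = 2048       # recommended output limit from this page
THINK_PREFILL = "< think >\n"  # copied as written above; verify exact spelling


def prompt_budget(context_window: int = CONTEXT_WINDOW,
                  max_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Tokens left for the prompt once the output limit is reserved."""
    return context_window - max_output


def trim_history(turns: list[str],
                 count_tokens: Callable[[str], int],
                 budget: int) -> list[str]:
    """Keep the most recent turns that fit into `budget` tokens,
    dropping the oldest ones first."""
    kept = []
    used = 0
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))


def with_prefill(rendered_prompt: str, prefill: str = THINK_PREFILL) -> str:
    """Append the reasoning prefill to an already rendered prompt."""
    return rendered_prompt + prefill


def rough_count(text: str) -> int:
    """Stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


history = ["a b c", "d e", "f g h i"]
print(prompt_budget())
print(trim_history(history, rough_count, budget=6))
```

Reserving the output limit up front keeps prompt plus response inside the designed window, which matters more here because part of the output is spent on reasoning.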
Good For
- General conversations and chatting.
- Co-writing and brainstorming tasks.
- Short roleplaying scenarios.
Recommended inference settings are a temperature of 0.7, a repetition penalty of 1.05, and a maximum output of 2048 tokens. The output limit is set with the additional reasoning budget in mind.
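As a small illustration, the recommended values can be kept in one place and renamed to match whichever inference backend is in use. The field names and the `max_tokens` key below are assumptions chosen for the example; parameter names differ between backends, so check the one you are using.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class SamplingSettings:
    """Recommended values from this page; field names are illustrative."""
    temperature: float = 0.7
    repetition_penalty: float = 1.05
    max_output_tokens: int = 2048


def to_request_kwargs(settings: SamplingSettings,
                      key_map: Optional[dict] = None) -> dict:
    """Return the settings as a dict, renaming keys listed in `key_map`
    to the names a particular backend expects."""
    key_map = key_map or {}
    return {key_map.get(name, name): value
            for name, value in asdict(settings).items()}


# Hypothetical backend that calls the output limit "max_tokens".
kwargs = to_request_kwargs(SamplingSettings(), {"max_output_tokens": "max_tokens"})
print(kwargs)
```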