Model Overview
RefalMachine/RuadaptQwen3-8B-Hybrid is an 8 billion parameter model based on the Qwen/Qwen3-8B architecture, specifically adapted for the Russian language. This model integrates a hybrid reasoning mechanism, which is enabled by default, allowing for complex thought processes. Users can toggle this reasoning mode on or off by appending /think or /no_think tokens to messages, or programmatically via the enable_thinking parameter in the tokenizer's chat template.
Key Adaptations and Features
- Russian Language Optimization: The model underwent continued pre-training on a Russian-language corpus after its tokenizer was replaced.
- Enhanced Tokenizer: An extended tiktoken cl100k tokenizer, augmented with 48,000 Russian tokens, was implemented. This new tokenizer significantly improves the generation speed of Russian texts, achieving up to a 100% increase compared to the original Qwen3-8B model, depending on context length.
- Learned Embedding Propagation (LEP): This technique was applied during the adaptation process to further enhance the model's performance.
- Hybrid Reasoning: The model supports a flexible reasoning mode, which can be controlled by the user.
Recommended Usage
For stable performance, it is recommended to use low temperatures (0.0-0.3), a top_p value between 0.85 and 0.95, and a repetition_penalty of 1.05. These parameters can be adjusted based on specific task requirements, with repetition_penalty potentially lowered to 1.0 for RAG applications or increased if the model exhibits repetitive outputs.
Important Considerations
The model's responses reflect knowledge acquired during its training and do not represent the authors' opinions. It is based on a third-party pre-trained model, and the current authors are not responsible for its initial pre-training. Users should exercise caution and discretion when interpreting outputs.