RefalMachine/RuadaptQwen3-8B-Hybrid

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Aug 26, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

RefalMachine/RuadaptQwen3-8B-Hybrid is an 8 billion parameter Qwen3-8B model adapted for Russian language processing, featuring a hybrid reasoning mechanism and a 32K context length. Developed by RefalMachine, it incorporates a replaced tokenizer and continued pre-training on a Russian corpus, alongside the Learned Embedding Propagation (LEP) technique. This adaptation significantly boosts Russian text generation speed by up to 100% compared to the original Qwen3-8B, making it highly efficient for Russian-centric NLP tasks.

Loading preview...

Model Overview

RefalMachine/RuadaptQwen3-8B-Hybrid is an 8 billion parameter model based on the Qwen/Qwen3-8B architecture, specifically adapted for the Russian language. This model integrates a hybrid reasoning mechanism, which is enabled by default, allowing for complex thought processes. Users can toggle this reasoning mode on or off by appending /think or /no_think tokens to messages, or programmatically via the enable_thinking parameter in the tokenizer's chat template.

Key Adaptations and Features

  • Russian Language Optimization: The model underwent continued pre-training on a Russian-language corpus after its tokenizer was replaced.
  • Enhanced Tokenizer: An extended tiktoken cl100k tokenizer, augmented with 48,000 Russian tokens, was implemented. This new tokenizer significantly improves the generation speed of Russian texts, achieving up to a 100% increase compared to the original Qwen3-8B model, depending on context length.
  • Learned Embedding Propagation (LEP): This technique was applied during the adaptation process to further enhance the model's performance.
  • Hybrid Reasoning: The model supports a flexible reasoning mode, which can be controlled by the user.

Recommended Usage

For stable performance, it is recommended to use low temperatures (0.0-0.3), a top_p value between 0.85 and 0.95, and a repetition_penalty of 1.05. These parameters can be adjusted based on specific task requirements, with repetition_penalty potentially lowered to 1.0 for RAG applications or increased if the model exhibits repetitive outputs.

Important Considerations

The model's responses reflect knowledge acquired during its training and do not represent the authors' opinions. It is based on a third-party pre-trained model, and the current authors are not responsible for its initial pre-training. Users should exercise caution and discretion when interpreting outputs.