Overview
This model, named g4me/QwenRolina3-Base-LR1e5-wsd-b32g2gc8-order-domain-3ep-mix, is a 2 billion parameter language model. It is a fine-tuned variant of Qwen/Qwen3-1.7B-Base, developed by g4me. The model was trained with the TRL library (Transformer Reinforcement Learning) and supports a context length of 32,768 tokens.
Key Capabilities
- General Text Generation: Capable of generating coherent and contextually relevant text based on given prompts.
- Fine-tuned Performance: Benefits from specific fine-tuning to enhance its base Qwen3 capabilities.
- Large Context Window: Supports processing and generating text over a 32K token context, allowing for more extensive conversations or document analysis.
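Even a 32K window bounds how much text fits in one prompt, so long documents still need to be budgeted against it. A minimal sketch of that budgeting, using a naive whitespace split as a stand-in for the model's real tokenizer (in practice you would count tokens with the tokenizer shipped with the checkpoint); the `reserve` margin for the prompt and generated reply is an illustrative choice, not part of the model card:

```python
MAX_CONTEXT = 32768  # the model's context length in tokens

def chunk_for_context(text: str, max_tokens: int = MAX_CONTEXT, reserve: int = 1024):
    """Split text into chunks that each fit the context window,
    holding back `reserve` tokens for the prompt and the reply.
    Whitespace words approximate tokens for illustration only."""
    budget = max_tokens - reserve
    words = text.split()
    return [
        " ".join(words[start:start + budget])
        for start in range(0, len(words), budget)
    ]

doc = "token " * 70000          # a document far larger than the window
chunks = chunk_for_context(doc)
print(len(chunks))              # → 3
```

Each chunk can then be sent through the model separately, with any cross-chunk state (e.g. a running summary) carried in the prompt.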
Training Details
The model underwent supervised fine-tuning (SFT). The training run used TRL 0.29.0, Transformers 5.2.0, PyTorch 2.8.0a0, Datasets 4.6.0, and Tokenizers 0.22.2. Further details on the training run can be visualized via Weights & Biases.
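The run name hints at the hyperparameters (LR1e5 → learning rate 1e-5, wsd → a warmup-stable-decay schedule, 3ep → three epochs), though decoding b32g2gc8 is guesswork. A hedged sketch of what the TRL SFT configuration might have looked like; every value below is a hypothetical reconstruction from the run name, not documented by the card:

```python
from trl import SFTConfig

# Hypothetical reconstruction from the run name; none of these values
# are confirmed by the model card.
config = SFTConfig(
    output_dir="QwenRolina3-Base-LR1e5-wsd-b32g2gc8-order-domain-3ep-mix",
    learning_rate=1e-5,                        # "LR1e5"
    lr_scheduler_type="warmup_stable_decay",   # "wsd" (scheduler name is an assumption)
    num_train_epochs=3,                        # "3ep"
    per_device_train_batch_size=2,             # "g2"? unconfirmed guess
    gradient_accumulation_steps=8,             # "gc8"? unconfirmed guess
    max_length=32768,                          # match the model's context window
    report_to="wandb",                         # run was logged to Weights & Biases
)
```

This config would then be passed to TRL's `SFTTrainer` along with the base model and dataset; the effective batch size of 32 implied by "b32" would arise from the per-device batch size, gradient accumulation, and device count combined, under the guessed decoding above.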
Good For
- Developers looking for a Qwen3-based model with a large context window.
- Applications requiring general-purpose text generation.
- Experimentation with fine-tuned models built on established architectures.