g4me/QwenRolina3-Base-LR1e5-wsd-b32g2gc8-order-domain-3ep-mix

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 10, 2026 · Architecture: Transformer · Gated · Cold

g4me/QwenRolina3-Base-LR1e5-wsd-b32g2gc8-order-domain-3ep-mix is a language model developed by g4me, fine-tuned from Qwen/Qwen3-1.7B-Base (about 1.7 billion parameters, rounded to 2B in the listing above). It was trained using TRL, supports a 32,768-token context length, and is aimed at general text generation tasks, building on the foundational capabilities of the Qwen3 architecture.

Overview

This model is a fine-tuned variant of Qwen/Qwen3-1.7B-Base developed by g4me; at roughly 1.7 billion parameters, it is listed as 2B above. It was trained with the TRL library (Transformer Reinforcement Learning) and retains the base model's 32,768-token context window.

Key Capabilities

  • General Text Generation: Produces coherent, contextually relevant text from a given prompt.
  • Fine-tuned Performance: Supervised fine-tuning sharpens the base Qwen3 model's capabilities (see Training Details below).
  • Large Context Window: Handles processing and generation over a 32K-token context, enabling long conversations and whole-document analysis; a loading sketch follows this list.
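
A minimal usage sketch, assuming the standard Hugging Face Transformers causal-LM API. The repository id is taken from this page; the prompt and sampling settings are illustrative only, not recommendations from the model author.

```python
# Minimal generation sketch (standard Transformers causal-LM loading).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "g4me/QwenRolina3-Base-LR1e5-wsd-b32g2gc8-order-domain-3ep-mix"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,  # BF16 per the listing; older Transformers releases use torch_dtype
    device_map="auto",     # requires `accelerate`; spreads layers across available devices
)

prompt = "The main benefit of a 32K-token context window is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```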

Training Details

The model underwent supervised fine-tuning (SFT). The run used TRL 0.29.0, Transformers 5.2.0, PyTorch 2.8.0a0, Datasets 4.6.0, and Tokenizers 0.22.2. Metrics and curves from the training run can be inspected on Weights & Biases.
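
For orientation, here is a sketch of what a comparable SFT run with TRL's SFTTrainer could look like. The hyperparameters are guesses decoded from the model name (LR1e5 → learning rate 1e-5, wsd → warmup-stable-decay schedule, 3ep → 3 epochs); the b32g2gc8 portion is ambiguous, and the dataset is a stand-in, so none of this should be read as the author's exact configuration.

```python
# Hypothetical reconstruction of an SFT run with TRL; hyperparameters are
# decoded from the model name and the dataset is a placeholder, not the real one.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder corpus

config = SFTConfig(
    output_dir="QwenRolina3-sft",
    num_train_epochs=3,                            # "3ep" in the model name
    learning_rate=1e-5,                            # "LR1e5"
    lr_scheduler_type="warmup_stable_decay",       # "wsd"
    lr_scheduler_kwargs={"num_decay_steps": 200},  # WSD needs an explicit decay phase; value assumed
    per_device_train_batch_size=2,                 # assumed reading of "g2"
    gradient_accumulation_steps=8,                 # assumed reading of "gc8"
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-1.7B-Base",  # the stated base model
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```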

Good For

  • Developers looking for a Qwen3-based model with a large context window.
  • Applications requiring general-purpose text generation.
  • Experimentation with fine-tuned models built on established architectures.