g4me/QwenRolina3-Base-LR1e5-b32g2gc8-wsd-order-domain
g4me/QwenRolina3-Base-LR1e5-b32g2gc8-wsd-order-domain is a roughly 2-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. Developed by g4me, it was trained with supervised fine-tuning (SFT) using the TRL library and supports a context length of 32768 tokens. It is intended for general text-generation tasks, building on the Qwen3 base architecture for broad applicability.
Overview
This model, g4me/QwenRolina3-Base-LR1e5-b32g2gc8-wsd-order-domain, is a fine-tuned version of Qwen/Qwen3-1.7B-Base. It was trained with the TRL (Transformer Reinforcement Learning) library's supervised fine-tuning (SFT) workflow, which typically targets instruction following or domain-specific performance. With approximately 2 billion parameters and a 32768-token context window, it can handle long and complex text inputs. The suffixes in the model name appear to encode training hyperparameters (e.g. a learning rate of 1e-5 and a warmup-stable-decay schedule), though the exact recipe is not documented.
Key Capabilities
- Base Model: Leverages the robust capabilities of the Qwen3-1.7B-Base model.
- Fine-tuned Performance: SFT may improve performance over the base model on the targeted domains or instruction-following tasks.
- Extended Context: Supports a 32768-token context length, enabling processing of longer documents and conversations.
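Given the Qwen3 base, the checkpoint can presumably be loaded like any other causal LM in the Hugging Face transformers library. A minimal, untested sketch (the helper function, prompt handling, and `device_map` usage are illustrative, not taken from this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "g4me/QwenRolina3-Base-LR1e5-b32g2gc8-wsd-order-domain"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a plain-text completion with the fine-tuned checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" requires `accelerate`; drop it for CPU-only use.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Since the card does not specify a chat template, plain-text prompting (as with the base model) is the safer default.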
Good For
- General text generation tasks where the base Qwen3 model is suitable.
- Applications requiring processing of longer input sequences due to its extended context window.
- Further experimentation or fine-tuning on specific datasets, building upon its SFT training.