wAI-org/swerl-qwen3-8b-tmax-15k-grpo
Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: May 20, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights
The wAI-org/swerl-qwen3-8b-tmax-15k-grpo is an 8 billion parameter language model based on the Qwen3 architecture. This specific checkpoint, derived from the `hamishivi/swerl-tmax-15k` GRPO run, is intended for internal evaluation and continuation experiments. It serves as a foundational model for further research and development rather than direct end-user application.
Model Overview
The wAI-org/swerl-qwen3-8b-tmax-15k-grpo is an 8 billion parameter language model built on the Qwen3 architecture. This version is a specific checkpoint (step 500) from the hamishivi/swerl-tmax-15k GRPO (Group Relative Policy Optimization) run.
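GRPO (Group Relative Policy Optimization) drops the learned value function (critic) used by PPO. Instead, it samples a group of completions for the same prompt and scores each one against the rest of its group. The sketch below shows only that advantage step in its standard form: reward minus group mean, divided by group standard deviation. The reward function, group size, and exact normalization used in the hamishivi/swerl-tmax-15k run are not stated here, so the function name `group_relative_advantages` and the `eps` value are illustrative assumptions.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Sketch of the GRPO group-relative advantage (hypothetical helper).

    `rewards` holds one scalar reward per completion sampled for the SAME prompt.
    Each advantage is the reward's distance from the group mean, scaled by the
    group's standard deviation, so no critic network is required.
    """
    if not rewards:
        raise ValueError("need at least one reward in the group")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every completion got the same reward;
    # in that case all advantages are 0 and the group contributes no signal.
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four completions, two rewarded and two not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Here the rewarded completions get an advantage of roughly +1 and the others roughly -1, which pushes the policy toward the better half of the group. A group where all rewards are equal yields all-zero advantages.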
Key Characteristics
- Base Model: Built on hamishivi/sft_qwen3_8b_our_sft, which, as its name indicates, is a supervised fine-tuned (SFT) Qwen3 8B model.
- Training Origin: This checkpoint comes from a GRPO run, so it has been further optimized with reinforcement learning on top of that SFT base.
- Context Length: The model supports a context length of 32,768 tokens, allowing for processing of substantial input sequences.
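In a decoder-only transformer, the prompt and the generated tokens share the same context window. A request must therefore cap its output length so that prompt plus output stays within 32,768 tokens. The following is a minimal sketch that assumes you have already counted the prompt's tokens with the model's tokenizer. The helper `max_new_tokens_for` is hypothetical and is not part of any library.

```python
MAX_CONTEXT_TOKENS = 32_768  # context length stated for this checkpoint


def max_new_tokens_for(prompt_tokens: int, requested: int,
                       context: int = MAX_CONTEXT_TOKENS) -> int:
    """Clamp a generation budget so prompt + output fits in the context window.

    Hypothetical helper: `prompt_tokens` is the tokenized prompt length and
    `requested` is the desired number of new tokens.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt of {prompt_tokens} tokens leaves no room in a {context}-token window"
        )
    return min(requested, remaining)


# A 30,000-token prompt leaves only 2,768 tokens for output,
# so a request for 4,096 new tokens is clamped to 2,768.
budget = max_new_tokens_for(30_000, 4_096)
```

Raising an error on an over-long prompt, rather than silently returning 0, makes truncation decisions explicit to the caller.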
Intended Use
This model checkpoint is primarily designated for:
- Internal Evaluation: Assessing performance and characteristics within a research context.
- Continuation Experiments: Serving as a starting point for further training, fine-tuning, or experimental development.