wAI-org/swerl-qwen3-8b-tmax-15k-grpo

  • Task: Text generation
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quant: FP8
  • Ctx Length: 32k
  • Published: May 20, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Availability: Open Weights

wAI-org/swerl-qwen3-8b-tmax-15k-grpo is an 8-billion-parameter language model based on the Qwen3 architecture. This checkpoint, taken from the `hamishivi/swerl-tmax-15k` GRPO run, is intended for internal evaluation and continuation experiments. It is a starting point for further research and development, not a model for direct end-user applications.


Model Overview

wAI-org/swerl-qwen3-8b-tmax-15k-grpo is an 8-billion-parameter language model built on the Qwen3 architecture. This version is a specific checkpoint (step 500) from the hamishivi/swerl-tmax-15k GRPO (Group Relative Policy Optimization) run.
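
For readers unfamiliar with GRPO, the core idea as it is usually formulated is that each sampled completion is scored relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below illustrates that group-relative normalization only. The function name, the `eps` term, and the example rewards are assumptions for illustration; the reward definition and exact normalization used in this particular run are not stated here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper: illustrates the common GRPO-style advantage
    (reward minus group mean, divided by group standard deviation).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps (an assumption) avoids division by zero when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example group: four completions for one prompt, rewarded 1.0 on success
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat the group average receive a positive advantage and the rest a negative one; when every completion earns the same reward, all advantages are zero and the group contributes no learning signal.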

Key Characteristics

  • Base Model: Built on hamishivi/sft_qwen3_8b_our_sft, whose name indicates a supervised fine-tuned (SFT) Qwen3 8B model.
  • Training Origin: The checkpoint comes from a GRPO run, so it has undergone reinforcement-learning-based optimization on top of the base model.
  • Context Length: The model supports a context length of 32,768 tokens, allowing for processing of substantial input sequences.
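
Because the prompt and the generated tokens share the same 32,768-token window, it can help to compute how much room is left for generation before sending a request. The following is a minimal sketch; the helper name `max_new_tokens_for` is hypothetical, and it assumes the prompt length has already been measured with the model's own tokenizer.

```python
MAX_CONTEXT_TOKENS = 32_768  # context length stated for this checkpoint


def max_new_tokens_for(prompt_tokens, context_limit=MAX_CONTEXT_TOKENS):
    """Return how many tokens can still be generated after the prompt.

    Hypothetical helper: prompt_tokens is the tokenized prompt length.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # A prompt at or over the limit leaves no room for generation
    return max(context_limit - prompt_tokens, 0)


remaining = max_new_tokens_for(30_000)  # 32,768 - 30,000 = 2,768
```

A result of zero signals that the prompt must be truncated or split before generation.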

Intended Use

This model checkpoint is primarily designated for:

  • Internal Evaluation: Assessing performance and characteristics within a research context.
  • Continuation Experiments: Serving as a starting point for further training, fine-tuning, or experimental development.
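
For either use, a typical first step is loading the checkpoint. The sketch below assumes the weights are published under the repository ID shown in the title and are loadable through the Hugging Face `transformers` auto classes; neither is confirmed here, and the function name `load_checkpoint` is hypothetical.

```python
MODEL_ID = "wAI-org/swerl-qwen3-8b-tmax-15k-grpo"  # repository ID from the page title


def load_checkpoint(model_id=MODEL_ID):
    """Load tokenizer and model for evaluation or continued training.

    Assumption: the repository is compatible with the transformers
    auto classes. The import is deferred so that defining this function
    does not require transformers to be installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

Calling `load_checkpoint()` downloads roughly 8B parameters' worth of weights, so it is best run on a machine provisioned for a model of that size.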