wAI-org/swerl-qwen3-8b-termigen-grpo

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:May 18, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The wAI-org/swerl-qwen3-8b-termigen-grpo is an 8 billion parameter language model, a final checkpoint from a GRPO (Gradient-based Reinforcement Learning with Policy Optimization) run. Based on the hamishivi/sft_qwen3_8b_our_sft model, it is intended for internal evaluation and continuation experiments. This model represents a specific training iteration focused on agent task termigen, indicating its specialization in generating terms for agent-based tasks.

Loading preview...

Model Overview

The wAI-org/swerl-qwen3-8b-termigen-grpo is an 8 billion parameter language model, representing a final checkpoint from a Gradient-based Reinforcement Learning with Policy Optimization (GRPO) training run. It is built upon the hamishivi/sft_qwen3_8b_our_sft base model.

Key Characteristics

  • Base Model: Derived from hamishivi/sft_qwen3_8b_our_sft.
  • Training Objective: The model's training focused on "agent-task-termigen" within a GRPO framework, suggesting a specialization in generating terminology or actions relevant to agent-based tasks.
  • Development Stage: This checkpoint is explicitly designated for internal evaluation and further experimental continuation, indicating it is not a production-ready release but a developmental artifact.
  • Training Completion: The training for this specific checkpoint was completed on May 18, 2026.

Intended Use

This model is primarily intended for:

  • Internal Evaluation: Assessing the performance and capabilities of the GRPO training run.
  • Continuation Experiments: Serving as a starting point for further research and development in agent-task terminology generation or related areas.