wAI-org/swerl-qwen3-8b-endless-terminals-grpo

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:May 20, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The wAI-org/swerl-qwen3-8b-endless-terminals-grpo is an 8 billion parameter model, a checkpoint from a GRPO (Generative Reinforcement Learning with Policy Optimization) run, based on the hamishivi/sft_qwen3_8b_our_sft base model. This model is specifically developed for internal evaluation and continuation experiments, indicating its role in ongoing research and development. It is a specialized iteration focused on agent tasks within an 'endless terminals' environment, suggesting optimization for interactive or command-line based AI agents. Its primary purpose is for further experimental work rather than general-purpose application.

Loading preview...

Model Overview

The wAI-org/swerl-qwen3-8b-endless-terminals-grpo is an 8 billion parameter language model, representing a specific checkpoint (Step 500) from a Generative Reinforcement Learning with Policy Optimization (GRPO) training run. It is built upon the hamishivi/sft_qwen3_8b_our_sft base model.

Key Characteristics

  • Base Model: Derived from hamishivi/sft_qwen3_8b_our_sft.
  • Training Method: Result of a GRPO run, specifically hamishivi/agent-task-endless-terminals.
  • Development Stage: This is a training checkpoint, not a final release model.

Intended Use

  • Internal Evaluation: Primarily designed for internal assessment of its performance and capabilities.
  • Continuation Experiments: Suitable for further research and development, serving as a starting point for new experiments.

This model is a specialized artifact from an ongoing research project, focused on agent tasks within an 'endless terminals' context, and is not intended for broad, general-purpose applications.