Name: wAI-org/swerl-qwen3-8b-openthoughts-grpo API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: wAI-org

SWERL Qwen3 8B Openthoughts GRPO Overview

This model, developed by wAI-org, is the final checkpoint from a Group Relative Policy Optimization (GRPO) run targeting "agent-task openthoughts." It is an 8-billion-parameter model based on the Qwen3 architecture, trained starting from the hamishivi/sft_qwen3_8b_our_sft base model.

Key Characteristics

Base Model: Derived from hamishivi/sft_qwen3_8b_our_sft.
Training Method: Final checkpoint of a GRPO run, a reinforcement learning method that updates the policy by comparing rewards within groups of responses sampled for the same prompt.
Specific Focus: "Agent-task openthoughts" suggests an emphasis on generating internal reasoning steps or thought processes for AI agents.
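
For readers unfamiliar with GRPO, the group-relative step can be sketched in a few lines. This is a minimal illustration of the general technique, not the configuration used for this checkpoint, which the model card does not describe. The function name, the example rewards, the epsilon value, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled response relative to its own group.

    rewards: scalar rewards for several responses sampled from the
    same prompt. Each advantage is (reward - group mean) / group std.
    eps (an assumed value) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four responses to one prompt
# (for example, 1.0 if an agent task succeeded, 0.0 otherwise).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
uniform = group_relative_advantages([1.0, 1.0, 1.0])
```

Responses that beat their group's average receive positive advantages and are reinforced, while below-average responses receive negative ones. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.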

Intended Use

This checkpoint is primarily intended for:

Internal Evaluation: Assessing its performance and capabilities within the development team.
Continuation Experiments: Serving as a foundation for further research and fine-tuning, particularly in agentic AI applications.
