Name: clijo/qwen3-4b-instruct-2507-bf16-reco-grpo-b200-swift-white-atlas API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: clijo

Model Overview

This model, clijo/qwen3-4b-instruct-2507-bf16-reco-grpo-b200-swift-white-atlas, is a 4 billion parameter instruction-tuned language model. It is a fine-tuned version of the base model Qwen/Qwen3-4B-Instruct-2507 and was developed by clijo.

Key Capabilities & Training

Fine-tuned from Qwen3-4B-Instruct-2507: Leverages the robust architecture of the Qwen3-4B-Instruct series.
GRPO Training Method: The model was trained using the GRPO (Gradient Regularized Policy Optimization) method. This technique, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), suggests an optimization for tasks requiring advanced reasoning, particularly in mathematical contexts.
TRL Framework: Training was conducted using the TRL (Transformers Reinforcement Learning) framework, indicating a reinforcement learning approach to fine-tuning.

Potential Use Cases

Reasoning-intensive tasks: Due to its GRPO training, it may perform well in scenarios requiring logical deduction and problem-solving.
Instruction following: As an instruction-tuned model, it is designed to respond effectively to user prompts and instructions.
Mathematical problem-solving: The GRPO method's origin in enhancing mathematical reasoning suggests its suitability for tasks involving numerical or logical challenges.

Overview

Model Overview

Key Capabilities & Training

Potential Use Cases

Full Model Card (README)