Name: BillyWang1/qwen2.5-7b-base-retool-slime-sft API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: BillyWang1

Overview

The BillyWang1/qwen2.5-7b-base-retool-slime-sft model is a 7.6 billion parameter variant of the Qwen2.5 base architecture. It has been through a Supervised Fine-Tuning (SFT) cold-start phase whose purpose is to teach a multi-turn tool-use format. After this training the model generates Python code, reads the interpreter output, and concludes with a boxed answer, following the <code> / <interpreter> / \boxed{} structure.

Key Capabilities

Tool-Use Format: Interacts with tools through a structured multi-turn format learned during SFT.
Code Generation: Writes Python code within the tool-use context.
Output Interpretation: Reads sandbox output from code execution and uses it in later turns.
Structured Answering: Formats final answers within a \boxed{} structure.
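
As an illustration of the format described above, the following is a minimal sketch of how a caller might pull the Python code and the final answer out of a model turn. The helper names (extract_code, extract_boxed) are hypothetical and not part of the model or the retool-slime pipeline. The closing tag </code> and the optional markdown fence inside the tag are assumptions about the exact output, so check them against real generations.

```python
import re
from typing import Optional

# Assumption: code is wrapped as <code> ... </code>, possibly with a
# markdown fence (three backticks) inside the tag.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)
FENCE_RE = re.compile(r"^`{3}(?:python)?\s*\n(.*?)\n?`{3}\s*$", re.DOTALL)


def extract_code(turn: str) -> Optional[str]:
    """Return the last <code> block in a model turn, without any fence."""
    blocks = CODE_RE.findall(turn)
    if not blocks:
        return None
    body = blocks[-1].strip()
    fenced = FENCE_RE.match(body)
    return fenced.group(1) if fenced else body


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, allowing nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
    return None  # braces never balanced
```

The brace counter is used instead of a regular expression because answers such as \boxed{\frac{1}{2}} contain nested braces that a simple pattern would cut short.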

Training Details

This model was trained using the slime v0.3.0 framework on the JoeYing/ReTool-SFT dataset, which comprises 2,000 multi-turn {messages} trajectories. Training ran for 3 epochs with the Adam optimizer at a learning rate of 1e-5 and reached a final train loss of approximately 0.37–0.40.

Good For

Further RL Stages: Intended as the --hf-checkpoint or MODEL_PATH for subsequent GRPO / GFlowRL stages within the retool-slime pipeline.
Tool-Augmented Math Inference: Directly applicable to mathematical problem-solving that requires tool interaction in the ReTool <code>/<interpreter> format.
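
The tool-augmented inference use case can be sketched as a loop that alternates between generation and code execution. This is a rough illustration under stated assumptions, not the retool-slime implementation: generate is a hypothetical callable standing in for whatever inference client is used (it receives the transcript so far and returns the next completion, ideally stopping after </code>), the closing tags are assumed, and run_python uses a plain subprocess, which is not a real sandbox.

```python
import subprocess
import sys
from typing import Callable


def run_python(code: str, timeout: float = 10.0) -> str:
    """Run code in a child interpreter and return its combined output.

    NOTE: a subprocess is NOT a sandbox; use proper isolation for
    untrusted model output.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "TimeoutError: execution exceeded the time limit"
    return (proc.stdout + proc.stderr).strip()


def solve(prompt: str, generate: Callable[[str], str], max_turns: int = 8) -> str:
    """Alternate model turns and interpreter turns until no code is emitted."""
    transcript = prompt
    for _ in range(max_turns):
        completion = generate(transcript)  # hypothetical model call
        transcript += completion
        start = completion.rfind("<code>")
        end = completion.rfind("</code>")
        if start == -1 or end < start:
            # No code block: treat this as the final (boxed) answer turn.
            return transcript
        body = completion[start + len("<code>"):end]
        # Drop markdown fence lines if the model wrapped the code in them.
        lines = [ln for ln in body.splitlines() if not ln.strip().startswith("`" * 3)]
        output = run_python("\n".join(lines).strip())
        # Feed the result back in the (assumed) interpreter tag format.
        transcript += f"\n<interpreter>\n{output}\n</interpreter>\n"
    return transcript
```

Capping the loop with max_turns keeps a model that never stops emitting code from running indefinitely; the value 8 is arbitrary.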
