BillyWang1/qwen2.5-7b-base-retool-slime-sft

Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 25, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

The BillyWang1/qwen2.5-7b-base-retool-slime-sft model is a 7.6 billion parameter Qwen2.5 base model, fine-tuned for multi-turn tool use. It generates Python code, interprets sandbox output, and gives boxed answers, which makes it suitable for tool-augmented mathematical inference. The model is the initial SFT cold-start stage of the retool-slime pipeline and teaches the `<code>` / `<interpreter>` / `\boxed{}` format.

Overview

The BillyWang1/qwen2.5-7b-base-retool-slime-sft model is a 7.6 billion parameter variant of the Qwen2.5 base architecture. It has undergone a Supervised Fine-Tuning (SFT) cold-start phase to specifically learn a multi-turn tool-use format. This training enables the model to generate Python code, process interpreter outputs, and conclude with a boxed answer, following the <code> / <interpreter> / \boxed{} structure.

Key Capabilities

  • Tool-Use Format Learning: Has learned to interact with tools using a structured multi-turn format.
  • Code Generation: Capable of writing Python code within the tool-use context.
  • Output Interpretation: Designed to read and utilize sandbox output from code execution.
  • Structured Answering: Formats final answers within a \boxed{} structure.
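The format listed above can be illustrated with a small parser. This is a minimal sketch, not the pipeline's own code: it assumes that code and sandbox output are wrapped in paired `<code>...</code>` and `<interpreter>...</interpreter>` tags, that code may optionally sit inside a Markdown fence, and that the final answer is the last `\boxed{...}` in the text. The function names `extract_code_blocks` and `extract_boxed` are hypothetical.

```python
import re

# Assumption: tags are paired (<code>...</code>); the source names only the
# opening tags, so the closing tags are a guess made for illustration.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)
# Assumption: code may be wrapped in an optional Markdown fence inside the tag.
FENCE_RE = re.compile(r"^```(?:python)?\s*\n(.*?)\n?```\s*$", re.DOTALL)


def extract_code_blocks(text):
    """Return the Python snippets found inside <code>...</code> tags."""
    blocks = []
    for raw in CODE_RE.findall(text):
        body = raw.strip()
        fenced = FENCE_RE.match(body)
        blocks.append(fenced.group(1).strip() if fenced else body)
    return blocks


def extract_boxed(text):
    """Return the contents of the last \\boxed{...}, or None if absent.

    Braces are counted so nested LaTeX such as \\frac{1}{2} survives.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces


sample = (
    "Let me compute.\n"
    "<code>\n```python\nprint(2 + 3)\n```\n</code>\n"
    "<interpreter>\n5\n</interpreter>\n"
    "The answer is \\boxed{5}."
)
```

Calling `extract_code_blocks(sample)` yields the single snippet to execute, and `extract_boxed(sample)` yields the final answer string for comparison against a reference.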

Training Details

This model was trained using the slime v0.3.0 framework on the JoeYing/ReTool-SFT dataset, which comprises 2,000 multi-turn trajectories in messages format. Training ran for 3 epochs with the Adam optimizer at a learning rate of 1e-5 and reached a final training loss of approximately 0.37–0.40.

Good For

  • Further RL Stages: Intended as the --hf-checkpoint or MODEL_PATH for subsequent GRPO / GFlowRL stages within the retool-slime pipeline.
  • Tool-Augmented Math Inference: Directly applicable to mathematical problem-solving that requires tool interaction in the ReTool <code>/<interpreter> format.
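For the tool-augmented inference use case, the multi-turn interaction can be sketched as a loop that alternates model generation with code execution. Everything here is an assumption made for illustration: `generate` is a hypothetical callable standing in for the model, the tags are assumed to be paired, and `run_python` uses a plain child process rather than a real sandbox. It is not the retool-slime implementation.

```python
import re
import subprocess
import sys

# Assumption: the model closes each tool call with </code>.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def strip_fence(body):
    """Drop an optional Markdown fence around the code."""
    lines = body.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]
    return "\n".join(lines)


def run_python(code, timeout=10):
    """Run code in a child process. NOT a real sandbox; illustration only."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "TimeoutError: execution exceeded the time limit"
    return (proc.stdout + proc.stderr).strip()


def tool_loop(generate, prompt, max_turns=4):
    """Alternate generation and execution until no tool call is produced."""
    transcript = prompt
    for _ in range(max_turns):
        chunk = generate(transcript)
        transcript += chunk
        calls = CODE_RE.findall(chunk)
        if not calls:
            break  # no tool call, so treat the chunk as the final answer
        output = run_python(strip_fence(calls[-1]))
        transcript += f"\n<interpreter>\n{output}\n</interpreter>\n"
    return transcript


def fake_generate(transcript):
    """Hypothetical stand-in for the model: one tool call, then an answer."""
    if "<interpreter>" not in transcript:
        return "<code>\n```python\nprint(6 * 7)\n```\n</code>"
    return "The answer is \\boxed{42}."


if __name__ == "__main__":
    print(tool_loop(fake_generate, "What is 6 * 7?\n"))
```

In practice `fake_generate` would be replaced by a call into whatever inference stack serves the checkpoint, with generation stopped at the closing code tag so that the interpreter output can be appended before the next turn.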