BillyWang1/qwen2.5-7b-base-retool-slime-sft
The BillyWang1/qwen2.5-7b-base-retool-slime-sft model is a 7.6 billion parameter Qwen2.5 base model, fine-tuned for multi-turn tool-use capabilities. It specializes in generating Python code, interpreting sandbox output, and providing boxed answers, making it suitable for tool-augmented mathematical inference. This model serves as the initial SFT cold-start stage for the retool-slime pipeline, teaching the specific `` / `` / `\boxed{}` format.
Loading preview...
Overview
The BillyWang1/qwen2.5-7b-base-retool-slime-sft model is a 7.6 billion parameter variant of the Qwen2.5 base architecture. It has undergone a Supervised Fine-Tuning (SFT) cold-start phase to specifically learn a multi-turn tool-use format. This training enables the model to generate Python code, process interpreter outputs, and conclude with a boxed answer, following the <code> / <interpreter> / \boxed{} structure.
Key Capabilities
- Tool-Use Format Learning: Teaches the model to interact with tools using a structured multi-turn format.
- Code Generation: Capable of writing Python code within the tool-use context.
- Output Interpretation: Designed to read and utilize sandbox output from code execution.
- Structured Answering: Formats final answers within a
\boxed{}structure.
Training Details
This model was trained using the slime v0.3.0 framework on the JoeYing/ReTool-SFT dataset, which comprises 2,000 multi-turn {messages} trajectories. The training involved 3 epochs with an Adam optimizer and a learning rate of 1e-5, achieving a final train loss of approximately 0.37–0.40.
Good For
- Further RL Stages: Intended as the
--hf-checkpointorMODEL_PATHfor subsequent GRPO / GFlowRL stages within theretool-slimepipeline. - Tool-Augmented Math Inference: Directly applicable for mathematical problem-solving that requires tool interaction using the specified ReTool
<code>/<interpreter>format.