BillyWang1/qwen2.5-7b-base-retool-slime-sft-v2

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 7.6B
  • Quant: FP8
  • Context Length: 32k
  • Tool Calling: Supported
  • Published: Jun 26, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights

BillyWang1/qwen2.5-7b-base-retool-slime-sft-v2 is a 7.6-billion-parameter language model: a ReTool supervised fine-tuning (SFT) cold-start checkpoint of Qwen/Qwen2.5-7B (base), trained with the slime framework. It is fine-tuned to understand and generate text in the ReTool interleaved code/tool-call format, and serves as an intermediate checkpoint and starting point for the subsequent reinforcement learning (RL) stages of the ReTool pipeline.


Overview

This model, BillyWang1/qwen2.5-7b-base-retool-slime-sft-v2, is an intermediate checkpoint derived from the Qwen/Qwen2.5-7B base model. It was trained with supervised fine-tuning (SFT) in the slime framework on data in the ReTool interleaved code/tool-call format. This SFT stage teaches the base model the structured interaction patterns required for tool use before any reinforcement learning (RL) is applied.
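To make the interleaved format more concrete, here is a minimal Python sketch of the rollout-side step it implies: find the last code segment the model emitted, run it, and append the interpreter output so generation can continue. The tag names `<code>` and `<interpreter>`, the optional Markdown fence around the code, and the helper names `extract_last_code` and `append_interpreter_output` are hypothetical choices for illustration, not taken from the ReTool-SFT data or the slime source; check the dataset for the exact markup. The `eval`-based runner in the example is a stand-in, not a sandbox.

```python
import re
from typing import Callable, Optional

# ASSUMPTION: code is wrapped in <code>...</code> and tool output is returned
# inside <interpreter>...</interpreter>; verify against the actual ReTool-SFT data.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)
FENCE = "`" * 3  # a Markdown code fence, built here to avoid a literal nested fence
FENCED_RE = re.compile("^" + FENCE + r"(?:python)?\n(.*?)\n?" + FENCE + "$", re.DOTALL)


def extract_last_code(text: str) -> Optional[str]:
    """Return the body of the last <code>...</code> segment, or None if there is none."""
    blocks = CODE_RE.findall(text)
    if not blocks:
        return None
    body = blocks[-1].strip()
    # Drop an optional Markdown fence around the code.
    match = FENCED_RE.match(body)
    return match.group(1) if match else body


def append_interpreter_output(text: str, run: Callable[[str], str]) -> str:
    """If the text ends on a code segment, run it and append the tool output."""
    code = extract_last_code(text)
    if code is None or not text.rstrip().endswith("</code>"):
        return text  # nothing to execute; the text is returned unchanged
    return text + "\n<interpreter>\n" + run(code) + "\n</interpreter>\n"


if __name__ == "__main__":
    turn = "Let me compute it.\n<code>\n" + FENCE + "python\n2 ** 10\n" + FENCE + "\n</code>"
    # eval() stands in for a real sandboxed executor here.
    print(append_interpreter_output(turn, lambda code: str(eval(code))))
```

In an RL rollout loop, generation would typically pause whenever the model closes a code segment, the interpreter output would be appended, and the model would continue from there.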

Key Training Details

  • Base Model: Qwen/Qwen2.5-7B (base, not instruct-tuned).
  • Dataset: the ReTool-SFT dataset, in a chat-style messages format.
  • Epochs: 6, for approximately 371 optimizer steps in total.
  • Precision: bf16 training, with gradients all-reduced in fp32.
  • Hardware: Training was conducted on 8x A100-40GB GPUs.
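The precision setting above (bf16 training, fp32 gradient all-reduce) matters because summing per-rank gradients in bf16 can silently drop small contributions. The Python sketch below emulates bf16 rounding with `struct` and compares accumulating eight ranks' gradients in bf16 versus fp32. It is a numerical illustration under simple assumptions (sequential summation, hand-picked hypothetical gradient values), not how slime or the underlying communication library actually performs the reduction.

```python
import struct
from typing import Callable, Iterable


def to_fp32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (round-half-to-even on the float32 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bf16 keeps the top 16 bits of a float32; add a bias so truncation rounds.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def allreduce_sum(grads: Iterable[float], round_fn: Callable[[float], float]) -> float:
    """Sum per-rank gradients, rounding each partial sum to the accumulation dtype."""
    total = 0.0
    for g in grads:
        total = round_fn(total + g)
    return total


if __name__ == "__main__":
    # Hypothetical per-rank gradients for one parameter on 8 GPUs:
    # one large value and seven small ones, all exactly representable in bf16.
    grads = [1.0] + [2.0 ** -9] * 7
    print(allreduce_sum(grads, to_bf16))  # prints 1.0 (small terms lost)
    print(allreduce_sum(grads, to_fp32))  # prints 1.013671875
```

Each small gradient (2^-9) is below half a bf16 unit in the last place at 1.0 (2^-8), so a bf16 accumulator never moves off 1.0, while an fp32 accumulator keeps all seven contributions.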

Intended Use

This model is not intended for direct end-user inference in its current form. Instead, it is a cold-start checkpoint for the ReTool pipeline: a well-initialized model that has learned the basic ReTool tool-calling syntax, making it a suitable starting point for subsequent RL stages such as GRPO or GFlow-RL.
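GRPO, one of the follow-up stages named above, replaces a learned value baseline with a group-relative one: for each prompt it samples a group of rollouts and standardizes each rollout's reward against the group's mean and standard deviation. A minimal Python sketch of that advantage computation follows. The binary correctness reward, the `eps` term, and the use of population standard deviation are assumptions for illustration; the RL code used with this checkpoint may normalize differently, and GFlow-RL is not covered here.

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: (r_i - mean(r)) / (std(r) + eps) within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, some code uses sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for 4 rollouts of one ReTool prompt: 1.0 if the final
    # answer is correct, else 0.0.
    print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
    # A group where every rollout gets the same reward yields zero advantage.
    print(grpo_advantages([1.0, 1.0, 1.0, 1.0]))
```

A group in which every rollout earns the same reward gets zero advantage everywhere, so it contributes no policy-gradient signal.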