BillyWang1/qwen2.5-7b-base-retool-slime-sft-v2
BillyWang1/qwen2.5-7b-base-retool-slime-sft-v2 is a 7.6 billion parameter language model, a ReTool SFT cold-start of Qwen/Qwen2.5-7B (base) trained with the slime framework. It is specifically fine-tuned to understand and generate content in the ReTool interleaved code/tool-call format. This model serves as an intermediate checkpoint, designed as a starting point for subsequent Reinforcement Learning (RL) stages within the ReTool pipeline.
Overview
This checkpoint is derived from the Qwen/Qwen2.5-7B base model through Supervised Fine-Tuning (SFT) with the slime framework, using data in the ReTool interleaved code/tool-call format. The SFT stage teaches the base model the structured interaction patterns that tool use requires, so that the later Reinforcement Learning (RL) stage starts from a model that already knows the tool-calling syntax.
Key Training Details
- Base Model: Qwen/Qwen2.5-7B (base, not instruct-tuned).
- Dataset: Trained on the ReTool-SFT dataset, which is formatted for messages.
- Epochs: Trained for 6 epochs, resulting in approximately 371 optimizer steps over the dataset.
- Precision: Trained in bf16, with gradients all-reduced in fp32.
- Hardware: Training was conducted on 8x A100-40GB GPUs.
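The card does not say why gradients are all-reduced in fp32, but a common motivation is that bf16 keeps only 8 significant bits, so small contributions can vanish when summed across ranks. The sketch below illustrates this with the standard library only. It emulates bf16 and fp32 rounding in software and sums sequentially, which is a simplification of a real all-reduce; the gradient values and the helper names (to_bf16, to_fp32, reduce_sum) are hypothetical and not taken from the slime framework.

```python
import struct

def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def to_bf16(x: float) -> float:
    """Round to bf16 with round-to-nearest-even (NaN/inf not handled)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # bf16 is the top 16 bits of fp32; add a bias so truncation rounds to nearest even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def reduce_sum(values, cast):
    """Sequentially sum values, rounding the accumulator with `cast` after each add."""
    acc = 0.0
    for v in values:
        acc = cast(acc + cast(v))
    return acc

# Hypothetical per-rank gradients for one parameter on 8 GPUs:
# one large contribution and seven small ones.
per_rank_grads = [1.0] + [0.001] * 7

bf16_total = reduce_sum(per_rank_grads, to_bf16)
fp32_total = reduce_sum(per_rank_grads, to_fp32)

# bf16_total stays at exactly 1.0: each 0.001 is below half the bf16
# spacing near 1.0 (2**-7), so it is rounded away on every add.
print("bf16 accumulation:", bf16_total)
print("fp32 accumulation:", fp32_total)
```

In this toy case the bf16 sum drops all seven small contributions, while the fp32 sum keeps them.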
Intended Use
This model is not designed for direct end-user inference. It is a cold-start checkpoint for the ReTool pipeline: a well-initialized model that has learned the basic ReTool tool-calling syntax and is meant to be trained further in subsequent stages such as GRPO or GFlow-RL.
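As a starting point for a later training stage, the checkpoint could be loaded roughly as follows. This is a minimal sketch that assumes the repository is published in Hugging Face Transformers format and that the torch and transformers packages are installed; the card confirms neither. The function name load_cold_start is hypothetical, and the ReTool pipeline itself may load the weights differently through slime.

```python
MODEL_ID = "BillyWang1/qwen2.5-7b-base-retool-slime-sft-v2"

def load_cold_start(model_id: str = MODEL_ID):
    """Load tokenizer and weights as an initialization for further training.

    Assumes a Transformers-compatible repository (not confirmed by the card).
    Imports are local so that defining this function does not require the
    heavy dependencies to be installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # bf16 matches the training precision stated in the card.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model
```

Calling load_cold_start() downloads the full 7.6 billion parameter checkpoint, so it is best run on the machine that will host the next training stage.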