Name: BillyWang1/qwen2.5-7b-base-retool-slime-sft-v2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: BillyWang1

Overview

BillyWang1/qwen2.5-7b-base-retool-slime-sft-v2 is an intermediate checkpoint derived from the Qwen/Qwen2.5-7B base model. It was trained with Supervised Fine-Tuning (SFT) in the slime framework, targeting the ReTool interleaved code/tool-call format. The SFT stage teaches the base model the structured interaction patterns that tool use requires, ahead of further Reinforcement Learning (RL).
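To make "interleaved code/tool-call format" concrete, here is a minimal sketch of splitting one response into alternating reasoning and code segments. The `<code>` / `</code>` delimiters and the `split_interleaved` helper are hypothetical placeholders chosen for illustration; the listing does not state which tags the checkpoint actually emits, so check the full model card before relying on them.

```python
import re
from typing import List, Tuple

# Hypothetical delimiters, NOT confirmed by the model card.
CODE_OPEN, CODE_CLOSE = "<code>", "</code>"

_SEGMENT_RE = re.compile(
    re.escape(CODE_OPEN) + r"(.*?)" + re.escape(CODE_CLOSE),
    re.DOTALL,
)


def split_interleaved(response: str) -> List[Tuple[str, str]]:
    """Split a response into ("text", ...) and ("code", ...) segments, in order."""
    segments: List[Tuple[str, str]] = []
    cursor = 0
    for match in _SEGMENT_RE.finditer(response):
        # Free-form reasoning that precedes this code block, if any.
        text = response[cursor:match.start()].strip()
        if text:
            segments.append(("text", text))
        segments.append(("code", match.group(1).strip()))
        cursor = match.end()
    # Whatever follows the last code block.
    tail = response[cursor:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


example = (
    "First compute the sum. <code>print(2 + 3)</code> "
    "The tool output gives the answer."
)
segments = split_interleaved(example)
```

In a tool-use loop, each "code" segment would be sent to an interpreter and its output appended to the context before generation resumes.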

Key Training Details

Base Model: Qwen/Qwen2.5-7B (base, not instruct-tuned).
Dataset: Trained on the ReTool-SFT dataset, which is provided in messages format.
Epochs: Trained for 6 epochs, approximately 371 optimizer steps in total.
Precision: Training uses bf16 precision, with gradients all-reduced in fp32.
Hardware: Training was conducted on 8x A100-40GB GPUs.
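The precision setting above is easier to appreciate with a toy example. The sketch below emulates bf16 rounding in pure Python and sums one gradient value per GPU across 8 simulated ranks, once with a bf16 accumulator and once with an fp32 accumulator. It is a sequential illustration only, not the slime implementation; the helper names and the gradient values are made up for the demonstration, and real all-reduce collectives use ring or tree schedules.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (round-to-nearest-even).

    bf16 keeps the top 16 bits of the fp32 bit pattern.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def reduce_sum(values, round_fn):
    """Sum values one by one, rounding the accumulator after every add."""
    acc = 0.0
    for v in values:
        acc = round_fn(acc + v)
    return acc


# Illustrative per-rank gradients for one parameter on 8 GPUs:
# one large contribution and seven small ones (all exactly representable in bf16).
grads = [1.0] + [2.0 ** -9] * 7

bf16_sum = reduce_sum(grads, to_bf16)  # 1.0: every small term is rounded away
fp32_sum = reduce_sum(grads, to_fp32)  # 1.013671875: matches the exact sum
```

Near 1.0 the spacing between adjacent bf16 values is 2^-7, so each 2^-9 contribution falls below half a step and vanishes when the accumulator is kept in bf16. Accumulating in fp32 preserves all seven contributions.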

Intended Use

This model is not designed for direct end-user inference in its current form. Instead, it functions as a cold-start checkpoint for the ReTool pipeline. Its primary purpose is to provide a well-initialized model that has learned the fundamental ReTool tool-calling syntax, making it suitable for subsequent advanced training stages like GRPO or GFlow-RL.
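As a rough sketch of the cold-start use, the checkpoint could be loaded as the initial policy for a later RL stage. This assumes the weights are published on the Hugging Face Hub under the model ID shown and are in a format the `transformers` library can read, neither of which this listing confirms; `load_cold_start_policy` is a hypothetical helper name, and `torch` and `transformers` must be installed.

```python
MODEL_ID = "BillyWang1/qwen2.5-7b-base-retool-slime-sft-v2"


def load_cold_start_policy(model_id: str = MODEL_ID):
    """Load tokenizer and weights to use as the starting policy for RL.

    Imports are deferred so that defining this function does not require
    torch or transformers; calling it downloads the weights.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the bf16 training precision
    )
    return tokenizer, model
```

An RL framework would normally take the model path in its own configuration rather than through a helper like this; the function only shows which artifact serves as the starting point.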
