Name: fenyo/Qwen2.5-7B-base2instruct API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: fenyo

Model Overview

fenyo/Qwen2.5-7B-base2instruct is a 7.6-billion-parameter instruction-tuned model built on the Qwen2.5-7B base. It was developed to reproduce the base-to-instruct transformation with a three-stage SFT → DPO → RLVR pipeline on a single H100 GPU, with the aim of deriving reusable recipes for instruction tuning.

Key Capabilities & Performance

This model scores 75.0 on IFEval, ahead of the official Qwen2.5-7B-Instruct (71.9), and 70.2 on MMLU (vs. 68.8 for the official instruct model). The gain in instruction following comes with lower mathematical reasoning performance: 79.7 on GSM8K vs. 84.7 for the official instruct model.
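The gaps behind these comparisons can be tabulated directly. The snippet below is a small illustrative sketch that uses only the scores quoted above; the names `SCORES` and `score_deltas` are introduced here for illustration and are not part of the model card.

```python
# Scores quoted in the text: (this model, official Qwen2.5-7B-Instruct).
SCORES = {
    "IFEval": (75.0, 71.9),
    "MMLU": (70.2, 68.8),
    "GSM8K": (79.7, 84.7),
}


def score_deltas(scores):
    """Return this model's score minus the official score, per benchmark."""
    return {
        name: round(ours - official, 1)
        for name, (ours, official) in scores.items()
    }


if __name__ == "__main__":
    for name, delta in score_deltas(SCORES).items():
        # A positive delta means this model is ahead of the official instruct model.
        print(f"{name}: {delta:+.1f}")
```

The model is ahead on the two benchmarks tied to instruction following and general knowledge, and behind on GSM8K by a larger margin than either gain.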

Training Methodology Highlights

The training process involved three key stages:

Supervised Fine-Tuning (SFT): Used 300k examples from allenai/tulu-3-sft-mixture to teach the ChatML format and assistant behavior, raising IFEval from 27.4 to 51.2.
Direct Preference Optimization (DPO): A targeted DPO stage using allenai/tulu-3-pref-personas-instruction-following, a preference dataset focused on instruction adherence, raised IFEval from 51.2 to 68.9, highlighting the importance of task-specific data.
Reinforcement Learning with Verifiable Rewards (RLVR): Used the GRPO algorithm with verifiable rewards for maths (GSM8K) and custom instruction constraints. Graduated rewards on multi-constraint prompts amplified instruction following, raising IFEval further from 68.9 to 75.0.
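
The RLVR stage can be illustrated with a minimal sketch. It assumes one plausible reading of "graduated rewards": a response to a multi-constraint prompt earns partial credit equal to the fraction of constraints it satisfies, rather than an all-or-nothing score. The constraint checkers (`max_words`, `contains_keyword`, `ends_with`) are hypothetical examples, not the author's actual reward functions. The advantage step follows the general GRPO idea of normalizing each reward against the other samples drawn for the same prompt; the use of the population standard deviation and the `eps` value are assumptions made here.

```python
import statistics
from typing import Callable, List

Constraint = Callable[[str], bool]


# Hypothetical verifiable constraints; the real training set's checks are not
# described in the source.
def max_words(limit: int) -> Constraint:
    return lambda text: len(text.split()) <= limit


def contains_keyword(word: str) -> Constraint:
    return lambda text: word.lower() in text.lower()


def ends_with(suffix: str) -> Constraint:
    return lambda text: text.rstrip().endswith(suffix)


def graduated_reward(response: str, constraints: List[Constraint]) -> float:
    """Partial credit: the fraction of constraints the response satisfies."""
    if not constraints:
        return 0.0
    passed = sum(1 for check in constraints if check(response))
    return passed / len(constraints)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    constraints = [max_words(5), contains_keyword("cat"), ends_with(".")]
    samples = ["The cat sat.", "The cat sat", "A dog ran around the whole yard"]
    rewards = [graduated_reward(s, constraints) for s in samples]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Under this reading, a response that meets two of three constraints is still ranked above one that meets none, so the policy gets a learning signal even when no sample in the group satisfies every constraint.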

Use Cases

This model is particularly well-suited for applications requiring precise instruction following and general knowledge tasks. Its specialized training makes it a strong candidate for scenarios where adherence to specific directives is paramount, such as automated content generation with strict guidelines or complex query resolution.
