aolans/Qwen2.5-7B-Instruct-SDFT-fp16

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quant: FP8 · Context length: 32k · Published: Feb 27, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

aolans/Qwen2.5-7B-Instruct-SDFT-fp16 is a 7.6 billion parameter instruction-tuned model based on Qwen/Qwen2.5-7B-Instruct, fine-tuned to improve multi-turn agent task performance. It is optimized for complex tasks such as household automation (ALFWorld) and database operations (DBBench), learning to act on environment observations, select appropriate actions, and use tools. The model incorporates two experimental training techniques, SDFT and Epiplexity, aimed at improving reasoning, and is provided in fp16 format for direct loading.


Model Overview

aolans/Qwen2.5-7B-Instruct-SDFT-fp16 is a 7.6 billion parameter instruction-tuned model derived from Qwen/Qwen2.5-7B-Instruct. It has been fine-tuned using LoRA (merged into the base model) and is provided in fp16 precision, allowing for direct loading without separate adapter management.
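Because the adapter is already folded into the published weights, no PEFT loading step is needed at inference time. For reference, a merged fp16 checkpoint of this kind is typically produced roughly as in the sketch below; the adapter path and output directory are hypothetical, not the author's actual paths:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model in fp16, attach the trained LoRA adapter, then fold the
# adapter weights into the base weights so the result loads as a plain model.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("Qwen2.5-7B-Instruct-SDFT-fp16")
```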

Key Capabilities & Training Focus

This model is specifically trained to improve multi-turn agent task performance, with a strong emphasis on:

  • ALFWorld (household tasks): Enhancing the model's ability to navigate and complete tasks in simulated household environments.
  • DBBench (database operations): Improving proficiency in executing and managing database-related operations.

The training objective applies loss to all assistant turns in a multi-turn trajectory, enabling the model to learn from environment observations, select appropriate actions, utilize tools effectively, and recover from errors.
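A minimal sketch of that objective is shown below, assuming Qwen2.5's ChatML-style turn markers. The actual data pipeline is not published, so the example trajectory and masking code are illustrative only:

```python
from transformers import AutoTokenizer

IGNORE_INDEX = -100  # tokens with this label are excluded from the loss
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# A toy ALFWorld-style trajectory: environment observations as user turns,
# agent actions as assistant turns.
trajectory = [
    ("user", "Observation: You are in the kitchen. Your task is to cool an apple."),
    ("assistant", "Action: open fridge 1"),
    ("user", "Observation: The fridge 1 is open. You see an apple 1."),
    ("assistant", "Action: take apple 1 from fridge 1"),
]

input_ids, labels = [], []
for role, content in trajectory:
    turn = f"<|im_start|>{role}\n{content}<|im_end|>\n"
    ids = tokenizer(turn, add_special_tokens=False)["input_ids"]
    input_ids += ids
    # Loss is applied to every assistant turn; observation turns are masked out.
    labels += ids if role == "assistant" else [IGNORE_INDEX] * len(ids)
```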

Experimental Features

This version incorporates two experimental training techniques: SDFT (Self-Distillation Enables Continual Learning) and Epiplexity (Rethinking Information for Computationally Bounded Intelligence). While these methods are still under evaluation and refinement, they aim to further enhance the model's reasoning capabilities. Training used a maximum sequence length of 4096 tokens over 2 epochs with a learning rate of 2e-6.
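Expressed as Hugging Face TrainingArguments, those hyperparameters would look roughly like the sketch below; the training framework actually used is not documented, and every value not stated on the card (output path, batch size, gradient accumulation) is an assumption:

```python
from transformers import TrainingArguments

# Hyperparameters from the model card: 2 epochs, learning rate 2e-6.
# The remaining values are assumptions for illustration only.
args = TrainingArguments(
    output_dir="qwen2.5-7b-instruct-sdft",  # hypothetical output path
    num_train_epochs=2,
    learning_rate=2e-6,
    per_device_train_batch_size=1,          # assumption
    gradient_accumulation_steps=8,          # assumption
)
# The 4096-token maximum sequence length would be enforced in the data
# pipeline (e.g. by truncating tokenized trajectories), not via TrainingArguments.
```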

Usage

The model can be loaded directly using AutoModelForCausalLM from the Hugging Face transformers library, leveraging torch.float16 for efficient inference.
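A minimal loading and generation sketch using that API is shown below; the prompt and generation settings are illustrative, not part of the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aolans/Qwen2.5-7B-Instruct-SDFT-fp16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "You are in a kitchen. Find and cool an apple."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```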