Model Overview
This model, aolans/Qwen2.5-7B-Instruct-SDFT-2ep-fp16, is a fine-tuned version of the Qwen/Qwen2.5-7B-Instruct base model, developed by aolans. It has been trained using LoRA and Unsloth, with the adapter weights merged into the base model, and is provided in fp16 precision for direct loading.
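Because the LoRA adapter is already merged, the checkpoint can be loaded like any standard `transformers` causal-LM model, with no PEFT step. A minimal loading sketch, assuming `transformers` and `torch` are installed:

```python
# Hedged sketch: load the merged fp16 checkpoint with plain transformers.
# No adapter/PEFT loading is needed because the LoRA weights are merged.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "aolans/Qwen2.5-7B-Instruct-SDFT-2ep-fp16"

def load_model():
    """Load tokenizer and model; torch_dtype=float16 matches the shipped weights."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,
        device_map="auto",  # place layers on available GPU(s)
    )
    return tokenizer, model

# Usage: tokenizer, model = load_model()
```

`device_map="auto"` is a convenience choice here, not a requirement of the model; any standard placement strategy works.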
Key Capabilities & Training Focus
The primary objective of this model's training was to improve performance on multi-turn agent tasks. In particular, it shows gains in environments such as ALFWorld (household tasks) and DBBench (database operations). The training methodology applied the loss to every assistant turn within each multi-turn trajectory, so the model learns environment observation, action selection, tool use, and error recovery.
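The "loss on all assistant turns" scheme can be illustrated with a toy label-masking function: assistant-turn tokens keep their ids as labels, while every other token gets the ignore index (-100), which cross-entropy skips. The token-level role tags below are hypothetical; real pipelines derive them from the chat template's special tokens.

```python
# Illustrative sketch of multi-turn loss masking, not the exact training code.
IGNORE_INDEX = -100  # PyTorch cross-entropy ignores positions with this label

def build_labels(input_ids, roles):
    """Keep labels only for assistant-turn tokens; mask everything else."""
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(input_ids, roles)
    ]

# Toy trajectory: user turn -> assistant turn -> user turn -> assistant turn.
ids = [11, 12, 13, 21, 22, 31, 41, 42]
roles = ["user", "user", "user",
         "assistant", "assistant",
         "user",
         "assistant", "assistant"]
print(build_labels(ids, roles))  # [-100, -100, -100, 21, 22, -100, 41, 42]
```

Masking every non-assistant token this way means each assistant turn in the trajectory contributes to the gradient, not just the final response.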
Experimental Features
This iteration incorporates experimental training techniques: SDFT (Self-Distillation Enables Continual Learning) and Epiplexity (Rethinking Information for Computationally Bounded Intelligence). While these methods are still under evaluation and refinement, they aim to improve the model's reasoning capabilities. The model was trained for 2 epochs with a maximum sequence length of 4096 tokens.
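The stated hyperparameters can be summarized in a small config sketch. Only the epoch count, sequence length, precision, and base model come from this card; the LoRA rank, alpha, and learning rate are illustrative placeholders, not the values actually used.

```python
# Hedged configuration sketch; fields marked "assumption" are placeholders.
training_config = {
    "base_model": "Qwen/Qwen2.5-7B-Instruct",  # stated base model
    "num_train_epochs": 2,                     # stated: 2 epochs
    "max_seq_length": 4096,                    # stated: 4096-token max sequence length
    "precision": "fp16",                       # stated: merged weights ship in fp16
    "lora_r": 16,                              # assumption: typical LoRA rank
    "lora_alpha": 32,                          # assumption
    "learning_rate": 2e-4,                     # assumption
}
print(training_config["num_train_epochs"], training_config["max_seq_length"])
```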
Good For
- Agentic workflows: Particularly suited for tasks requiring sequential decision-making and interaction with environments.
- Multi-turn task execution: designed to track state and respond coherently across multiple conversational turns or steps.
- Research into experimental training methods: Provides a practical application of SDFT and Epiplexity for further study.