Model Overview
This model, `qwen2.5-7b-agent-trajectory-mixed_dbv4_alfv4_1to1`, is a 7.6-billion-parameter fine-tune of Qwen/Qwen2.5-7B-Instruct, specialized for agentic tasks. It combines sequential trajectory planning with structured reasoning, making it suitable for complex agent environments.
Key Capabilities
- AgentBench Optimization: Trained specifically on ALFWorld and DBBench data for agent trajectory planning and database querying.
- Deterministic Action Generation: Optimized for generating consistent and predictable actions.
- Reduced Invalid Actions: Aims to minimize the occurrence of incorrect or invalid actions within its target domains.
- Supervised Fine-Tuning (SFT): Trained with LoRA-based SFT; the adapter weights were merged into the base model, so no separate adapter is needed at inference time.
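The LoRA merge mentioned above can be sketched numerically: the trained low-rank update `B @ A`, scaled by `alpha / r`, is added once to the frozen base weight, after which the adapter-path and merged-path outputs are identical. The shapes and scaling below are toy values for illustration, not this model's actual training configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 2, 16  # toy sizes; real LoRA ranks are often 8-64

W = rng.normal(size=(d_out, d_in))              # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01           # LoRA down-projection (trained)
B = rng.normal(size=(d_out, r)) * 0.01          # LoRA up-projection (trained)

scaling = alpha / r
W_merged = W + scaling * (B @ A)  # "merge": fold the adapter into the base weight

x = rng.normal(size=(d_in,))
y_adapter = W @ x + scaling * (B @ (A @ x))  # base + adapter at inference time
y_merged = W_merged @ x                      # single merged matmul
assert np.allclose(y_adapter, y_merged)      # both paths agree
```

Merging trades a small one-time weight update for simpler, adapter-free serving, which is why merged checkpoints like this one load as ordinary base-architecture models.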
Training Details
The model was trained on a 1:1 mixture of the `u-10bei/sft_alfworld_trajectory_dataset_v5` and `u-10bei/dbbench_sft_dataset_react_v4` datasets. The loss was applied exclusively to assistant outputs (non-assistant tokens were masked out), and no external datasets were incorporated.
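Assistant-only loss can be illustrated with a label-masking sketch: tokens from system and user turns receive the ignore index -100 (the convention recognized by PyTorch's `CrossEntropyLoss`), so only assistant tokens contribute to the loss. The turn structure and token IDs below are hypothetical stand-ins, not the model's actual chat template.

```python
IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss skips targets equal to ignore_index

def build_labels(turns):
    """turns: list of (role, token_ids) pairs. Returns (input_ids, labels)
    where only assistant tokens are supervised."""
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)                        # supervised
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # masked out of the loss
    return input_ids, labels

# Toy trajectory: system prompt, user observation, assistant action.
turns = [
    ("system", [1, 2]),
    ("user", [3, 4, 5]),
    ("assistant", [6, 7]),
]
input_ids, labels = build_labels(turns)
print(labels)  # [-100, -100, -100, -100, -100, 6, 7]
```

In multi-turn agent trajectories this masking is applied per turn, so every assistant action in the rollout is supervised while environment observations are not.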
Intended Use Cases
- AgentBench Evaluation: Ideal for evaluating agent performance on ALFWorld and DBBench tasks.
- Trajectory Learning Research: Useful for research into how language models handle sequential decision-making.
- Educational Experiments: Suitable for academic and experimental purposes related to agentic AI.
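Harnesses such as AgentBench drive a model like this through an observation-to-action loop. The sketch below shows that loop with a stubbed policy standing in for the model; a real evaluation would instead call the model's generation API on the chat-formatted history. The toy environment and action strings are purely illustrative.

```python
def stub_policy(observation, history):
    """Stand-in for the fine-tuned model; a real harness would run
    model generation on the formatted (observation, action) history."""
    if "apple" in observation:
        return "take apple"
    return "look"

class ToyEnv:
    """Two-step toy task: looking reveals an apple; taking it ends the episode."""
    def reset(self):
        return "You are in a kitchen."
    def step(self, action):
        if action == "look":
            return "You see an apple.", False
        if action == "take apple":
            return "Task complete.", True
        return "Nothing happens.", False

def run_episode(env, policy, max_steps=5):
    """Generic observation -> action loop, as used by agent benchmarks."""
    observation, history = env.reset(), []
    for _ in range(max_steps):
        action = policy(observation, history)
        history.append((observation, action))
        observation, done = env.step(action)
        if done:
            break
    return history

trajectory = run_episode(ToyEnv(), stub_policy)
print([action for _, action in trajectory])  # ['look', 'take apple']
```

The recorded (observation, action) pairs form exactly the kind of trajectory this model was fine-tuned to produce, which is why invalid or unparseable actions directly hurt benchmark scores.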
Limitations
- Performance may decline when used outside the specific AgentBench domains it was trained on.
- Long-horizon planning is constrained by the model's 32K context length.
- The model may still produce invalid actions if faced with significant distribution shifts from its training data.