HamadaMayu/qwen2.5-7b-agent-trajectory-mixed_dbv4_alfv4_1to1
HamadaMayu/qwen2.5-7b-agent-trajectory-mixed_dbv4_alfv4_1to1 is a 7.6-billion-parameter model based on Qwen2.5-7B-Instruct, fine-tuned for AgentBench tasks: sequential trajectory planning (ALFWorld) and structured reasoning and database querying (DBBench). It is tuned for deterministic action generation and a reduced rate of invalid actions within these agentic domains, and is intended for research and evaluation in trajectory learning and agent behavior.
Model Overview
This model, qwen2.5-7b-agent-trajectory-mixed_dbv4_alfv4_1to1, is a 7.6-billion-parameter variant of Qwen/Qwen2.5-7B-Instruct fine-tuned for agentic tasks. It combines sequential trajectory planning with structured reasoning over databases, making it suitable for complex agent environments.
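Qwen2.5-Instruct models use a ChatML-style chat template; in practice `tokenizer.apply_chat_template` renders it for you, but a minimal hand-rolled sketch makes the format concrete. The system and user texts below are placeholders, not prompts from this model's training data:

```python
# Minimal sketch of the ChatML-style prompt format used by Qwen2.5-Instruct.
# Normally tokenizer.apply_chat_template produces this; shown explicitly
# here for illustration only.
def build_prompt(messages):
    """Render chat messages as ChatML and append a generation prompt."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # model continues from here
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are an agent in ALFWorld."},  # placeholder
    {"role": "user", "content": "Task: put a clean mug on the desk."},  # placeholder
]
prompt = build_prompt(messages)
```

The trailing `<|im_start|>assistant\n` is the generation prompt: the model's next action is decoded from that point.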
Key Capabilities
- AgentBench Optimization: Specifically trained on ALFWorld and DBBench datasets for agent trajectory planning and database querying.
- Deterministic Action Generation: Optimized for generating consistent and predictable actions.
- Reduced Invalid Actions: Aims to minimize invalid or malformed actions within its target domains.
- Supervised Fine-Tuning (SFT): Utilizes LoRA-based SFT, with weights merged into the base model.
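Merging LoRA weights into the base model, as the last capability above describes, means folding the low-rank update back into each adapted weight matrix so no adapter is needed at inference. A toy numeric sketch of the standard LoRA merge, W' = W + (alpha/r)·B·A (shapes and values are made up; real merges run per layer over the model's tensors, e.g. via PEFT's `merge_and_unload`):

```python
# Toy illustration of merging a LoRA update into a base weight matrix:
# W_merged = W + (alpha / r) * (B @ A). Values are illustrative only.
def matmul(B, A):
    rows, inner, cols = len(B), len(A), len(A[0])
    return [[sum(B[i][k] * A[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def merge_lora(W, A, B, alpha, r):
    scale = alpha / r          # standard LoRA scaling factor
    BA = matmul(B, A)          # low-rank update, same shape as W
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight (2x2)
A = [[0.5, 0.5]]               # down-projection, rank r=1 (1x2)
B = [[2.0], [0.0]]             # up-projection (2x1)
W_merged = merge_lora(W, A, B, alpha=1, r=1)
# W_merged == [[2.0, 1.0], [0.0, 1.0]]
```

After the merge the adapter matrices A and B can be discarded, which is why the published checkpoint loads like an ordinary Qwen2.5-7B-Instruct model.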
Training Details
The model was trained using a 1:1 mix of the u-10bei/sft_alfworld_trajectory_dataset_v5 and u-10bei/dbbench_sft_dataset_react_v4 datasets. Loss was applied exclusively to assistant outputs, and no external datasets were incorporated.
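The two training choices above, a 1:1 dataset mix and loss applied only to assistant outputs, can be sketched as follows. The `-100` ignore index follows the Hugging Face convention for masking tokens out of the cross-entropy loss; the token IDs and helper names are illustrative, not this model's actual training code:

```python
from itertools import zip_longest

# 1:1 mixing: interleave examples from the two SFT datasets so each
# batch sees roughly equal amounts of ALFWorld and DBBench data.
def mix_1to1(ds_a, ds_b):
    return [ex for pair in zip_longest(ds_a, ds_b)
            for ex in pair if ex is not None]

# Assistant-only loss: copy labels from the input IDs, but replace every
# token outside an assistant turn with -100, the index that cross-entropy
# ignores in common SFT trainers.
IGNORE = -100
def mask_labels(input_ids, assistant_mask):
    return [tok if is_asst else IGNORE
            for tok, is_asst in zip(input_ids, assistant_mask)]

alf = [{"src": "alfworld", "id": i} for i in range(2)]
db  = [{"src": "dbbench",  "id": i} for i in range(2)]
mixed = mix_1to1(alf, db)   # alternates alfworld / dbbench examples

labels = mask_labels([11, 12, 13, 14], [False, False, True, True])
# labels == [-100, -100, 13, 14]: prompt tokens contribute no loss
```

Masking prompt and environment tokens this way keeps the gradient signal focused on the actions the agent should emit, rather than on reproducing observations.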
Intended Use Cases
- AgentBench Evaluation: Ideal for evaluating agent performance on ALFWorld and DBBench tasks.
- Trajectory Learning Research: Useful for research into how language models handle sequential decision-making.
- Educational Experiments: Suitable for academic and experimental purposes related to agentic AI.
Limitations
- Performance may decline when used outside the specific AgentBench domains it was trained on.
- Long-horizon planning is constrained by the model's 32K context length.
- The model may still produce invalid actions if faced with significant distribution shifts from its training data.