HamadaMayu/qwen2.5-7b-agent-trajectory-mixed_dbv4_alfv4_1to1
HamadaMayu/qwen2.5-7b-agent-trajectory-mixed_dbv4_alfv4_1to1 is a 7.6-billion-parameter model based on Qwen2.5-7B-Instruct, fine-tuned for AgentBench tasks: sequential trajectory planning (ALFWorld) and structured reasoning and database querying (DBBench). It is tuned for deterministic action generation and a reduced rate of invalid actions within these agentic domains, and is intended for research and evaluation in trajectory learning and agent behavior.
Model Overview
This model, qwen2.5-7b-agent-trajectory-mixed_dbv4_alfv4_1to1, is a 7.6-billion-parameter variant of Qwen/Qwen2.5-7B-Instruct fine-tuned for agentic tasks. It combines sequential trajectory planning with structured reasoning over databases, making it suitable for complex agent environments.
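Qwen2.5-Instruct models use a ChatML-style chat template; in practice `tokenizer.apply_chat_template` renders it for you, but a minimal hand-rolled sketch makes the format concrete. The system and user texts below are placeholders, not prompts from this model's training data:

```python
# Minimal sketch of the ChatML-style prompt format used by Qwen2.5-Instruct.
# Normally tokenizer.apply_chat_template produces this; shown explicitly
# here for illustration only.
def build_prompt(messages):
    """Render chat messages as ChatML and append a generation prompt."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # model continues from here
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are an agent in ALFWorld."},  # placeholder
    {"role": "user", "content": "Task: put a clean mug on the desk."},  # placeholder
]
prompt = build_prompt(messages)
```

The trailing `<|im_start|>assistant\n` is the generation prompt: the model's next action is decoded from that point.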
Key Capabilities
- AgentBench Optimization: Specifically trained on ALFWorld and DBBench datasets for agent trajectory planning and database querying.
- Deterministic Action Generation: Optimized for generating consistent and predictable actions.
- Reduced Invalid Actions: Aims to minimize invalid or malformed actions within its target domains.
- Supervised Fine-Tuning (SFT): Utilizes LoRA-based SFT, with weights merged into the base model.
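Merging LoRA weights into the base model, as the last capability above describes, means folding the low-rank update back into each adapted weight matrix so no adapter is needed at inference. A toy numeric sketch of the standard LoRA merge, W' = W + (alpha/r)·B·A (shapes and values are made up; real merges run per layer over the model's tensors, e.g. via PEFT's `merge_and_unload`):

```python
# Toy illustration of merging a LoRA update into a base weight matrix:
# W_merged = W + (alpha / r) * (B @ A). Values are illustrative only.
def matmul(B, A):
    rows, inner, cols = len(B), len(A), len(A[0])
    return [[sum(B[i][k] * A[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def merge_lora(W, A, B, alpha, r):
    scale = alpha / r          # standard LoRA scaling factor
    BA = matmul(B, A)          # low-rank update, same shape as W
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight (2x2)
A = [[0.5, 0.5]]               # down-projection, rank r=1 (1x2)
B = [[2.0], [0.0]]             # up-projection (2x1)
W_merged = merge_lora(W, A, B, alpha=1, r=1)
# W_merged == [[2.0, 1.0], [0.0, 1.0]]
```

After the merge the adapter matrices A and B can be discarded, which is why the published checkpoint loads like an ordinary Qwen2.5-7B-Instruct model.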
Training Details
The model was trained using a 1:1 mix of the u-10bei/sft_alfworld_trajectory_dataset_v5 and u-10bei/dbbench_sft_dataset_react_v4 datasets. Loss was applied exclusively to assistant outputs, and no external datasets were incorporated.
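The two training choices above, a 1:1 dataset mix and loss applied only to assistant outputs, can be sketched as follows. The `-100` ignore index follows the Hugging Face convention for masking tokens out of the cross-entropy loss; the token IDs and helper names are illustrative, not this model's actual training code:

```python
from itertools import zip_longest

# 1:1 mixing: interleave examples from the two SFT datasets so each
# batch sees roughly equal amounts of ALFWorld and DBBench data.
def mix_1to1(ds_a, ds_b):
    return [ex for pair in zip_longest(ds_a, ds_b)
            for ex in pair if ex is not None]

# Assistant-only loss: copy labels from the input IDs, but replace every
# token outside an assistant turn with -100, the index that cross-entropy
# ignores in common SFT trainers.
IGNORE = -100
def mask_labels(input_ids, assistant_mask):
    return [tok if is_asst else IGNORE
            for tok, is_asst in zip(input_ids, assistant_mask)]

alf = [{"src": "alfworld", "id": i} for i in range(2)]
db  = [{"src": "dbbench",  "id": i} for i in range(2)]
mixed = mix_1to1(alf, db)   # alternates alfworld / dbbench examples

labels = mask_labels([11, 12, 13, 14], [False, False, True, True])
# labels == [-100, -100, 13, 14]: prompt tokens contribute no loss
```

Masking prompt and environment tokens this way keeps the gradient signal focused on the actions the agent should emit, rather than on reproducing observations.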
Intended Use Cases
- AgentBench Evaluation: Ideal for evaluating agent performance on ALFWorld and DBBench tasks.
- Trajectory Learning Research: Useful for research into how language models handle sequential decision-making.
- Educational Experiments: Suitable for academic and experimental purposes related to agentic AI.
Limitations
- Performance may decline when used outside the specific AgentBench domains it was trained on.
- Long-horizon planning is constrained by the model's 32K context length.
- The model may still produce invalid actions if faced with significant distribution shifts from its training data.