Overview
Yano/exp-0223-027-realobs-llmagent-qwen2.5-7b is a 7.6-billion-parameter model fine-tuned from Qwen/Qwen2.5-7B-Instruct using QLoRA (4-bit, via Unsloth). Its primary purpose is to improve agentic behavior in the ALFWorld environment by combining real environment observations with LLM-generated strategic agent responses.
Key Design Principles
- Real Environment Observations: Utilizes byte-exact preservation of real data from experiment 016 for environment responses.
- LLM-Renarrated Agent Responses: Agent responses are generated by the LLM, focusing on strategic THOUGHT processes and an ACTION-dominant format rather than templated replies.
- Structured Response Format: Employs a specific format where the first turn includes THOUGHT+ACTION, subsequent turns are ACTION-only, and a recovery THOUGHT is introduced after failures.
- Failure Pattern Integration: Naturally incorporates failure patterns observed in 72% of trajectories from the source experiment 016, allowing for more robust agent behavior.
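The turn structure described above can be sketched as a small formatting helper. This is an illustrative assumption, not the experiment's actual code; the function name, tags, and flags are hypothetical.

```python
# Illustrative sketch of the response scheme: first turn = THOUGHT+ACTION,
# later turns = ACTION only, and a recovery THOUGHT after a failed action.
from typing import Optional

def format_agent_turn(action: str, thought: Optional[str] = None,
                      turn_index: int = 0, prev_failed: bool = False) -> str:
    """Render one agent turn in the THOUGHT/ACTION format."""
    # THOUGHT is emitted only on the first turn or when recovering from a failure.
    include_thought = (turn_index == 0 or prev_failed) and thought is not None
    if include_thought:
        return f"THOUGHT: {thought}\nACTION: {action}"
    return f"ACTION: {action}"

turns = [
    format_agent_turn("go to drawer 1",
                      thought="The keychain is likely in a drawer.", turn_index=0),
    format_agent_turn("open drawer 1", turn_index=1),
    format_agent_turn("go to drawer 2",
                      thought="Drawer 1 was empty; try the next one.",
                      turn_index=2, prev_failed=True),
]
```

Here the middle turn is ACTION-only, while the third turn reintroduces a recovery THOUGHT because the previous action failed.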
Training Configuration
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Method: QLoRA (4-bit), merged to 16-bit
- Max Sequence Length: 2048 tokens
- Epochs: 3
- Learning Rate: 2e-05
- LoRA Parameters: r=64, alpha=128
- Collator: AllAssistantTurnsCollator, ensuring all agent turns are supervised during training.
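The distinguishing behavior of AllAssistantTurnsCollator is that loss is computed on every assistant turn rather than only the final one. A minimal sketch of that masking logic, assuming the usual convention of labeling ignored positions with -100 (the actual collator implementation is not published here):

```python
# Toy version of "supervise all assistant turns": tokens from assistant
# segments keep their ids as labels; all other tokens are masked to -100,
# which PyTorch's cross-entropy loss ignores. Real collators operate on
# tokenized chat templates; this sketch takes (role, token_ids) pairs.

IGNORE_INDEX = -100  # standard ignore label for cross-entropy loss

def build_labels(segments):
    """segments: list of (role, token_ids) tuples in dialogue order.
    Returns (input_ids, labels) for causal-LM training."""
    input_ids, labels = [], []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # supervised: loss applies here
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # masked out
    return input_ids, labels
```

With a multi-turn trajectory, both assistant segments contribute to the loss, unlike last-turn-only collation, which is why failure-and-recovery patterns in intermediate turns can be learned.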
Good For
- Developing and testing LLM agents in simulated environments like ALFWorld.
- Research into integrating real-world observations with strategic LLM planning.
- Creating agents that can articulate their thought processes and recover from failures.