Overview
Yano/exp-0223-027-realobs-llmagent-qwen2.5-7b is a 7.6-billion-parameter model fine-tuned from Qwen/Qwen2.5-7B-Instruct using QLoRA (4-bit, via Unsloth). Its primary purpose is to improve agentic behavior in the ALFWorld environment by combining real environment observations with LLM-generated strategic agent responses.
Key Design Principles
- Real Environment Observations: Utilizes byte-exact preservation of real data from experiment 016 for environment responses.
- LLM-Renarrated Agent Responses: Agent responses are generated by the LLM, focusing on strategic THOUGHT processes and an ACTION-dominant format rather than templated replies.
- Structured Response Format: Employs a specific format where the first turn includes THOUGHT+ACTION, subsequent turns are ACTION-only, and a recovery THOUGHT is introduced after failures.
- Failure Pattern Integration: Naturally incorporates failure patterns observed in 72% of trajectories from the source experiment 016, allowing for more robust agent behavior.
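The turn structure described above can be sketched as a small formatting helper. This is an illustrative assumption, not the experiment's actual code; the function name, tags, and flags are hypothetical.

```python
# Illustrative sketch of the response scheme: first turn = THOUGHT+ACTION,
# later turns = ACTION only, and a recovery THOUGHT after a failed action.
from typing import Optional

def format_agent_turn(action: str, thought: Optional[str] = None,
                      turn_index: int = 0, prev_failed: bool = False) -> str:
    """Render one agent turn in the THOUGHT/ACTION format."""
    # THOUGHT is emitted only on the first turn or when recovering from a failure.
    include_thought = (turn_index == 0 or prev_failed) and thought is not None
    if include_thought:
        return f"THOUGHT: {thought}\nACTION: {action}"
    return f"ACTION: {action}"

turns = [
    format_agent_turn("go to drawer 1",
                      thought="The keychain is likely in a drawer.", turn_index=0),
    format_agent_turn("open drawer 1", turn_index=1),
    format_agent_turn("go to drawer 2",
                      thought="Drawer 1 was empty; try the next one.",
                      turn_index=2, prev_failed=True),
]
```

Here the middle turn is ACTION-only, while the third turn reintroduces a recovery THOUGHT because the previous action failed.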
Training Configuration
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Method: QLoRA (4-bit), merged to 16-bit
- Max Sequence Length: 2048 tokens
- Epochs: 3
- Learning Rate: 2e-05
- LoRA Parameters: r=64, alpha=128
- Collator: AllAssistantTurnsCollator, ensuring all agent turns are supervised during training.
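The distinguishing behavior of AllAssistantTurnsCollator is that loss is computed on every assistant turn rather than only the final one. A minimal sketch of that masking logic, assuming the usual convention of labeling ignored positions with -100 (the actual collator implementation is not published here):

```python
# Toy version of "supervise all assistant turns": tokens from assistant
# segments keep their ids as labels; all other tokens are masked to -100,
# which PyTorch's cross-entropy loss ignores. Real collators operate on
# tokenized chat templates; this sketch takes (role, token_ids) pairs.

IGNORE_INDEX = -100  # standard ignore label for cross-entropy loss

def build_labels(segments):
    """segments: list of (role, token_ids) tuples in dialogue order.
    Returns (input_ids, labels) for causal-LM training."""
    input_ids, labels = [], []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # supervised: loss applies here
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # masked out
    return input_ids, labels
```

With a multi-turn trajectory, both assistant segments contribute to the loss, unlike last-turn-only collation, which is why failure-and-recovery patterns in intermediate turns can be learned.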
Good For
- Developing and testing LLM agents in simulated environments like ALFWorld.
- Research into integrating real-world observations with strategic LLM planning.
- Creating agents that can articulate their thought processes and recover from failures.