daixuancheng/Qwen3-4B-Instruct-2507-LLM-in-Sandbox-RL
The daixuancheng/Qwen3-4B-Instruct-2507-LLM-in-Sandbox-RL model is a Qwen3-4B-Instruct variant fine-tuned by Daixuan Cheng using Reinforcement Learning (RL) within the LLM-in-Sandbox framework. This model is specifically designed to enhance general agentic intelligence in large language models by training them in computer environments. It focuses on improving an LLM's ability to act as an agent, making it suitable for tasks requiring autonomous decision-making and interaction within simulated or real-world computational settings.
Loading preview...
Model Overview
This model, daixuancheng/Qwen3-4B-Instruct-2507-LLM-in-Sandbox-RL, is a specialized checkpoint derived from the Qwen/Qwen3-4B-Instruct-2507 base model. It has been fine-tuned by Daixuan Cheng using a Reinforcement Learning (RL) approach within the "LLM-in-Sandbox" framework, as detailed in the paper "Computer Environments Elicit General Agentic Intelligence in LLMs".
Key Capabilities
- Enhanced Agentic Intelligence: Specifically trained to improve an LLM's ability to function as an autonomous agent within computer environments.
- Reinforcement Learning Fine-tuning: Utilizes RL techniques to optimize performance in interactive, sandbox-like settings.
- Based on Qwen3-4B-Instruct: Leverages the foundational capabilities of the Qwen3-4B-Instruct architecture.
Use Cases
- Agent-based Systems: Ideal for developing and experimenting with LLM agents that need to interact with and navigate computational environments.
- Research in Agentic AI: Useful for researchers exploring general agentic intelligence and RL-based fine-tuning for LLMs.
- Reproducing Paper Results: Can be used to reproduce the findings and performance described in the associated research paper.
Technical Details
The training data for this model is available as the llm-in-sandbox-rl dataset, and the training code can be found at llm-in-sandbox-rl code. Inference can be performed using vllm with specific configurations for tool choice and caching.