Name: daixuancheng/Qwen3-4B-Instruct-2507-LLM-in-Sandbox-RL API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: daixuancheng

Model Overview

This model, daixuancheng/Qwen3-4B-Instruct-2507-LLM-in-Sandbox-RL, is a specialized checkpoint derived from the Qwen/Qwen3-4B-Instruct-2507 base model. It has been fine-tuned by Daixuan Cheng using a Reinforcement Learning (RL) approach within the "LLM-in-Sandbox" framework, as detailed in the paper "Computer Environments Elicit General Agentic Intelligence in LLMs".

Key Capabilities

Enhanced Agentic Intelligence: Specifically trained to improve an LLM's ability to function as an autonomous agent within computer environments.
Reinforcement Learning Fine-tuning: Utilizes RL techniques to optimize performance in interactive, sandbox-like settings.
Based on Qwen3-4B-Instruct: Leverages the foundational capabilities of the Qwen3-4B-Instruct architecture.

Use Cases

Agent-based Systems: Ideal for developing and experimenting with LLM agents that need to interact with and navigate computational environments.
Research in Agentic AI: Useful for researchers exploring general agentic intelligence and RL-based fine-tuning for LLMs.
Reproducing Paper Results: Can be used to reproduce the findings and performance described in the associated research paper.

Technical Details

The training data for this model is available as the llm-in-sandbox-rl dataset, and the training code can be found at llm-in-sandbox-rl code. Inference can be performed using vllm with specific configurations for tool choice and caching.

Overview

Model Overview

Key Capabilities

Use Cases

Technical Details

Full Model Card (README)