daixuancheng/Qwen3-4B-Instruct-2507-LLM-in-Sandbox-RL
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 22, 2026Architecture:Transformer0.0K Warm

The daixuancheng/Qwen3-4B-Instruct-2507-LLM-in-Sandbox-RL is a 4 billion parameter instruction-tuned language model, based on Qwen/Qwen3-4B-Instruct-2507, specifically fine-tuned using Reinforcement Learning (RL) within the LLM-in-Sandbox framework. This model is designed to elicit general agentic intelligence, making it particularly adept at tasks requiring autonomous decision-making and interaction within simulated environments. With a 32768 token context length, it is optimized for complex agentic applications and research into AI behavior.

Loading preview...

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p