daixuancheng/Qwen3-4B-Instruct-2507-LLM-in-Sandbox-RL
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 22, 2026Architecture:Transformer0.0K Warm
The daixuancheng/Qwen3-4B-Instruct-2507-LLM-in-Sandbox-RL is a 4 billion parameter instruction-tuned language model, based on Qwen/Qwen3-4B-Instruct-2507, specifically fine-tuned using Reinforcement Learning (RL) within the LLM-in-Sandbox framework. This model is designed to elicit general agentic intelligence, making it particularly adept at tasks requiring autonomous decision-making and interaction within simulated environments. With a 32768 token context length, it is optimized for complex agentic applications and research into AI behavior.
Loading preview...
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.
temperature
–
top_p
–
top_k
–
frequency_penalty
–
presence_penalty
–
repetition_penalty
–
min_p
–