laion/rl_r2egym-full_terminus-structured
laion/rl_r2egym-full_terminus-structured is an 8 billion parameter Qwen3-based language model developed by laion and trained with Reinforcement Learning (RL) to emit structured tool calls. It targets complex software engineering tasks and reports a 42% pass@3 on SWEBench-100. The model automates development workflows through bash, view, edit, create, and search tools within a 32k token context window.
Overview
laion/rl_r2egym-full_terminus-structured is an 8 billion parameter model based on Qwen3, developed by laion. It was trained with Reinforcement Learning (RL) on the full r2egym dataset, with a focus on structured tool calls. The model continues training from laion/r2egym-nl2bash-stack-bugsseq-fixthink-again and extends that checkpoint's capabilities in automated software development.
Key Capabilities
- Structured Tool Calls: Integrates and utilizes structured tools for `bash`, `view`, `edit`, `create`, and `search`, allowing for direct interaction with development environments.
- Software Engineering Tasks: Achieves a 42% pass@3 on the challenging SWEBench-100 benchmark, indicating strong performance in automated code repair and development tasks.
- RL-Optimized: Trained with the `rloo-n` method using the `terminus-structured` agent, which is designed to improve task completion through iterative learning.
- Extended Context Window: Supports a 32k token context (24k input + 8k output), crucial for handling large codebases and complex problem descriptions.
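To make the idea of a structured tool call concrete, the sketch below validates a JSON-encoded call against the five tool names listed above. The JSON shape (`tool` and `args` keys) and the per-tool argument names are assumptions made for illustration; the source does not document the actual call format used by the terminus-structured agent.

```python
import json

# Tool names come from the model card. The required argument names are
# hypothetical and only illustrate what a structured call could carry.
TOOL_ARGS = {
    "bash": {"command"},
    "view": {"path"},
    "edit": {"path", "old_text", "new_text"},
    "create": {"path", "content"},
    "search": {"query"},
}


def parse_tool_call(raw: str) -> dict:
    """Parse one JSON tool call and validate it; raise ValueError if malformed."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    tool = call.get("tool")
    if not isinstance(tool, str) or tool not in TOOL_ARGS:
        raise ValueError(f"unknown tool: {tool!r}")

    args = call.get("args")
    if not isinstance(args, dict):
        raise ValueError("'args' must be a JSON object")

    # Report every required argument that the call left out.
    missing = TOOL_ARGS[tool] - set(args)
    if missing:
        raise ValueError(f"missing arguments for {tool}: {sorted(missing)}")
    return {"tool": tool, "args": args}
```

Rejecting malformed calls before execution keeps the agent loop simple: the caller can feed the `ValueError` message back to the model as an observation instead of running anything.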
Good For
- Automated Code Development: Ideal for tasks requiring automated code generation, debugging, and modification within a structured environment.
- Software Engineering Research: Useful for researchers exploring RL applications in software development and tool-augmented language models.
- DevOps Automation: Can be integrated into workflows that require intelligent agents to interact with system shells and file systems for maintenance or deployment tasks.
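For the shell-interaction use case, a host application needs some way to run the commands the model requests and return the result as an observation. The following is a minimal sketch of such an executor, not the agent's actual implementation: the function name `run_bash`, the returned dictionary shape, and the default limits are all hypothetical. Output is truncated so that a single noisy command cannot consume the 32k token context.

```python
import subprocess


def run_bash(command: str, timeout_s: float = 30.0, max_chars: int = 4000) -> dict:
    """Run a shell command and return a truncated, structured observation.

    The return shape ({"exit_code", "output"}) is an assumption for this sketch.
    """
    try:
        proc = subprocess.run(
            ["bash", "-c", command],
            capture_output=True,   # collect stdout and stderr
            text=True,             # decode bytes to str
            errors="replace",      # never fail on undecodable output
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "output": f"timed out after {timeout_s}s"}

    output = proc.stdout + proc.stderr
    if len(output) > max_chars:
        # Keep the observation small enough to fit in the model's context.
        output = output[:max_chars] + "\n[output truncated]"
    return {"exit_code": proc.returncode, "output": output}
```

Because this executes arbitrary model-generated commands, it should only be run inside an isolated environment such as a container or a disposable virtual machine.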