allenai/SERA-8B-GA
SERA-8B-GA is an 8 billion parameter open-source coding agent developed by the Allen Institute for AI (Ai2), built on the Qwen 3-8B base model and fine-tuned using GLM-4.5-Air as a teacher. It achieves 31.7% on the SWE-bench Verified benchmark at a 32K context length and specializes in automated software engineering tasks such as bug fixes, feature implementation, and refactoring. The model is optimized for generating and modifying code within Python repositories.
SERA-8B-GA: An Open-Source Coding Agent
SERA-8B-GA, developed by the Allen Institute for AI (Ai2), is an 8 billion parameter open-source coding agent. It is the fourth model in Ai2's Open Coding Agents series, built upon the Qwen 3-8B base model and fine-tuned with GLM-4.5-Air (110B) as the teacher model. This model is specifically designed for automated software engineering tasks, leveraging a 32K token context length.
Key Capabilities
- High Performance on SWE-bench: Achieves a resolve rate of 31.7% on SWE-bench Verified, outperforming other 8B open-source models like SkyRL-8B (9.4%) and Nex-N1-8B (20.3%).
- Automated Software Engineering: Excels at tasks such as bug fixes, feature implementation, and code refactoring.
- Synthetic Data Training: Trained on 200,000 synthetic coding agent trajectories generated using Soft Verified Generation (SVG), a two-rollout pipeline that removes the need for test infrastructure.
- CLI Integration: Usable via the sera CLI for integration and deployment.
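To put the SWE-bench Verified figures quoted above side by side, here is a minimal Python sketch that computes SERA-8B-GA's lead over the two other 8B models in percentage points. The resolve rates are taken from the list above. The function name `margins_over` and the dictionary layout are illustrative choices, not part of any Ai2 tooling.

```python
# Resolve rates (%) on SWE-bench Verified, as quoted in the capabilities list.
resolve_rates = {
    "SERA-8B-GA": 31.7,
    "Nex-N1-8B": 20.3,
    "SkyRL-8B": 9.4,
}


def margins_over(rates, reference):
    """Return the reference model's lead over every other model, in percentage points."""
    ref_rate = rates[reference]
    return {
        name: round(ref_rate - rate, 1)  # round away floating-point noise
        for name, rate in rates.items()
        if name != reference
    }


leads = margins_over(resolve_rates, "SERA-8B-GA")
# Print the smallest lead first.
for name, lead in sorted(leads.items(), key=lambda item: item[1]):
    print(f"SERA-8B-GA leads {name} by {lead} percentage points")
```

With the quoted numbers, the leads come out to 11.4 points over Nex-N1-8B and 22.3 points over SkyRL-8B. These are absolute differences in resolve rate, not relative improvements.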
Good For
- Automated software development: Ideal for automating common coding tasks in Python repositories.
- Repository specialization: Can be fine-tuned on private codebases to create highly specialized coding agents.
- Research: Suitable for studying coding agents, data generation methods, and agent behavior.
Limitations
- Primarily validated on SWE-bench Verified (Python repositories); performance on other languages or benchmarks is unknown.
- Performance is largely bounded by the GLM-4.5-Air teacher model's capabilities.
- May generate insecure or incorrect code; all outputs require human review and testing.