SERA-14B: An Open-Source Coding Agent
SERA-14B is a 14-billion-parameter model from the Allen Institute for AI (Ai2) Open Coding Agents series, built on the Qwen3-14B base model. It is designed and fine-tuned specifically for automated software engineering tasks, and shows strong performance in resolving complex coding problems.
Key Capabilities and Performance
- High Performance on SWE-bench: Achieves a 41.7% resolve rate on SWE-bench Verified, a leading benchmark for evaluating coding agents, matching or surpassing several comparable and larger models.
- Extensive Context Length: Supports a 32K token context length, enabling it to handle large codebases and complex problem descriptions.
- Synthetic Data Training: Trained on 25,000 synthetic coding agent trajectories generated using Soft Verified Generation (SVG), a novel method that removes the need for test execution during data creation.
- Teacher Model Guidance: Utilizes GLM-4.6 (357B) as a teacher model for generating high-quality training data.
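As a rough illustration of what the 32K-token window implies for agent tooling, here is a minimal sketch of packing repository files into a fixed context budget. The 4-characters-per-token heuristic, the reserved-output margin, and the file contents are illustrative assumptions, not part of the model release:

```python
# Sketch: greedily pack repository files into a fixed context budget.
# The 32K limit comes from the model card; everything else here
# (token heuristic, reserved margin, sample data) is an assumption.

CONTEXT_TOKENS = 32_000
RESERVED_FOR_OUTPUT = 4_000  # leave room for the generated patch (assumption)

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for code."""
    return max(1, len(text) // 4)

def pack_files(files: dict[str, str], budget: int) -> list[str]:
    """Select files in order until the token budget would be exceeded."""
    chosen, used = [], 0
    for path, source in files.items():
        cost = estimate_tokens(source)
        if used + cost > budget:
            continue  # skip files that would overflow the context window
        chosen.append(path)
        used += cost
    return chosen

repo = {
    "main.py": "x" * 40_000,   # ~10K tokens
    "utils.py": "y" * 60_000,  # ~15K tokens
    "tests.py": "z" * 80_000,  # ~20K tokens; no longer fits after the others
}
budget = CONTEXT_TOKENS - RESERVED_FOR_OUTPUT  # 28_000 tokens
print(pack_files(repo, budget))  # → ['main.py', 'utils.py']
```

A real agent harness would rank files by relevance to the issue rather than iterate in dictionary order; the budget arithmetic is the point here.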
Intended Use Cases
- Automated Software Engineering: Ideal for tasks such as bug fixing, implementing new features, and code refactoring.
- Repository Specialization: Can be fine-tuned on private codebases to create highly specialized coding agents.
- Research: Valuable for studying coding agents, data generation techniques, and agent behavior.
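For the repository-specialization use case, one plausible starting point is converting resolved issues from a private codebase into chat-style supervised records. The record schema below is a common convention assumed for illustration, not the released SERA training format:

```python
# Sketch: turn (issue, patch) pairs from a private repository into
# chat-style fine-tuning records. The schema and prompt text are
# illustrative assumptions, not the released SERA training format.

def to_training_record(issue: str, patch: str, system_prompt: str) -> dict:
    """Wrap one resolved issue as a single-turn supervised example."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": issue},
            {"role": "assistant", "content": patch},
        ]
    }

SYSTEM = "You are a software engineering agent. Produce a unified diff."

# Hypothetical example pair for illustration.
pairs = [
    (
        "Fix off-by-one error in pagination",
        "--- a/pager.py\n+++ b/pager.py\n@@\n-pages = total // size\n+pages = (total + size - 1) // size",
    ),
]
records = [to_training_record(issue, patch, SYSTEM) for issue, patch in pairs]
print(len(records))  # → 1
```

Records in this shape can typically be fed to standard supervised fine-tuning tooling after applying the base model's chat template.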
Limitations
- Primarily validated on SWE-bench Verified (Python repositories); performance on other languages is not guaranteed.
- May generate insecure or incorrect code; outputs require human review and testing.
- Performance is largely bounded by the capabilities of its teacher model, GLM-4.6.