What is SIRL-Gurobi?
SIRL-Gurobi is a 7.6-billion-parameter language model based on the Qwen2.5 architecture, developed by chenyitian-shanshu. It uses Solver-Informed Reinforcement Learning (SIRL), a paradigm that integrates feedback from optimization solvers to improve the model's ability to generate accurate mathematical formulations and code from natural-language descriptions. This approach marks the first application of Reinforcement Learning with Verifiable Reward (RLVR) to optimization modeling.
Key Capabilities
- Optimization Modeling: Translates natural language problem descriptions into precise mathematical formulations and executable code, specifically for the Gurobi optimization solver.
- Enhanced Accuracy: Leverages solver outputs for iterative refinement, leading to improved performance on complex optimization tasks.
- Benchmark Performance: The SIRL framework, particularly its 32B variant, has outperformed DeepSeek-V3 and OpenAI o3 on optimization modeling benchmarks including NL4OPT, MAMO, IndustryOR, and OptMATH.
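The solver-informed reward idea behind SIRL can be sketched in a few lines: run the model-generated program, read off the objective value it reports, and grant a binary reward only when that value matches a verified ground truth. This is a minimal illustration, not the paper's exact training code; the function name, the `objective` variable convention, and the 0/1 reward scheme are all assumptions.

```python
def solver_reward(generated_code: str, true_objective: float, tol: float = 1e-6) -> float:
    """Execute model-generated optimization code and score it against a
    verified objective value (binary reward, in the spirit of RLVR).

    NOTE: this is an illustrative sketch; names and the reward scheme
    are hypothetical, not SIRL's published implementation.
    """
    namespace: dict = {}
    try:
        exec(generated_code, namespace)  # run the candidate program
    except Exception:
        return 0.0                       # code that crashes earns no reward
    obj = namespace.get("objective")     # assume the program reports `objective`
    if obj is None:
        return 0.0                       # no objective reported
    return 1.0 if abs(obj - true_objective) <= tol else 0.0


# Toy "generated" program: brute-force a tiny integer program,
# maximize 3x + 2y subject to x + y <= 4, with x, y in {0, ..., 4}.
candidate = """
objective = max(3*x + 2*y
                for x in range(5) for y in range(5)
                if x + y <= 4)
"""

reward = solver_reward(candidate, true_objective=12.0)  # optimum is x=4, y=0
```

In actual training the candidate program would call a real solver such as Gurobi rather than brute force; the key point is that the reward is computed from verifiable solver output, not from text similarity.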
When to Use SIRL-Gurobi
- Automated Optimization Code Generation: Ideal for developers and researchers needing to quickly generate Gurobi-compatible code for optimization problems described in natural language.
- Complex Problem Solving: Suitable for tackling intricate optimization challenges where accurate mathematical formulation is critical.
- Research in LLM-based Optimization: Provides a strong baseline and framework for further research into grounding LLMs for authentic optimization modeling.
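In practice, the first step in the workflow above is wrapping a natural-language problem statement in an instruction prompt before passing it to the model for generation. The sketch below shows one plausible way to do this; the instruction wording is a hypothetical example, and the model's actual chat template should be consulted for the expected format.

```python
def build_prompt(problem: str) -> str:
    """Wrap a natural-language optimization problem in an instruction prompt.

    The instruction text below is an illustrative assumption, not the
    model's documented prompt format.
    """
    return (
        "Translate the following problem into a mathematical model and "
        "solve it with gurobipy. Report the optimal objective value.\n\n"
        f"Problem: {problem}"
    )


prompt = build_prompt(
    "A factory makes chairs and tables. Each chair yields $30 profit and "
    "each table $50. A chair takes 2 labor hours and a table takes 4, and "
    "40 labor hours are available. Maximize total profit."
)
# `prompt` would then be passed to the model (e.g. via the Hugging Face
# transformers generate API) to obtain the formulation and Gurobi code.
```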