kartikraut09/ecocloud-grpo-qwen
The kartikraut09/ecocloud-grpo-qwen model is a 0.5-billion-parameter controller based on Qwen2.5-0.5B-Instruct, fine-tuned by Kartik Raut with Group Relative Policy Optimization (GRPO). It manages cloud infrastructure crises by selecting actions (`scale_up`, `scale_down`, `optimize_energy`, `migrate_region`) that balance latency, cost, and carbon objectives, learning a multi-objective policy for autonomous cloud resource management within a simulated environment.
CloudEdge GRPO Controller Overview
This model, developed by Kartik Raut for the Meta PyTorch OpenEnv Hackathon, is a specialized controller built upon the Qwen2.5-0.5B-Instruct base model. It is fine-tuned using Group Relative Policy Optimization (GRPO) via the TRL library to manage cloud infrastructure crises within the CloudEdge simulator.
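A minimal inference sketch using the standard Transformers API is shown below. The crisis-state prompt layout and the `parse_action` helper are illustrative assumptions; the exact observation encoding is defined by the CloudEdge simulator, not documented here.

```python
# Minimal inference sketch for the controller. The prompt format and
# action parser below are assumptions for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kartikraut09/ecocloud-grpo-qwen"
ACTIONS = ["scale_up", "scale_down", "optimize_energy", "migrate_region"]

def build_prompt(latency_ms: float, cost_per_hr: float, carbon: float) -> str:
    """Format a crisis state as an instruction prompt (assumed layout)."""
    return (
        f"Cloud state: latency={latency_ms:.0f}ms, "
        f"cost=${cost_per_hr:.0f}/hr, carbon={carbon:.0f} units.\n"
        f"Choose one action from {ACTIONS}."
    )

def parse_action(text: str) -> str:
    """Return the first known action mentioned in the model's reply."""
    for action in ACTIONS:
        if action in text:
            return action
    return "optimize_energy"  # fallback; the card notes this dominant action

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    messages = [{"role": "user", "content": build_prompt(180, 450, 240)}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output = model.generate(inputs, max_new_tokens=16)
    reply = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
    print(parse_action(reply))
```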
Key Capabilities
- Autonomous Crisis Management: Learns to select optimal infrastructure actions: `scale_up`, `scale_down`, `optimize_energy`, and `migrate_region`.
- Multi-Objective Balancing: Balances competing infrastructure objectives: maintaining latency < 150 ms, cost < $400/hr, and carbon < 220 units.
- Reinforcement Learning: Uses a shaped multi-objective reward function, including gap-closure terms and a worst-metric bonus, to guide policy learning. The model converged to prioritizing `optimize_energy` as the most effective action in crisis states.
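The shaped reward described above can be sketched as follows. The thresholds come from the capabilities listed here; the normalization, weights, and bonus magnitude are assumptions, since the actual reward is computed inside the CloudEdge simulator.

```python
# Sketch of a shaped multi-objective reward with gap closure and a
# worst-metric bonus. Thresholds (latency < 150 ms, cost < $400/hr,
# carbon < 220 units) are from the card; scaling and bonus are assumed.
TARGETS = {"latency": 150.0, "cost": 400.0, "carbon": 220.0}

def gap(metrics: dict, key: str) -> float:
    """Normalized overshoot above the target (0 when within budget)."""
    return max(0.0, metrics[key] - TARGETS[key]) / TARGETS[key]

def reward(prev: dict, curr: dict, bonus: float = 0.5) -> float:
    """Total gap closed this step, plus a bonus if the previously
    worst metric improved (the 'worst-metric bonus')."""
    closed = sum(gap(prev, k) - gap(curr, k) for k in TARGETS)
    worst = max(TARGETS, key=lambda k: gap(prev, k))
    if gap(curr, worst) < gap(prev, worst):
        closed += bonus
    return closed
```

Shaping the reward around gap closure rather than raw metric values gives the policy a dense learning signal even when no single step brings all metrics within budget.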
Training Details
Training ran for 512 steps with 4 generated responses per prompt, completing on a Google Colab T4 GPU in roughly 15 minutes. The architecture is Qwen2 (0.5B parameters), and the model operates within an OpenEnv-compatible, Gymnasium-style simulator.
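A training sketch with TRL's `GRPOTrainer` is given below. Values marked "from the card" match the details above; the toy reward function, the placeholder dataset, and the remaining hyperparameters are assumptions, since the real reward comes from the CloudEdge simulator.

```python
# GRPO training sketch via TRL. Only max_steps, num_generations, and the
# base model id come from the card; everything else is an assumption.
def crisis_reward(completions, **kwargs):
    """Toy reward: 1.0 when the completion names a valid action.
    (Assumed stand-in; the actual shaped reward is simulator-driven.)"""
    actions = ("scale_up", "scale_down", "optimize_energy", "migrate_region")
    return [1.0 if any(a in c for a in actions) else 0.0 for c in completions]

if __name__ == "__main__":
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    config = GRPOConfig(
        output_dir="ecocloud-grpo-qwen",
        max_steps=512,        # from the card
        num_generations=4,    # 4 responses per prompt, from the card
        per_device_train_batch_size=4,
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-0.5B-Instruct",  # base model, from the card
        reward_funcs=crisis_reward,
        args=config,
        train_dataset=load_dataset("trl-lib/tldr", split="train"),  # placeholder
    )
    trainer.train()
```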
Good for
- Research and development in multi-agent interactions and long-horizon planning for cloud resource management.
- Demonstrating the application of reinforcement learning (GRPO) to complex, multi-objective control problems.
- Exploring sustainable cloud infrastructure management strategies through autonomous agents.