CloudEdge GRPO Controller Overview

This model, developed by Kartik Raut for the Meta PyTorch OpenEnv Hackathon, is a specialized controller built upon the Qwen2.5-0.5B-Instruct base model. It is fine-tuned using Group Relative Policy Optimization (GRPO) via the TRL library to manage cloud infrastructure crises within the CloudEdge simulator.

Key Capabilities

Autonomous Crisis Management: Learns to select optimal infrastructure actions like scale_up, scale_down, optimize_energy, and migrate_region.
Multi-Objective Balancing: Balances three competing infrastructure targets: latency below 150 ms, cost below $400/hr, and carbon below 220 units.
Reinforcement Learning: Uses a shaped multi-objective reward function, including a gap-closure term and a worst-metric bonus, to guide policy learning. During training, the policy converged on optimize_energy as the most effective action in crisis states.
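
The reward shaping described above can be illustrated with a minimal sketch. The model card does not give the actual formula, so everything here is an assumption made for illustration: the metric key names, the relative-overshoot definition of a "gap", the `worst_bonus` value, and the rule that the bonus is paid when the worst metric's gap shrinks. Only the three target values come from the card.

```python
from typing import Dict

# Targets stated in the model card; the key names are hypothetical.
TARGETS: Dict[str, float] = {
    "latency_ms": 150.0,
    "cost_per_hr": 400.0,
    "carbon": 220.0,
}


def gaps(state: Dict[str, float]) -> Dict[str, float]:
    """Relative overshoot per metric; 0.0 when the metric is at or below its target."""
    return {k: max(0.0, state[k] - t) / t for k, t in TARGETS.items()}


def shaped_reward(before: Dict[str, float],
                  after: Dict[str, float],
                  worst_bonus: float = 0.5) -> float:
    """Gap closure summed over metrics, plus a bonus if the worst metric improved.

    This is an assumed form of the reward, not the author's implementation.
    """
    g0, g1 = gaps(before), gaps(after)
    # Gap closure: how much of each overshoot the action removed.
    closure = sum(g0[k] - g1[k] for k in TARGETS)
    # Worst metric: the one with the largest relative overshoot before acting.
    worst = max(g0, key=g0.get)
    bonus = worst_bonus if g0[worst] > 0.0 and g1[worst] < g0[worst] else 0.0
    return closure + bonus


before = {"latency_ms": 300.0, "cost_per_hr": 500.0, "carbon": 220.0}
after = {"latency_ms": 225.0, "cost_per_hr": 500.0, "carbon": 220.0}
reward = shaped_reward(before, after)
```

In this example latency is the worst metric, so an action that halves its overshoot earns both the closure term and the bonus, while an action that leaves the state unchanged earns nothing.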

Training Details

Training ran for 512 steps with 4 sampled responses per prompt and took approximately 15 minutes on a Google Colab T4 GPU. The architecture is Qwen2 (0.5B parameters), and the model operates within an OpenEnv-compatible, Gymnasium-style simulator.
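
To show why GRPO samples several responses per prompt, here is a rough sketch of the group-relative advantage step: each response's reward is compared with the mean of its own group, so no separate value network is needed. This is plain Python for illustration, not TRL's code; the function name, the use of the population standard deviation, and the `eps` value are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize rewards within one prompt's group of sampled responses.

    Illustrative only: the exact normalization used in training may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps keeps the division finite when every response got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, 4 sampled responses (the group size stated in the card).
# The reward values themselves are made up for this example.
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group_rewards)
```

Responses that score above their group's mean get a positive advantage and are reinforced, and those below the mean are discouraged. When all 4 responses receive the same reward, every advantage is zero and the group contributes no learning signal.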

Good for

Research and development in multi-agent interactions and long-horizon planning for cloud resource management.
Demonstrating the application of reinforcement learning (GRPO) to complex, multi-objective control problems.
Exploring sustainable cloud infrastructure management strategies through autonomous agents.
