kartikraut09/ecocloud-grpo-qwen

Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Published: Apr 25, 2026 | License: mit | Architecture: Transformer | Open Weights

The kartikraut09/ecocloud-grpo-qwen model is a 0.5 billion parameter controller based on Qwen2.5-Instruct, fine-tuned by Kartik Raut using Group Relative Policy Optimization (GRPO). It is designed to manage cloud infrastructure crises by selecting one of four actions (scale_up, scale_down, optimize_energy, migrate_region) to balance latency, cost, and carbon objectives. It learns a multi-objective policy for autonomous cloud resource management within a simulated environment.


CloudEdge GRPO Controller Overview

This model, developed by Kartik Raut for the Meta PyTorch OpenEnv Hackathon, is a specialized controller built upon the Qwen2.5-0.5B-Instruct base model. It is fine-tuned using Group Relative Policy Optimization (GRPO) via the TRL library to manage cloud infrastructure crises within the CloudEdge simulator.
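Because the controller is an instruction-tuned language model, its output is free text that has to be mapped onto one of the four infrastructure actions before a simulator can use it. The sketch below shows one way that mapping might look. Only the four action names come from this card; the function name parse_action and the "first match wins" rule are assumptions for illustration, not the author's actual parsing code.

```python
import re
from typing import Optional

# Action names as listed on this card; everything else here is hypothetical.
ACTIONS = ("scale_up", "scale_down", "optimize_energy", "migrate_region")

# Word boundaries keep "scale_upward" from matching "scale_up".
_ACTION_RE = re.compile(r"\b(" + "|".join(ACTIONS) + r")\b")


def parse_action(completion: str) -> Optional[str]:
    """Return the first known action named in the completion, or None."""
    match = _ACTION_RE.search(completion.lower())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(parse_action("Action: optimize_energy"))
    print(parse_action("No valid action here"))
```

Returning None for unparseable completions lets the caller decide how to handle them, for example by assigning a penalty reward during training or falling back to a no-op at inference time.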

Key Capabilities

  • Autonomous Crisis Management: Learns to select optimal infrastructure actions like scale_up, scale_down, optimize_energy, and migrate_region.
  • Multi-Objective Balancing: Balances three competing objectives for cloud infrastructure: latency below 150 ms, cost below $400/hr, and carbon below 220 units.
  • Reinforcement Learning: Uses a shaped multi-objective reward function, including a gap-closure term and a worst-metric bonus, to guide policy learning. During training, the policy converged on optimize_energy as its preferred action in crisis states.
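The card names the ingredients of the reward (gap closure and a worst-metric bonus) but does not give the formula. The following is a minimal sketch of how such a shaped reward could be put together, using the three targets listed above. The metric keys, the relative-overshoot normalization, and the bonus size of 0.5 are all assumptions chosen for illustration and should not be read as the reward used in training.

```python
from typing import Dict

# Targets from this card: latency < 150 ms, cost < $400/hr, carbon < 220 units.
# The dictionary keys are hypothetical names.
TARGETS: Dict[str, float] = {
    "latency_ms": 150.0,
    "cost_per_hr": 400.0,
    "carbon": 220.0,
}


def gaps(state: Dict[str, float]) -> Dict[str, float]:
    """Relative overshoot of each metric above its target (0 when on target)."""
    return {k: max(0.0, state[k] - t) / t for k, t in TARGETS.items()}


def shaped_reward(before: Dict[str, float], after: Dict[str, float],
                  worst_bonus: float = 0.5) -> float:
    """Gap closure summed over metrics, plus a bonus for improving the worst one."""
    g0, g1 = gaps(before), gaps(after)
    # Gap closure: positive when the action moved metrics toward their targets.
    closure = sum(g0[k] - g1[k] for k in TARGETS)
    # Worst-metric bonus: reward progress on the metric furthest from target.
    worst = max(g0, key=g0.get)
    improved_worst = g0[worst] > 0.0 and g1[worst] < g0[worst]
    return closure + (worst_bonus if improved_worst else 0.0)


if __name__ == "__main__":
    crisis = {"latency_ms": 300.0, "cost_per_hr": 400.0, "carbon": 220.0}
    fixed = {"latency_ms": 150.0, "cost_per_hr": 400.0, "carbon": 220.0}
    print(shaped_reward(crisis, fixed))
```

Normalizing each gap by its target keeps the three metrics on a comparable scale, so a 10% latency overshoot and a 10% cost overshoot contribute equally.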

Training Details

Training ran for 512 steps with 4 sampled responses per prompt and took approximately 15 minutes on a Google Colab T4 GPU. The architecture is Qwen2 (0.5B parameters), and the model operates within an OpenEnv-compatible, Gymnasium-style simulator.
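The "4 responses per prompt" setting matters because GRPO scores each sampled response relative to the other responses for the same prompt, rather than against a learned value function. A minimal sketch of that group-relative advantage follows. It uses the population standard deviation and a small epsilon as assumptions; library implementations such as TRL may differ in these details.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one prompt's group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for the 4 responses sampled for one prompt.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When all responses in a group receive the same reward, every advantage is zero and that prompt contributes no learning signal, which is one reason a shaped reward that separates near-identical actions is useful in this setup.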

Good for

  • Research and development in multi-agent interactions and long-horizon planning for cloud resource management.
  • Demonstrating the application of reinforcement learning (GRPO) to complex, multi-objective control problems.
  • Exploring sustainable cloud infrastructure management strategies through autonomous agents.