Phaedrus33/GRPO_final_submission

Text Generation | Concurrency Cost: 2 | Model Size: 32B | Quant: FP8 | Ctx Length: 32k | Published: Feb 1, 2026 | Architecture: Transformer | Cold

Phaedrus33/GRPO_final_submission is a 32-billion-parameter model fine-tuned from Qwen3-32B using Supervised Fine-Tuning (SFT) followed by Group Relative Policy Optimization (GRPO). Developed by Phaedrus33, it is designed for 5G network troubleshooting and performs root cause analysis by applying structured reasoning over pre-computed metrics. It scores 0.9582 on the Zindi AI Telco Troubleshooting Challenge Phase 2 test set.

Model Overview

Phaedrus33/GRPO_final_submission is a 32-billion-parameter model, fine-tuned from Qwen3-32B, built for 5G network troubleshooting and root cause analysis. It is trained with a two-stage fine-tuning pipeline: Supervised Fine-Tuning (SFT) on structured reasoning traces, followed by Group Relative Policy Optimization (GRPO) to improve answer accuracy. Its core strength is applying expert domain knowledge encoded in those reasoning traces, rather than relying on general-purpose LLM reasoning over raw data.
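
To make the GRPO stage more concrete, the sketch below shows the group-relative idea in plain Python: several completions sampled for the same question are scored, and each reward is normalized against the mean and standard deviation of its own group. This is an illustration only, not the author's training code. The function names, the reward values (+1.0 / -1.0 for correctness, +0.1 for format), the answer labels, and the use of the population standard deviation are all assumptions chosen for the example; the card only states that the rewards are asymmetric and favor correct answers over format compliance.

```python
from statistics import mean, pstdev


def asymmetric_reward(predicted, expected, well_formatted):
    """Hypothetical reward: correctness dominates, format adds a small bonus."""
    reward = 1.0 if predicted == expected else -1.0  # assumed values
    if well_formatted:
        reward += 0.1  # assumed bonus, deliberately much smaller than the correctness term
    return reward


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical completions sampled for one question whose answer is "C3".
# Each entry is (predicted_label, well_formatted).
completions = [("C3", True), ("C1", True), ("C3", False), ("C2", False)]
rewards = [asymmetric_reward(p, "C3", fmt) for p, fmt in completions]
advantages = group_relative_advantages(rewards)

for (label, fmt), adv in zip(completions, advantages):
    print(f"answer={label} formatted={fmt} advantage={adv:+.3f}")
```

In this toy group, a correct but poorly formatted answer still receives a positive advantage, while a well-formatted wrong answer receives a negative one, which is the behavior an asymmetric reward is meant to produce.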

Key Capabilities

  • Specialized 5G Troubleshooting: Designed to identify root causes (e.g., excessive downtilt, PCI collision, interference) from drive test metrics.
  • Structured Reasoning: Learns from detailed chain-of-thought traces, enabling it to reproduce and generalize complex diagnostic logic.
  • Robust Data Handling: Metrics are pre-computed programmatically and tables are parsed by header name rather than column position, which keeps the pipeline resilient to varying data formats.
  • Reinforcement Learning for Accuracy: GRPO training with asymmetric reward functions prioritizes correct answers over mere format compliance, achieving a 0.9582 leaderboard score on the Phase 2 test set.
  • Hybrid Deployment Potential: While the full 32B model requires significant GPU resources (80GB+ VRAM), its underlying rule-based classifier can operate on edge devices without GPUs for Type A questions.
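
As an illustration of the header-based table parsing and metric pre-computation mentioned in the list above, the following minimal sketch looks columns up by header name, so a reordered table yields the same metric. The pipe delimiter, the column names (`Timestamp`, `RSRP`, `SINR`), and the helper names are assumptions made for this example; the actual drive test format and the author's parser are not described in the card.

```python
import csv
import io


def parse_table(text, delimiter="|"):
    """Parse a delimited text table into dicts keyed by header name."""
    reader = csv.DictReader(io.StringIO(text.strip()), delimiter=delimiter)
    rows = []
    for row in reader:
        # Strip stray whitespace from headers and cells; skip overflow fields.
        rows.append({k.strip(): (v or "").strip()
                     for k, v in row.items() if k is not None})
    return rows


def mean_metric(rows, column):
    """Pre-compute the mean of one numeric column, ignoring empty cells."""
    values = [float(r[column]) for r in rows if r.get(column)]
    return sum(values) / len(values)


# Two hypothetical tables with the same data but a different column order.
table_a = "Timestamp|RSRP|SINR\n0|-95|12\n1|-105|8"
table_b = "SINR|Timestamp|RSRP\n12|0|-95\n8|1|-105"

mean_a = mean_metric(parse_table(table_a), "RSRP")
mean_b = mean_metric(parse_table(table_b), "RSRP")
print(mean_a, mean_b)
```

Because lookup goes through the header rather than a fixed index, both tables produce the same mean, and the resulting number (not the raw table) is what a downstream prompt or rule would consume.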

Good For

  • Automated Telco Diagnostics: Ideal for applications requiring precise and structured root cause analysis in 5G networks.
  • Complex Technical Problem Solving: Suitable for scenarios where domain-specific rules and structured reasoning are critical.
  • Benchmarking Specialized LLMs: Provides a strong baseline for performance in highly specialized technical challenges.