Model Overview
Phaedrus33/GRPO_final_submission is a 32-billion-parameter model fine-tuned from Qwen3-32B and engineered specifically for 5G network troubleshooting and root cause analysis. It uses a two-stage fine-tuning pipeline: Supervised Fine-Tuning (SFT) on structured reasoning traces, followed by Group Relative Policy Optimization (GRPO) to improve answer accuracy. The model's core strength is applying expert domain knowledge encoded in its reasoning traces, rather than relying on general-purpose LLM reasoning over raw data.
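The GRPO stage of the pipeline can be illustrated with a minimal sketch of its core idea: several completions are sampled per prompt, and each completion's advantage is its reward relative to the group, so no learned value critic is needed. This is a generic illustration of the GRPO objective, not the model's actual training code; all names and the example rewards are assumptions.

```python
# Minimal sketch of GRPO's group-relative advantage step.
# Illustrative only -- not the actual training code for this model.
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each sampled completion's reward against its group.

    In GRPO, a group of completions is sampled for one prompt; the
    advantage of each completion is its reward minus the group mean,
    scaled by the group's standard deviation.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one prompt, one of which was correct
# and scored higher by the reward function (hypothetical values).
advantages = group_relative_advantages([1.0, 0.2, 0.2, 0.2])
```

The correct completion ends up with a positive advantage and the rest with negative ones, so the policy update pushes probability mass toward answers that beat their own sampling group.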
Key Capabilities
- Specialized 5G Troubleshooting: Designed to identify root causes (e.g., excessive downtilt, PCI collision, interference) from drive test metrics.
- Structured Reasoning: Learns from detailed chain-of-thought traces, enabling it to reproduce and generalize complex diagnostic logic.
- Robust Data Handling: Utilizes programmatic pre-computation of metrics and header-based table parsing, making it resilient to varying data formats.
- Reinforcement Learning for Accuracy: GRPO training with asymmetric reward functions prioritizes correct answers over mere format compliance, achieving a 0.9582 leaderboard score on the Phase 2 test set.
- Hybrid Deployment Potential: While the full 32B model requires significant GPU resources (80GB+ VRAM), its underlying rule-based classifier can operate on edge devices without GPUs for Type A questions.
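The asymmetric reward idea mentioned above can be sketched as follows. The tag format, weights, and helper names here are illustrative assumptions, not the model's published reward code; the point is only that a correct final answer earns far more than mere format compliance, so the policy cannot score well by producing well-formed but wrong diagnoses.

```python
import re

# Hypothetical asymmetric reward: correctness dominates format compliance.
# The <answer> tag convention and the 0.1 / 1.0 weights are assumptions.
ANSWER_TAG = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def reward(completion: str, gold_answer: str) -> float:
    m = ANSWER_TAG.search(completion)
    if m is None:
        return 0.0                 # no parseable answer at all
    format_bonus = 0.1             # small credit for format compliance
    if m.group(1).strip().lower() == gold_answer.strip().lower():
        return format_bonus + 1.0  # correct answer dominates the signal
    return format_bonus
```

Under this shaping, a well-formatted wrong answer scores 0.1 while a correct one scores 1.1, so group-relative comparisons during GRPO consistently favor correctness.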
Good For
- Automated Telco Diagnostics: Ideal for applications requiring precise and structured root cause analysis in 5G networks.
- Complex Technical Problem Solving: Suitable for scenarios where domain-specific rules and structured reasoning are critical.
- Benchmarking Specialized LLMs: Provides a strong baseline for performance in highly specialized technical challenges.
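The header-based table parsing and rule-based classification described under Key Capabilities might look like the sketch below. Column names, thresholds, and fault labels are illustrative assumptions, not the model's actual rules; the sketch only shows why such a component is resilient to column reordering and light enough to run without a GPU.

```python
import csv
import io

def parse_drive_test(raw: str) -> list[dict]:
    """Parse a drive-test table by header name, so column order can vary."""
    reader = csv.DictReader(io.StringIO(raw))
    return [{k: float(v) for k, v in row.items()} for row in reader]

def classify(samples: list[dict]) -> str:
    """Toy threshold rules over pre-computed aggregate metrics.

    Thresholds and labels below are hypothetical, for illustration only.
    """
    avg_rsrp = sum(s["rsrp_dbm"] for s in samples) / len(samples)
    avg_sinr = sum(s["sinr_db"] for s in samples) / len(samples)
    if avg_rsrp < -110:
        return "excessive downtilt"  # weak coverage across the route
    if avg_sinr < 0:
        return "interference"        # adequate power but poor quality
    return "no fault detected"

# Columns may appear in any order; only the headers matter.
table = "rsrp_dbm,sinr_db\n-115,5\n-118,4\n-112,6\n"
```

Because it is pure threshold logic over pre-computed metrics, a component like this can handle Type A questions on CPU-only edge hardware, with the full 32B model reserved for cases that need free-form reasoning.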