alirezashirmarz/NICoLE-LLM
alirezashirmarz/NICoLE-LLM is a compact 1.1 billion parameter language model designed as a controller for congestion-aware RTP/WebRTC adaptive video streaming. It predicts ECN, Current Profile, and Next Profile from RTP packetization and queue telemetry using compact symbolic prompting. Optimized for low-latency inference and edge deployment, this model provides deterministic structured outputs for real-time video encoding adaptation. It features a 2048-token context length and is available in GGUF quantization for efficient CPU inference.
NICoLE-LLM: Compact Controller for Adaptive Video Streaming
NICoLE-LLM, developed by Alireza Shirmarz, is a 1.1 billion parameter model specifically engineered as a controller for congestion-aware RTP/WebRTC adaptive video streaming. Unlike general-purpose LLMs, NICoLE focuses on a highly specialized task: predicting ECN (Explicit Congestion Notification), Current Profile (CP), and Next Profile (NP) based on RTP packetization and queue telemetry data.
Key Capabilities & Optimizations
- Specialized Prediction: Predicts the congestion signal (ECN) together with the current and next video streaming profiles (e.g., 4K, 1080p, 720p, 360p at various frame rates) for adaptive streaming.
- Compact Symbolic Prompting: Utilizes a unique compact symbolic prompting format (e.g., I:PS FS IFGS IFGR CQ LQ E->O:E C N) to significantly reduce prompt tokens, KV-cache usage, and inference latency.
- Edge & Low-Latency Deployment: Optimized for low-latency inference and efficient deployment on edge devices, supporting GGUF quantization (Q4_K_M recommended) for CPU inference.
- Deterministic Outputs: Provides consistent and structured outputs crucial for real-time control systems.
- Performance: Achieves 2.91 decisions/sec with a 343 ms response time on 4 CPU threads, demonstrating efficient real-time operation.
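The compact prompt header and the throughput figure in the list above can be illustrated with a short Python sketch. Only the header string `I:PS FS IFGS IFGR CQ LQ E->O:E C N`, the 2048-token context, the 4 CPU threads, and the 343 ms response time come from this card. Everything else is an assumption made for illustration: the way telemetry values are laid out after the header, the whitespace-separated three-token reply, and the helper names `build_prompt`, `parse_decision`, `decisions_per_second`, and `run_once` are all hypothetical. The field labels are treated as opaque names because the card does not spell out their meanings. `run_once` relies on the third-party `llama-cpp-python` package and a local GGUF file path that you would supply.

```python
from dataclasses import dataclass

# Field labels copied from the card's header; kept as opaque names.
INPUT_FIELDS = ("PS", "FS", "IFGS", "IFGR", "CQ", "LQ", "E")
OUTPUT_FIELDS = ("E", "C", "N")  # ECN, Current Profile, Next Profile


def header() -> str:
    """Rebuild the symbolic header shown in the card."""
    return "I:" + " ".join(INPUT_FIELDS) + "->O:" + " ".join(OUTPUT_FIELDS)


def build_prompt(sample: dict) -> str:
    """ASSUMED layout: header line, then the values in field order, then 'O:'."""
    missing = [f for f in INPUT_FIELDS if f not in sample]
    if missing:
        raise ValueError(f"missing telemetry fields: {missing}")
    values = " ".join(str(sample[f]) for f in INPUT_FIELDS)
    return f"{header()}\nI:{values}->O:"


@dataclass(frozen=True)
class Decision:
    ecn: str
    current_profile: str
    next_profile: str


def parse_decision(text: str) -> Decision:
    """ASSUMED reply shape: three whitespace-separated tokens (E, C, N)."""
    tokens = text.split()
    if len(tokens) < len(OUTPUT_FIELDS):
        raise ValueError(f"expected 3 output tokens, got: {text!r}")
    ecn, current, nxt = tokens[:3]
    return Decision(ecn, current, nxt)


def decisions_per_second(response_time_ms: float) -> float:
    """Sequential decision rate implied by a per-decision response time."""
    return 1000.0 / response_time_ms


def run_once(model_path: str, prompt: str) -> Decision:
    """Load a GGUF file and run one greedy decode (reloads the model each call)."""
    from llama_cpp import Llama  # third-party: pip install llama-cpp-python

    llm = Llama(model_path=model_path, n_ctx=2048, n_threads=4, verbose=False)
    out = llm(prompt, max_tokens=16, temperature=0.0)
    return parse_decision(out["choices"][0]["text"])
```

With this sketch, `decisions_per_second(343)` gives roughly 2.9, in line with the decision rate quoted above for sequential operation. In a real controller the `Llama` object would be created once and reused across decisions, since reloading the model per call would dominate the latency.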
Ideal Use Cases
- WebRTC Adaptive Streaming: Enhancing real-time video quality and stability by dynamically adjusting to network conditions.
- Congestion-Aware Video Encoding: Adapting video encoding parameters in real-time to prevent congestion and improve user experience.
- In-Network QoE Optimization: Deploying intelligent agents within the network to optimize Quality of Experience.
- Edge AI Networking: Implementing AI-driven control logic directly at the network edge.
Limitations
NICoLE-LLM was trained specifically under a 40 Mbps bottleneck scenario and is designed for bounded RTP/WebRTC streaming tasks. It is not intended for open-ended conversational generation or general language understanding.