issai/foggen
issai/foggen is a 0.8 billion parameter Qwen3-0.6B-based edge LLM designed for self-aware edge-cloud routing. It emits a calibrated verbalized confidence score before its answer, enabling efficient local processing or deferral to a stronger cloud model without an external router. Trained across seven domains including finance, coding, and medical, FogGen optimizes system accuracy by routing approximately 22% of queries to the cloud at a 0.5 confidence threshold.
What is FogGen?
FogGen is a 0.8 billion parameter self-aware edge LLM developed by issai, built upon the Qwen3-0.6B base model. Its core innovation is emitting a calibrated, verbalized confidence score together with its answer in a single forward pass. A deployment can compare that score against a threshold to either keep the local answer or route the query to a more powerful cloud model, trading off speed and accuracy in edge-cloud setups.
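The routing decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the card does not specify how the confidence score is serialized, so the `Confidence: <score>` prefix, the helper names `parse_confidence` and `route`, and the two generator callables are hypothetical. Treating a score equal to the threshold as "answer locally" is also an assumption.

```python
import re
from typing import Callable, Optional, Tuple

# ASSUMPTION: the output starts with a line such as "Confidence: 0.83".
# The real FogGen output format may differ; adjust the pattern accordingly.
CONF_PATTERN = re.compile(r"^\s*Confidence:\s*([01](?:\.\d+)?)\s*", re.IGNORECASE)


def parse_confidence(text: str) -> Tuple[Optional[float], str]:
    """Split raw model output into (confidence, answer).

    Returns None as the confidence when no score is found.
    """
    match = CONF_PATTERN.match(text)
    if match is None:
        return None, text.strip()
    conf = float(match.group(1))
    conf = min(max(conf, 0.0), 1.0)  # clamp to [0, 1]
    return conf, text[match.end():].strip()


def route(
    question: str,
    edge_generate: Callable[[str], str],   # hypothetical wrapper around the edge model
    cloud_generate: Callable[[str], str],  # hypothetical wrapper around the cloud model
    tau: float = 0.5,
) -> Tuple[str, str]:
    """Answer locally when confidence >= tau, otherwise defer to the cloud."""
    conf, answer = parse_confidence(edge_generate(question))
    if conf is not None and conf >= tau:
        return "edge", answer
    # Low or missing confidence: fall back to the stronger model.
    return "cloud", cloud_generate(question)
```

In this sketch a missing or unparsable score is treated as low confidence, so the query is deferred rather than answered locally; a deployment that must stay offline could choose the opposite default.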
Key Capabilities
- Self-Aware Routing: Integrates confidence estimation directly into the inference process, eliminating the need for an external router.
- Efficient Resource Utilization: Routes only necessary queries to the cloud, reducing latency and computational costs.
- Self-Evolving Training: Uses a 14-round sequential training loop (LoRA SFT) in which the model self-samples generations to derive confidence buckets and fine-tunes on (question, confidence, answer) triples.
- Domain Specialization: Trained across seven diverse domains: finance, science, coding, law, math, Kazakh culture, and medical.
- High System Accuracy: Achieves a mean system accuracy of 67.8% at a routing threshold (τ) of 0.5, routing only 21.9% of queries to the cloud, demonstrating a +4.6% lift over random routing.
- Superior Performance: Outperforms AutoMix with higher system accuracy, lower cloud routing percentage, and 9x lower per-query inference cost (1 forward pass vs. 9).
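To make the training and evaluation terms above concrete, here is a rough Python sketch of two calculations. Both are assumptions rather than the authors' procedure: `self_sampled_confidence` guesses that a confidence bucket is the rounded fraction of correct self-sampled generations, and `routing_metrics` guesses that "lift over random routing" compares against the expected accuracy of sending the same fraction of queries to the cloud at random. The function names, the bucket count, and the input layout are all hypothetical.

```python
from typing import Dict, Sequence


def self_sampled_confidence(samples_correct: Sequence[bool], n_buckets: int = 10) -> float:
    """ASSUMPTION: confidence label = share of correct self-samples,
    snapped to the nearest of n_buckets equal steps in [0, 1]."""
    rate = sum(samples_correct) / len(samples_correct)
    return round(rate * n_buckets) / n_buckets


def routing_metrics(
    confidences: Sequence[float],
    edge_correct: Sequence[bool],   # was the edge answer right for query i?
    cloud_correct: Sequence[bool],  # was the cloud answer right for query i?
    tau: float = 0.5,
) -> Dict[str, float]:
    """System accuracy and cloud fraction when queries below tau are deferred."""
    n = len(confidences)
    routed = [c < tau for c in confidences]
    cloud_fraction = sum(routed) / n
    system_acc = sum(
        cloud_correct[i] if routed[i] else edge_correct[i] for i in range(n)
    ) / n
    # ASSUMED baseline: route the same share of queries, chosen at random.
    edge_acc = sum(edge_correct) / n
    cloud_acc = sum(cloud_correct) / n
    random_acc = (1 - cloud_fraction) * edge_acc + cloud_fraction * cloud_acc
    return {
        "cloud_fraction": cloud_fraction,
        "system_accuracy": system_acc,
        "random_routing_accuracy": random_acc,
        "lift": system_acc - random_acc,
    }
```

Under this reading, a positive lift means the confidence score sends the cloud the queries the edge model would actually have gotten wrong, not just an arbitrary subset of the same size.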
Good For
- Edge AI Applications: Ideal for scenarios where local processing is preferred but complex queries require cloud assistance.
- Cost-Sensitive Deployments: Reduces cloud API calls by intelligently filtering queries.
- Real-time Decision Making: Provides fast local responses for high-confidence queries.
- Multi-Domain Question Answering: Excels in specialized domains like finance, coding, and medical, with demonstrated generalization to open-ended tasks like SQuAD and GSM8K.