issai/foggen

Text generation | Concurrency cost: 1 | Model size: 0.8B | Quantization: BF16 | Context length: 32k | Tool calling: Supported | Published: May 20, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights

issai/foggen is a 0.8-billion-parameter edge LLM based on Qwen3-0.6B and designed for self-aware edge-cloud routing. It emits a calibrated, verbalized confidence score before its answer, so it can either answer locally or defer to a stronger cloud model without an external router. Trained across seven domains, including finance, coding, and medicine, FogGen optimizes overall system accuracy while routing only about 22% of queries to the cloud at a confidence threshold of 0.5.
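To make the two headline quantities concrete, system accuracy and cloud-routing rate at a threshold τ can be computed from per-query outcomes as sketched below. This is a minimal sketch, assuming a query goes to the cloud when its confidence is strictly below τ and that the correctness of both the edge and the cloud answer is known for every query. The `QueryResult` record and both function names are hypothetical, and the random-routing baseline (sending the same fraction of queries to the cloud uniformly at random) is an assumed definition, not necessarily the one used to evaluate FogGen.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class QueryResult:
    """Per-query outcome (hypothetical record format, for illustration only)."""
    confidence: float    # verbalized confidence emitted by the edge model, in [0, 1]
    edge_correct: bool   # whether the edge model's own answer was correct
    cloud_correct: bool  # whether the cloud model's answer was correct


def route_stats(results: Sequence[QueryResult], tau: float = 0.5) -> Tuple[float, float]:
    """Return (system_accuracy, cloud_fraction) when queries with confidence < tau go to the cloud."""
    if not results:
        raise ValueError("results must be non-empty")
    correct = 0
    routed = 0
    for r in results:
        if r.confidence < tau:  # low confidence: defer to the cloud model
            routed += 1
            correct += int(r.cloud_correct)
        else:                   # high confidence: keep the local answer
            correct += int(r.edge_correct)
    n = len(results)
    return correct / n, routed / n


def random_routing_accuracy(results: Sequence[QueryResult], cloud_fraction: float) -> float:
    """Expected accuracy if the same fraction of queries were routed uniformly at random."""
    if not results:
        raise ValueError("results must be non-empty")
    n = len(results)
    edge_acc = sum(r.edge_correct for r in results) / n
    cloud_acc = sum(r.cloud_correct for r in results) / n
    return (1 - cloud_fraction) * edge_acc + cloud_fraction * cloud_acc
```

The gap between `route_stats(...)[0]` and `random_routing_accuracy(...)` at the same cloud fraction is one way to express a "lift over random routing": it is positive only when the confidence score actually separates queries the edge model gets right from those it gets wrong.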


What is FogGen?

FogGen is a 0.8-billion-parameter self-aware edge LLM developed by issai and built on the Qwen3-0.6B base model. Its core innovation is emitting a calibrated confidence score before its answer in a single forward pass. Based on that score, the model either answers locally or routes the query to a more capable cloud model, balancing speed and accuracy in edge-cloud deployments.
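The decision described above can be sketched as a thin wrapper around the edge model's output: parse the confidence, compare it with τ, and call the cloud model only when needed. This is a minimal sketch that assumes a hypothetical output layout such as `Confidence: 0.82` followed by `Answer: Paris`; FogGen's real prompt and output template may differ, and `edge_generate` / `cloud_generate` are hypothetical callables standing in for whatever local runtime and cloud API you use.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical output layout, e.g. "Confidence: 0.82\nAnswer: Paris".
# Adjust both patterns to match the model's actual template.
_CONF_RE = re.compile(r"confidence\s*[:=]\s*([01](?:\.\d+)?)", re.IGNORECASE)
_ANSWER_RE = re.compile(r"answer\s*[:=]\s*(.*)", re.IGNORECASE | re.DOTALL)


def parse_generation(text: str) -> Tuple[Optional[float], str]:
    """Split one edge-model generation into (confidence, answer)."""
    conf_match = _CONF_RE.search(text)
    confidence = float(conf_match.group(1)) if conf_match else None
    ans_match = _ANSWER_RE.search(text)
    answer = ans_match.group(1).strip() if ans_match else text.strip()
    return confidence, answer


def answer_query(
    question: str,
    edge_generate: Callable[[str], str],
    cloud_generate: Callable[[str], str],
    tau: float = 0.5,
) -> Tuple[str, str]:
    """Answer locally if the edge model is confident enough, otherwise defer to the cloud.

    Returns (source, answer), where source is "edge" or "cloud".
    """
    confidence, answer = parse_generation(edge_generate(question))
    # A missing or unparsable score is treated as low confidence (a conservative choice).
    if confidence is None or confidence < tau:
        return "cloud", cloud_generate(question)
    return "edge", answer
```

Because the confidence comes out of the same generation as the answer, the edge model runs once per query; the cloud model is invoked only for the low-confidence fraction. Treating an unparsable score as low confidence trades a few extra cloud calls for never silently trusting a malformed local output.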

Key Capabilities

  • Self-Aware Routing: Integrates confidence estimation directly into the inference process, eliminating the need for an external router.
  • Efficient Resource Utilization: Sends only low-confidence queries to the cloud, reducing latency and compute costs.
  • Self-Evolving Training: Uses a 14-round sequential LoRA SFT loop in which the model samples its own generations, derives confidence buckets from them, and is fine-tuned on (question, confidence, answer) triples.
  • Domain Specialization: Trained across seven diverse domains: finance, science, coding, law, math, Kazakh culture, and medicine.
  • High System Accuracy: Achieves a mean system accuracy of 67.8% at a routing threshold (τ) of 0.5, routing only 21.9% of queries to the cloud, demonstrating a +4.6% lift over random routing.
  • Superior Performance: Outperforms AutoMix, achieving higher system accuracy with a lower cloud-routing rate and 9x lower per-query inference cost (1 forward pass vs. 9).
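The self-evolving training loop can be sketched as follows: in each round the current model answers every training question several times, the fraction of correct samples is snapped to a confidence bucket, and the resulting (question, confidence, answer) triples are used for one fine-tuning stage. This is a sketch under stated assumptions, not FogGen's actual recipe: the number of samples `k`, the evenly spaced buckets, pairing each bucket with the reference answer, and the helpers `sample_fn`, `is_correct`, and `finetune` (which stands in for a LoRA SFT stage) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class Triple:
    """One supervised example: the target output states the confidence, then the answer."""
    question: str
    confidence: float
    answer: str


def to_bucket(p: float, num_buckets: int = 5) -> float:
    """Snap an empirical accuracy p in [0, 1] to the nearest of num_buckets evenly spaced levels."""
    if num_buckets < 2:
        raise ValueError("num_buckets must be at least 2")
    step = 1.0 / (num_buckets - 1)
    return round(round(p / step) * step, 4)


def build_round_data(
    dataset: Sequence[Tuple[str, str]],
    sample: Callable[[str], str],
    is_correct: Callable[[str, str], bool],
    k: int = 8,
    num_buckets: int = 5,
) -> List[Triple]:
    """Self-sample k answers per question and label each question with a confidence bucket."""
    triples = []
    for question, reference in dataset:
        samples = [sample(question) for _ in range(k)]
        accuracy = sum(is_correct(s, reference) for s in samples) / k
        # Assumption: the triple pairs the bucketed confidence with the reference answer.
        triples.append(Triple(question, to_bucket(accuracy, num_buckets), reference))
    return triples


def self_evolve(
    model: Any,
    dataset: Sequence[Tuple[str, str]],
    sample_fn: Callable[[Any, str], str],
    is_correct: Callable[[str, str], bool],
    finetune: Callable[[Any, List[Triple]], Any],
    rounds: int = 14,
) -> Any:
    """Run sequential rounds: relabel the data with the current model, then fine-tune on it."""
    for _ in range(rounds):
        current = model
        data = build_round_data(dataset, lambda q: sample_fn(current, q), is_correct)
        model = finetune(model, data)  # placeholder for one LoRA SFT stage (not implemented here)
    return model
```

Relabeling with the current model each round matters because fine-tuning changes which questions the model answers correctly; confidence targets computed once from the base model would drift out of calibration as training proceeds.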

Good For

  • Edge AI Applications: Ideal for scenarios where local processing is preferred but complex queries require cloud assistance.
  • Cost-Sensitive Deployments: Reduces cloud API calls by answering high-confidence queries locally.
  • Real-time Decision Making: Provides fast local responses for high-confidence queries.
  • Multi-Domain Question Answering: Excels in specialized domains such as finance, coding, and medicine, with demonstrated generalization to open-ended tasks such as SQuAD and GSM8K.