Jetlink/JetLLMLite-v1.1-36B-A3B

TEXT GENERATIONConcurrency Cost:3Model Size:35.1BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:May 22, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

Jetlink/JetLLMLite-v1.1-36B-A3B is an open-weight, post-trained Causal Language Model with a Vision Encoder, developed by Qwen, featuring 35 billion total parameters and 3 billion activated parameters. This MoE-based model offers strong coding, agentic, and multimodal capabilities, supporting a native context length of 262,144 tokens, extensible up to 1,010,000 tokens. It is optimized for advanced coding, reasoning, long-context understanding, and agentic AI workloads, compatible with popular inference frameworks like Hugging Face Transformers, vLLM, SGLang, and KTransformers.

Loading preview...

Overview

JetLLMLite-v1.1-36B-A3B is an open-weight, post-trained Causal Language Model with a Vision Encoder, developed by Qwen. It features a Mixture-of-Experts (MoE) architecture with 35 billion total parameters and 3 billion activated parameters. The model is designed for advanced AI workloads, offering a native context length of 262,144 tokens, which can be extended up to 1,010,000 tokens.

Key Capabilities

  • MoE-based Architecture: Utilizes 256 experts with 8 routed and 1 shared expert activated, contributing to its efficiency.
  • Vision-Language Capability: Supports multimodal inputs, enabling applications like multimodal question answering.
  • Strong Coding & Agentic Performance: Optimized for complex coding tasks, repository-level reasoning, and agentic workflows.
  • Extended Context Length: Natively handles very long contexts and can be scaled for ultra-long context scenarios.
  • Broad Compatibility: Integrates seamlessly with Hugging Face Transformers, vLLM, SGLang, and KTransformers.

Intended Use Cases

This model is particularly well-suited for:

  • Advanced chat and coding assistants.
  • Repository-level reasoning and agentic workflows.
  • Multimodal question answering and long-context document understanding.
  • RAG and tool-using systems, as well as enterprise AI applications.

Hardware Considerations

Due to its 35B total parameters, hardware requirements vary. While heavily quantized local inference might be possible with 24 GB VRAM, 48-80 GB VRAM is more realistic for smoother local development. Production serving, especially for long-context or multimodal workloads, typically requires multi-GPU or high-memory datacenter environments.