Overview

JetLLMLite-v1.1-36B-A3B is an open-weight, post-trained causal language model with a vision encoder, developed by Qwen. It uses a Mixture-of-Experts (MoE) architecture with 35 billion total parameters, of which 3 billion are activated per token. The model has a native context length of 262,144 tokens, which can be extended up to 1,010,000 tokens.

Key Capabilities

MoE-based Architecture: Uses 256 experts, with 8 routed experts and 1 shared expert activated per token, so only about 3 billion of the 35 billion parameters take part in each forward pass.
Vision-Language Capability: Supports multimodal inputs, enabling applications like multimodal question answering.
Strong Coding & Agentic Performance: Optimized for complex coding tasks, repository-level reasoning, and agentic workflows.
Extended Context Length: Natively handles contexts of 262,144 tokens and can be extended up to 1,010,000 tokens.
Broad Compatibility: Works with Hugging Face Transformers, vLLM, SGLang, and KTransformers.
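
To make the expert counts above concrete, the following is a minimal sketch of top-k expert routing for a single token. It is illustrative only: the function name `route_token`, the random router scores, and the choice to normalize weights with a softmax over the selected experts are assumptions, not details confirmed for this model. Only the figures 256, 8, and 1 come from the description above.

```python
import math
import random

NUM_EXPERTS = 256  # expert count from the model description
TOP_K = 8          # routed experts activated per token
NUM_SHARED = 1     # shared expert that is always active


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    # Indices of the top_k largest router scores.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:top_k]
    # Softmax over the selected scores only (assumed normalization scheme).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token; a real router computes these
# from the token's hidden state.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)
active_experts = len(selected) + NUM_SHARED
print("routed expert indices:", [i for i, _ in selected])
print("expert blocks active for this token:", active_experts)
```

Because only the selected experts run for a given token, per-token compute scales with the activated parameters rather than the total parameter count.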

Intended Use Cases

This model is particularly well-suited for:

Advanced chat and coding assistants.
Repository-level reasoning and agentic workflows.
Multimodal question answering and long-context document understanding.
RAG and tool-using systems, as well as enterprise AI applications.

Hardware Considerations

Hardware requirements depend on quantization and workload. Although only 3B parameters are active per token, all 35B parameters generally need to be held in memory. Heavily quantized local inference might be possible with 24 GB VRAM, but 48-80 GB VRAM is more realistic for smoother local development. Production serving, especially for long-context or multimodal workloads, typically requires multi-GPU or high-memory datacenter environments.
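
The VRAM figures above can be sanity-checked with a rough back-of-the-envelope estimate of weight memory at different precisions. This is a sketch under stated assumptions: the bytes-per-parameter values are nominal, the helper `weight_memory_gb` is hypothetical, and the estimate covers weights only, ignoring the KV cache, activations, the vision encoder's runtime buffers, and quantization overhead.

```python
TOTAL_PARAMS = 35e9  # total parameter count stated in the overview

# Nominal storage cost per parameter (assumed; real quantized formats
# add some overhead for scales and zero points).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
    print(f"{precision}: ~{gb:.1f} GB for weights alone")
```

Under these assumptions, 4-bit weights come to roughly 17.5 GB, which is why a 24 GB card is a tight fit, while 16-bit weights need about 70 GB before any context is allocated.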
