Ornstein3.6-35B-A3B: Multimodal Reasoning MoE

Ornstein3.6-35B-A3B is a multimodal fine-tune of the Qwen 3.6 35B-A3B Mixture-of-Experts (MoE) model, developed by DJLougen. It belongs to the "Ornstein" series of reasoning- and agent-oriented fine-tunes built on a custom data curation pipeline. The model has 34.66 billion total parameters, of which approximately 3 billion are active per token, and supports a context length of 262,144 tokens.
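The figures above can be put in proportion with a little arithmetic. The following is a minimal sketch that uses only the numbers quoted in this card; the variable names are illustrative and the "approximately 3 billion" figure is treated as exactly 3e9 for the calculation.

```python
# Figures taken from the model card; ACTIVE_PARAMS is an approximation.
TOTAL_PARAMS = 34.66e9     # total parameters
ACTIVE_PARAMS = 3e9        # roughly 3 billion active per token
CONTEXT_TOKENS = 262_144   # maximum context length in tokens

# Share of the parameters that participate in each token's forward pass.
param_share = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active parameter share: {param_share:.1%}")  # -> 8.7%

# The context length is an exact power of two: 2**18 = 256 * 1024.
print(CONTEXT_TOKENS == 2**18)  # -> True
```

In other words, each token touches under a tenth of the total weights, which is where the "A3B" in the model name comes from.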

Key Capabilities

Multimodal Conditional Generation: Includes the Qwen 3.6 base visual tower and image/video processor files, so the model can handle image-text-to-text tasks.
Mixture-of-Experts Architecture: Uses the Qwen 3.6 MoE design, which interleaves linear attention (Gated Delta Net) layers with full attention layers and routes each token to 8 of 256 experts, so only a fraction of the parameters is used per token.
Reasoning and Agent-Oriented Tasks: Fine-tuned specifically for reasoning and agentic workloads.
High Context Length: Supports a context window of 262,144 tokens, useful for long inputs and extended multi-turn interactions.
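To illustrate what "256 experts with 8 active per token" means in practice, here is a generic top-k routing sketch. It is not the model's actual router: the function name `route_token`, the random logits, and the choice to apply softmax over only the selected experts are all assumptions made for illustration, and the real implementation may normalize differently.

```python
import math
import random

NUM_EXPERTS = 256  # from the model card
TOP_K = 8          # experts active per token, from the model card

def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected logits only (an assumption;
    real routers may normalize over all experts first).
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Hypothetical router scores for a single token.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

for expert_id, weight in selected:
    print(f"expert {expert_id:3d}  weight {weight:.3f}")
```

Only the 8 selected experts run for this token; the other 248 are skipped, which is why the per-token compute tracks the active parameter count rather than the total.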

Good For

Developers building multimodal applications that process both text and image inputs.
Use cases that demand strong reasoning and agentic behavior.
Applications that benefit from a large context window, such as intricate problem-solving or extended dialogues.
Users seeking an efficient MoE model for conditional generation tasks.
