verque/Nemotron-Orchestrator-8B-mlx-fp16

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Feb 2, 2026 · Architecture: Transformer

Nemotron-Orchestrator-8B-mlx-fp16 is an 8 billion parameter causal language model, converted by verque from NVIDIA's Nemotron-Orchestrator-8B to the MLX format. The conversion targets efficient inference on Apple Silicon via the MLX framework, and it retains the original 32,768-token context length, making it suitable for tasks that require extensive contextual understanding and generation.
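As a rough sizing check, the weight memory implied by the card's figures can be estimated directly: 8 billion parameters at 2 bytes each (fp16, per the model name). The exact parameter count below is the nominal "8B" from the card, not a precise number.

```python
# Rough weight-memory estimate for an 8B-parameter fp16 model.
NUM_PARAMS = 8_000_000_000  # nominal "8 billion" from the model card
BYTES_PER_PARAM = 2         # fp16, per the "-fp16" suffix in the model name

weights_gib = NUM_PARAMS * BYTES_PER_PARAM / 1024**3
print(f"approx. weight memory: {weights_gib:.1f} GiB")  # ~14.9 GiB
```

This is weights only; activation and KV-cache memory come on top, so plan headroom accordingly on a Mac with unified memory.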


Model Overview

The verque/Nemotron-Orchestrator-8B-mlx-fp16 is an 8 billion parameter language model, a conversion of NVIDIA's Nemotron-Orchestrator-8B. This specific version has been adapted to the MLX format using mlx-lm version 0.29.1, making it optimized for efficient execution on Apple Silicon hardware.
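A minimal loading sketch using the mlx-lm Python API (the `load`/`generate` functions it exposes), with the repo id taken from this card. This assumes `mlx-lm` is installed (`pip install mlx-lm`) on an Apple Silicon Mac; the import is guarded so the snippet degrades gracefully elsewhere.

```python
# Sketch: loading and prompting the model with mlx-lm on Apple Silicon.
try:
    from mlx_lm import load, generate
    HAVE_MLX = True
except ImportError:
    HAVE_MLX = False  # mlx-lm not installed / not on Apple Silicon


def run(prompt: str, max_tokens: int = 256) -> str:
    # Downloads (or reuses a cached copy of) the converted weights,
    # then generates a completion for the prompt.
    model, tokenizer = load("verque/Nemotron-Orchestrator-8B-mlx-fp16")
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)


if HAVE_MLX:
    print(run("Summarize the MLX framework in one sentence."))
```

The first call downloads the fp16 weights, so expect a sizeable one-time fetch before generation starts.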

Key Capabilities

  • MLX Optimization: Specifically converted for performance on Apple Silicon, enabling local inference with MLX.
  • Large Context Window: Features a 32,768-token context length, allowing it to process and generate long documents in a single pass.
  • Causal Language Modeling: Inherits the core capabilities of the Nemotron-Orchestrator-8B architecture for text generation and understanding.
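The 32,768-token window has a memory cost of its own: the KV cache grows linearly with context. The sketch below estimates it; the layer and head dimensions are hypothetical placeholders for an 8B-class transformer (they are not published on this card), and only the context length and fp16 element size come from the card.

```python
# Back-of-envelope KV-cache size at the full 32,768-token context.
CTX_LEN = 32_768     # from the model card
BYTES_PER_ELEM = 2   # fp16
N_LAYERS = 32        # hypothetical, typical for an 8B-class model
N_KV_HEADS = 8       # hypothetical (grouped-query attention)
HEAD_DIM = 128       # hypothetical

# Factor of 2 covers both keys and values.
kv_bytes = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * CTX_LEN * BYTES_PER_ELEM
print(f"KV cache at full context: {kv_bytes / 1024**3:.1f} GiB")  # 4.0 GiB
```

Under these assumed dimensions, a full 32k-token context adds about 4 GiB on top of the weights, which is worth budgeting for on smaller unified-memory configurations.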

Good For

  • Local Development on Apple Silicon: Ideal for developers and researchers looking to run powerful language models directly on their Apple devices.
  • Applications Requiring Long Context: Suitable for tasks like document summarization, extended dialogue, or code analysis where a large context window is beneficial.
  • Experimentation with MLX: Provides a ready-to-use model for exploring the capabilities and performance of the MLX framework.
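For the long-context use cases above, inputs still need to fit the 32,768-token window. A simple approach is to split oversized documents into window-sized chunks; the ~4 characters-per-token ratio below is a generic heuristic, not a property of this model's tokenizer.

```python
# Split a long document into chunks that fit the 32,768-token window,
# using a rough chars-per-token heuristic (an assumption, not measured
# against this model's tokenizer).
CTX_LEN = 32_768         # from the model card
CHARS_PER_TOKEN = 4      # heuristic assumption
RESERVED_TOKENS = 1_024  # leave room for the prompt template and output


def chunk_document(text: str) -> list[str]:
    max_chars = (CTX_LEN - RESERVED_TOKENS) * CHARS_PER_TOKEN
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


doc = "x" * 300_000
chunks = chunk_document(doc)
print(len(chunks), len(chunks[0]))  # 3 chunks, 126976 chars each at most
```

For precise packing, tokenize with the model's own tokenizer instead of a character heuristic; the heuristic is only a safe first cut.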