verque/Nemotron-Orchestrator-8B-mlx-fp16
Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · Published: Feb 2, 2026 · Architecture: Transformer

Nemotron-Orchestrator-8B-mlx-fp16 is an 8-billion-parameter causal language model, converted by verque from NVIDIA's Nemotron-Orchestrator-8B to the MLX format for efficient inference on Apple Silicon. It retains the base model's context length of 32,768 tokens, making it suitable for tasks that require extensive contextual understanding and generation.
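As a minimal usage sketch, the model can be loaded with the `mlx-lm` package (`pip install mlx-lm`) on an Apple Silicon Mac. The repository ID below is taken from this card's title; the prompt text and generation parameters are illustrative, not part of the model card.

```python
# Sketch: load an MLX-converted model and generate text with mlx-lm.
# Requires Apple Silicon and `pip install mlx-lm`; the first call
# downloads the weights from the hosting hub.
from mlx_lm import load, generate

# Repo ID taken from this model card's title.
model, tokenizer = load("verque/Nemotron-Orchestrator-8B-mlx-fp16")

# Illustrative prompt; adjust max_tokens to taste (context window is 32k).
prompt = "Summarize the role of an orchestrator model in an agent pipeline."
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```

Note that `mlx-lm` also ships a CLI (`mlx_lm.generate --model <repo> --prompt "..."`) if you prefer not to write Python.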
