nvidia/Qwen3-Nemotron-32B-RLBFF

Parameters: 32B
Precision: FP8
Context Length: 32,768 tokens
Release Date: Oct 12, 2025
License: nvidia-open-model-license
Model Repository: Hugging Face

Model Overview

The nvidia/Qwen3-Nemotron-32B-RLBFF is a 32-billion-parameter large language model developed by NVIDIA, built on Qwen/Qwen3-32B. This research model is fine-tuned with Reinforcement Learning from Binary Flexible Feedback (RLBFF) to improve the quality of its responses, particularly in conversational settings. It is designed to generate coherent, high-quality replies to the final user turn of a multi-turn conversation.
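
The card does not prescribe an inference recipe, but a common way to use a chat-tuned checkpoint like this is the Hugging Face transformers chat-template workflow. The sketch below is a minimal example under that assumption; the model ID comes from this card, while the conversation content, dtype, and sampling settings are illustrative choices, not documented recommendations.

```python
# Minimal sketch: generate a reply to the final user turn of a multi-turn chat,
# assuming the standard transformers chat-template workflow applies to this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Qwen3-Nemotron-32B-RLBFF"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Multi-turn conversation; the model is tuned to answer the final user turn.
messages = [
    {"role": "user", "content": "What is reinforcement learning?"},
    {"role": "assistant", "content": "A paradigm where an agent learns from reward signals."},
    {"role": "user", "content": "How does RLBFF differ from standard RLHF?"},
]

# Build the prompt with the model's chat template and generate a response.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.95
)
# Decode only the newly generated tokens (everything after the prompt).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```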

Key Capabilities & Performance

  • Enhanced Response Quality: Fine-tuned with RLBFF to improve the helpfulness and overall quality of its generated responses.
  • Strong Benchmark Performance: Achieves 55.6% on Arena Hard V2, 70.33% on WildBench, and 9.50 on MT-Bench, outperforming the base Qwen3-32B model and performing comparably to models such as DeepSeek R1 and o3-mini at a fraction of the inference cost.
  • Context Length: Supports a maximum input of 128K tokens, though it was trained on conversations of up to 4K tokens (a serving sketch that caps the context window appears after this list).
  • Research Focus: Released to support the research paper on RLBFF (arXiv:2509.21319).
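
For deployment, one common option is vLLM offline inference. The following is a minimal sketch assuming the checkpoint loads through vLLM's standard Qwen3 support and that your vLLM version provides LLM.chat; the tensor_parallel_size, max_model_len, and sampling values are illustrative placeholders to be tuned for your hardware, not settings recommended by this card.

```python
# Minimal vLLM offline-inference sketch with an explicit context-length cap.
from vllm import LLM, SamplingParams

llm = LLM(
    model="nvidia/Qwen3-Nemotron-32B-RLBFF",
    tensor_parallel_size=2,   # adjust to the number of available GPUs
    max_model_len=32768,      # cap the context window to fit GPU memory
)

sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)

conversation = [
    {"role": "user", "content": "Summarize the idea behind RLBFF in two sentences."},
]

# LLM.chat applies the model's chat template before generation.
outputs = llm.chat(conversation, sampling)
print(outputs[0].outputs[0].text)
```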

Use Cases

  • Conversational AI: Ideal for generating responses in multi-turn dialogues.
  • Research & Development: Suitable for researchers exploring advanced fine-tuning techniques and model performance improvements.

This model is optimized for NVIDIA GPU-accelerated systems, leveraging NVIDIA GPUs and software frameworks such as CUDA for faster inference.