nvidia/DLER-Llama-Nemotron-8B-Merge-Research
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Aug 11, 2025 · Architecture: Transformer

The nvidia/DLER-Llama-Nemotron-8B-Merge-Research model is an 8 billion parameter, 32k context length reasoning model developed by NVIDIA, designed for efficiency on challenging tasks such as mathematics, programming, and scientific problem-solving. Built with the DLER algorithm and a weight-merging technique, it produces responses nearly 50% shorter across mathematical benchmarks than Llama-3.1-Nemotron-8B while maintaining accuracy. It is intended for research and development purposes.


Model Overview

The nvidia/DLER-Llama-Nemotron-8B-Merge-Research is an 8 billion parameter open-weight reasoning model with a 32,768 token context length, developed by NVIDIA. It is engineered for high efficiency on complex analytical tasks, including mathematics, programming, and scientific problem-solving. The model was first trained with the DLER algorithm on the agentica-org/DeepScaleR-Preview-Dataset, then enhanced with a weight-merging technique that blends the trained weights back into the base model, mitigating potential accuracy degradation.

Key Differentiators & Performance

This model's primary innovation is its efficiency gain. Compared to Llama-3.1-Nemotron-8B, DLER-Llama-Nemotron-8B-Merge produces average responses nearly 50% shorter across diverse mathematical benchmarks without compromising accuracy. In evaluations, it maintained comparable accuracy on MATH (95.2% vs. 95.4%), AIME (66.7% vs. 66.4%), AMC (89.23% vs. 88.25%), Minerva (53.19% vs. 52.38%), and Olympiad (65.39% vs. 64.33%), while reducing the average response length from 5996 to 3237 tokens.
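The "nearly 50%" figure can be checked directly from the reported average response lengths. A minimal sketch (the token counts are the ones quoted above; the variable names are illustrative):

```python
# Average response lengths (tokens) reported on this card.
baseline_len = 5996  # Llama-3.1-Nemotron-8B
merged_len = 3237    # DLER-Llama-Nemotron-8B-Merge

# Relative reduction in response length.
reduction = (baseline_len - merged_len) / baseline_len
print(f"Response-length reduction: {reduction:.1%}")
# → Response-length reduction: 46.0%
```

The exact reduction on this aggregate is about 46%, consistent with the "nearly 50%" summary.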

Intended Use

This model is released specifically for research and development, targeting applications where efficient, accurate reasoning is critical. Developers can leverage its optimized performance for tasks requiring multi-step logical reasoning with reduced computational overhead.