Name: geodesic-research/nemotron_30b_warm_start_sft_200k_think API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: geodesic-research

Model Overview

geodesic-research/nemotron_30b_warm_start_sft_200k_think is a 30-billion-parameter language model developed by Geodesic Research. It is a supervised fine-tuned (SFT) version of the nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 base model, optimized for reasoning-oriented tasks. Unlike standard instruction-tuned models, it uses a 'think' tokenizer that preserves <think>...</think> reasoning tags when they are present in the input.
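Because the reasoning tags are kept in the text, downstream code usually has to separate the reasoning from the final answer. The following is a minimal sketch of one way to do that with the Python standard library. The helper name `split_reasoning` and the sample string are hypothetical, and the sketch assumes the tags appear literally as `<think>` and `</think>` and are not nested.

```python
import re

# Matches one <think>...</think> span, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) for text that may contain think tags.

    Hypothetical helper: assumes literal, non-nested <think> tags.
    """
    # Collect the contents of every think span, one per line.
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    # Whatever remains after removing the spans is treated as the answer.
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


sample = "<think>2 + 2 is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Text without any tags comes back with an empty reasoning string and the answer unchanged, so the same helper can be applied to outputs whether or not they contain a reasoning trace.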

Key Characteristics

Warm-Start Baseline: Serves as a foundational checkpoint for Geodesic Research's SFM and inoculation campaigns, enabling comparable studies across different model sizes and SFT types.
Reasoning-Focused Tokenizer: Uses geodesic-research/nemotron-think-tokenizer, which preserves explicit reasoning tags and so supports models that write out their reasoning.
Training Data: Fine-tuned for a single epoch on the geodesic-research/sft-warm-start-200k dataset, which comprises 200,000 chat-format conversations (509M tokens).
Context Length: Supports a 32,768-token context window, which accommodates long inputs and extended responses.
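The figures above imply some simple arithmetic, sketched below. The average conversation length is only approximate because the 509M token count is rounded. The `max_new_tokens` helper is hypothetical and assumes that prompt and generated tokens share the same 32,768-token window, which is typical for decoder-only models but is not stated in the model card.

```python
# Figures taken from the model description.
DATASET_TOKENS = 509_000_000      # "509M tokens" (rounded)
DATASET_CONVERSATIONS = 200_000   # chat-format conversations
CONTEXT_WINDOW = 32_768           # tokens

# Approximate average length of one training conversation, in tokens.
avg_tokens_per_conversation = DATASET_TOKENS / DATASET_CONVERSATIONS


def max_new_tokens(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for generation after the prompt.

    Assumption: prompt and completion share a single context window.
    """
    return max(context_window - prompt_tokens, 0)


print(avg_tokens_per_conversation)
print(max_new_tokens(30_000))
```

Under these assumptions, a typical training conversation occupies well under a tenth of the context window, and a 30,000-token prompt leaves 2,768 tokens for the response.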

Use Cases and Limitations

This model is suited to research on reasoning capabilities and to developing models that produce explicit reasoning traces. It is a candidate for applications where understanding and generating structured reasoning is central. However, it was trained for a single epoch on 200k examples, so its coverage is narrower than that of the upstream NVIDIA instruct release. The Multi-Token-Prediction (MTP) head weights are randomly initialized as a result of the SFT process, though this does not affect standard inference.
