mlfoundations-dev/qwen2-5_nemotron-sft_100000

TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Mar 25, 2025License:apache-2.0Architecture:Transformer Open Weights Cold

The mlfoundations-dev/qwen2-5_nemotron-sft_100000 model is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was trained on the mlfoundations-dev/nemotron-sft_100000 dataset, featuring a 32K context length. This model is designed for general language understanding and generation tasks, leveraging its instruction-tuned base for diverse applications.

Loading preview...

Overview

This model, mlfoundations-dev/qwen2-5_nemotron-sft_100000, is a fine-tuned variant of the Qwen/Qwen2.5-7B-Instruct base model. It features 7.6 billion parameters and supports a 32,768 token context length, making it suitable for processing moderately long inputs and generating comprehensive responses. The fine-tuning process utilized the mlfoundations-dev/nemotron-sft_100000 dataset, which suggests an optimization for specific instruction-following or conversational capabilities, though further details on the dataset's nature are not provided.

Training Details

The model was trained with a learning rate of 8e-05, a total batch size of 512 (achieved with a train_batch_size of 1 and gradient_accumulation_steps of 16 across 32 devices), and a cosine learning rate scheduler with a 0.1 warmup ratio over 3 epochs. The optimizer used was AdamW with default betas and epsilon.

Intended Uses

Given its instruction-tuned base and fine-tuning, this model is generally suitable for a range of natural language processing tasks, including but not limited to:

  • Instruction following
  • Text generation
  • Question answering
  • Summarization

Limitations

Specific limitations are not detailed in the provided information. Users should be aware that, like all large language models, it may exhibit biases present in its training data and can occasionally generate factually incorrect or nonsensical outputs.