mlfoundations-dev/qwen2-5_openthoughts_2-5k_rewrite_r1_distill_llama70b_16k

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Context Length: 32k · Published: Feb 24, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

mlfoundations-dev/qwen2-5_openthoughts_2-5k_rewrite_r1_distill_llama70b_16k is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was adapted on the mlfoundations-dev/openthoughts_2-5k_rewrite_r1_distill_llama70b_16k dataset, whose name indicates roughly 2.5k OpenThoughts examples rewritten with an R1-distilled Llama 70B model at sequence lengths up to 16k tokens. The model is intended for tasks that benefit from both its Qwen2.5 base and this specific fine-tuning data.


Overview

This model, mlfoundations-dev/qwen2-5_openthoughts_2-5k_rewrite_r1_distill_llama70b_16k, is a 7.6-billion-parameter language model. It is a fine-tuned variant of the Qwen/Qwen2.5-7B-Instruct base model, and therefore inherits the Qwen2.5 architecture and its instruction-following behavior.

Key Characteristics

  • Base Model: Qwen/Qwen2.5-7B-Instruct.
  • Fine-tuning Dataset: Specifically trained on the mlfoundations-dev/openthoughts_2-5k_rewrite_r1_distill_llama70b_16k dataset.
  • Parameter Count: 7.6 billion parameters.
  • Context Length: The model supports up to 131,072 tokens (the Qwen2.5 maximum), allowing very long inputs and outputs; note that the hosted configuration above advertises a 32k window. A minimal loading and generation sketch follows this list.
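
As a sketch of how such a checkpoint is typically used, the snippet below loads the model with Hugging Face transformers and generates a reply via the Qwen2.5 chat template. The model ID comes from this card; the dtype and generation settings are illustrative assumptions, not values published by the authors.

```python
# Minimal inference sketch (assumes the checkpoint is available on the
# Hugging Face Hub under the ID shown on this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/qwen2-5_openthoughts_2-5k_rewrite_r1_distill_llama70b_16k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; the FP8 quant above is a hosting detail
    device_map="auto",
)

# Qwen2.5-Instruct derivatives are prompted through a chat template.
messages = [{"role": "user", "content": "Explain gradient accumulation in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)  # illustrative limit
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```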

Training Details

The model was fine-tuned for 3 epochs with a learning rate of 1e-05 and a per-device batch size of 1; with 3 gradient accumulation steps across 32 devices, the effective batch size was 1 × 3 × 32 = 96. The optimizer was AdamW (adamw_torch) with a cosine learning-rate schedule and a warmup ratio of 0.1. This fine-tuning adapts the base Qwen2.5 model to the characteristics and patterns of the openthoughts_2-5k_rewrite_r1_distill_llama70b_16k dataset.
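
The hyperparameters above map directly onto Hugging Face TrainingArguments. The sketch below reproduces them for illustration only; the dataset-loading call, output path, and precision flag are assumptions about the setup, not the authors' published training script.

```python
# Hypothetical reconstruction of the fine-tuning configuration from the
# hyperparameters listed above; not the authors' actual training script.
from datasets import load_dataset
from transformers import TrainingArguments

dataset = load_dataset(
    "mlfoundations-dev/openthoughts_2-5k_rewrite_r1_distill_llama70b_16k"
)

args = TrainingArguments(
    output_dir="qwen2-5_openthoughts_ft",  # placeholder path
    learning_rate=1e-05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=3,         # x 32 devices -> effective batch size 96
    num_train_epochs=3,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                             # assumption: mixed-precision training
)
```

Under a multi-device launcher such as torchrun or accelerate, the effective batch size is per_device_train_batch_size × gradient_accumulation_steps × world size, which recovers the 96 noted above.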