mlfoundations-dev/openthoughts3_100k_qwen25_1b_bsz1024_lr2e5_epochs5
Hosted on Hugging Face

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Jun 24, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

The mlfoundations-dev/openthoughts3_100k_qwen25_1b_bsz1024_lr2e5_epochs5 model is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct on the mlfoundations-dev/openthoughts3_100k dataset. The fine-tuning specializes the model toward the tasks and data distribution represented in that dataset, so it should perform best on workloads aligned with its training data.


Model Overview

This model, mlfoundations-dev/openthoughts3_100k_qwen25_1b_bsz1024_lr2e5_epochs5, is a fine-tuned variant of the Qwen/Qwen2.5-1.5B-Instruct base model. It features approximately 1.5 billion parameters and has been specifically adapted through further training on the mlfoundations-dev/openthoughts3_100k dataset.

Training Details

The fine-tuning process used a learning rate of 2e-05 over 5 epochs, with a cosine learning rate scheduler and a warmup ratio of 0.1. Training was distributed across 32 devices with a total batch size of 1024, achieved via gradient accumulation over 8 steps (implying a per-device micro-batch size of 4, since 32 × 4 × 8 = 1024), using the AdamW optimizer (adamw_torch). This targeted training on a single dataset suggests an optimization for the tasks and data distributions present within mlfoundations-dev/openthoughts3_100k.
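The schedule described above can be sketched in plain Python. The hyperparameters (peak learning rate, epochs, warmup ratio, batch composition) come from the card; the dataset size of 100k examples is an assumption inferred from the dataset name, so the step counts below are illustrative:

```python
import math

# Hyperparameters reported on the model card
LR_MAX = 2e-5            # peak learning rate
EPOCHS = 5
DATASET_SIZE = 100_000   # assumption: inferred from the "100k" in the dataset name
DEVICES = 32
GRAD_ACCUM = 8
TOTAL_BATCH = 1024

# Per-device micro-batch implied by the reported totals:
# 1024 = 32 devices x micro_batch x 8 accumulation steps  ->  micro_batch = 4
micro_batch = TOTAL_BATCH // (DEVICES * GRAD_ACCUM)

# Optimizer steps per epoch and overall (ceiling division over the dataset)
steps_per_epoch = math.ceil(DATASET_SIZE / TOTAL_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(0.1 * total_steps)  # warmup ratio of 0.1

def lr_at(step: int) -> float:
    """Cosine decay with linear warmup, matching the schedule the card describes."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate
        return LR_MAX * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * LR_MAX * (1.0 + math.cos(math.pi * progress))

print(micro_batch)          # 4
print(total_steps)          # 490 (under the 100k-example assumption)
print(lr_at(warmup_steps))  # peak: 2e-05
```

Under these assumptions the run is short (roughly 490 optimizer steps), which is typical for SFT on a ~100k-example dataset at a large batch size.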

Key Characteristics

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Parameter Count: 1.5 billion
  • Fine-tuning Dataset: mlfoundations-dev/openthoughts3_100k
  • Training Hyperparameters: learning rate 2e-05, 5 epochs, cosine scheduler with 0.1 warmup ratio, total batch size 1024, AdamW optimizer.

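Putting the reported hyperparameters together, a training configuration in the style of LLaMA-Factory might look like the fragment below. This is a hypothetical reconstruction for illustration, not the authors' actual config file; the key names follow common LLaMA-Factory conventions and the values are taken from the card:

```yaml
# Hypothetical SFT config (illustrative; not the authors' actual file)
model_name_or_path: Qwen/Qwen2.5-1.5B-Instruct
stage: sft
do_train: true
dataset: mlfoundations-dev/openthoughts3_100k
finetuning_type: full
learning_rate: 2.0e-5
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
per_device_train_batch_size: 4   # 32 devices x 4 x 8 accumulation = 1024 total
gradient_accumulation_steps: 8
optim: adamw_torch
bf16: true
```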
Potential Use Cases

This model is likely best suited for applications where its fine-tuning on mlfoundations-dev/openthoughts3_100k provides a distinct advantage. Given the OpenThoughts lineage of the dataset, it is plausibly strongest on reasoning-style generation tasks resembling its training data, though no benchmark results are reported on the card.