introtollm/qwen2.5-3B-cb-1_0

Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 25, 2026 · License: other · Architecture: Transformer

The introtollm/qwen2.5-3B-cb-1_0 model is a 3.1-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-3B base model on the cb_1_0_50000 dataset. It is designed for tasks that benefit from this specialized fine-tuning, offering a compact yet capable option for a range of natural language processing applications.


Overview

This model, introtollm/qwen2.5-3B-cb-1_0, is a fine-tuned variant of the Qwen/Qwen2.5-3B base model. It features approximately 3.1 billion parameters and was trained with a context length of 32768 tokens. The fine-tuning process utilized the cb_1_0_50000 dataset, suggesting a specialization for tasks related to the characteristics of this dataset.
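
For reference, here is a minimal loading sketch using the Hugging Face transformers library. It assumes the checkpoint is published under the identifier above and follows the standard Qwen2.5 causal-LM layout; neither assumption is confirmed by this page.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Identifier as listed on this page; assumes the repository is publicly
# accessible and uses the standard AutoModel classes.
model_id = "introtollm/qwen2.5-3B-cb-1_0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quantization listed above
    device_map="auto",           # place weights on an available GPU if present
)
```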

Training Details

The model was trained with a learning rate of 2e-05 and an effective batch size of 8 (per-device train batch size of 1 with 8 gradient accumulation steps). It used the ADAMW_TORCH_FUSED optimizer with a cosine learning rate scheduler over 2109 training steps. This configuration indicates a focused fine-tuning run to adapt the base Qwen model to the target data.
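
These hyperparameters map naturally onto transformers' TrainingArguments. The sketch below is a hedged reconstruction of the reported run, not the authors' actual training script; output_dir and bf16 are illustrative assumptions not stated on this page.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-3B-cb-1_0",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # 1 x 8 = effective batch size of 8
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    max_steps=2109,
    bf16=True,                       # assumption, consistent with the BF16 listing
)
```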

Potential Use Cases

Given its fine-tuned nature, this model is likely suited to applications where the cb_1_0_50000 dataset's characteristics are relevant. Developers looking for a compact Qwen-based model with specialized knowledge from this dataset may find it useful for tasks such as the following (a brief generation sketch appears after the list):

  • Text generation or completion within the domain of the training data.
  • Specific natural language understanding tasks aligned with the fine-tuning.
  • Applications requiring a smaller footprint model with targeted capabilities.
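
As a starting point for such tasks, a simple text-generation call might look like the sketch below. The prompt is purely illustrative; effective prompting depends on the (undocumented) characteristics of the cb_1_0_50000 dataset.

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="introtollm/qwen2.5-3B-cb-1_0",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative prompt; adjust to the domain of the fine-tuning data.
result = generator(
    "Summarize the benefits of compact language models:",
    max_new_tokens=128,
)
print(result[0]["generated_text"])
```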