introtollm/qwen2.5-3B-cb-1_1

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 25, 2026 · License: other · Architecture: Transformer

introtollm/qwen2.5-3B-cb-1_1 is a fine-tuned version of Qwen2.5-3B with 3.1 billion parameters and a 32768-token context length. It was adapted by fine-tuning on the cb_1_1_50000 dataset, so it retains the general capabilities of the Qwen2.5 base model while specializing toward the tasks represented in that dataset.


Model Overview

introtollm/qwen2.5-3B-cb-1_1 is a fine-tuned language model based on Qwen/Qwen2.5-3B. It has approximately 3.1 billion parameters and supports a 32768-token context window, making it suitable for processing long input sequences.
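
For orientation, the snippet below is a minimal quickstart sketch using the Hugging Face transformers library. It assumes the repository ships standard checkpoint files for this fine-tune; the prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical quickstart: assumes the repo hosts standard
# Hugging Face checkpoint files for this fine-tune.
model_id = "introtollm/qwen2.5-3B-cb-1_1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
    device_map="auto",
)

prompt = "Briefly explain what a context window is."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```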

Key Characteristics

  • Base Model: Qwen2.5-3B, a robust foundation for general language understanding and generation tasks.
  • Fine-tuning: The model was fine-tuned on the cb_1_1_50000 dataset, specializing it toward the task distribution represented by that data.
  • Training Hyperparameters: Training used a learning rate of 2e-05, a batch size of 1 with 8 gradient accumulation steps (an effective batch size of 8), and the AdamW optimizer, over 2109 steps with a cosine learning rate scheduler and 42 warmup steps; see the configuration sketch after this list.
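
These hyperparameters map directly onto a Hugging Face TrainingArguments configuration. The sketch below is a reconstruction under the assumption that the fine-tune was run with the Trainer API; the output_dir and optimizer backend name are illustrative, not confirmed by the model card.

```python
from transformers import TrainingArguments

# Reconstruction of the reported recipe, assuming the Hugging Face
# Trainer was used; output_dir and optim backend are illustrative.
training_args = TrainingArguments(
    output_dir="qwen2.5-3B-cb-1_1",   # hypothetical output path
    learning_rate=2e-5,               # reported learning rate
    per_device_train_batch_size=1,    # reported batch size
    gradient_accumulation_steps=8,    # reported accumulation steps
    max_steps=2109,                   # reported total training steps
    lr_scheduler_type="cosine",       # reported scheduler
    warmup_steps=42,                  # reported warmup steps
    optim="adamw_torch",              # AdamW, as reported
    bf16=True,                        # consistent with the BF16 quant listed above
)
```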

Potential Use Cases

Given its fine-tuning on the cb_1_1_50000 dataset, this model is likely best suited for applications that align with the data distribution and tasks represented in that dataset. Developers should evaluate its performance for the use cases below; a quick perplexity smoke test is sketched after the list.

  • Specific text generation tasks where the cb_1_1_50000 dataset provides relevant examples.
  • Applications that need a mid-sized model: at 3.1B parameters with a 32k context window, it balances capability against computational cost.
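
Because the contents of cb_1_1_50000 are not described here, a practical first step is a small perplexity smoke test on text from your own domain. The sketch below assumes standard transformers checkpoint files; the sample strings are placeholders, not drawn from the training data.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "introtollm/qwen2.5-3B-cb-1_1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# Placeholder texts: replace with samples representative of your task.
samples = [
    "Your domain-specific text goes here.",
    "Another representative example sentence.",
]

for text in samples:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # For causal LMs, passing input_ids as labels yields the
        # next-token cross-entropy loss; exp(loss) is perplexity.
        loss = model(**enc, labels=enc["input_ids"]).loss
    print(f"ppl={torch.exp(loss).item():.2f}  |  {text[:40]}")
```

Lower perplexity on your samples relative to the base Qwen2.5-3B is a rough signal that the fine-tune's specialization transfers to your domain; it is not a substitute for task-level evaluation.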