CoconutEmb/SFT-Qwen2.5-1.5B-Instruct-TongSearch

**Task:** text generation · **Model size:** 1.5B · **Precision:** BF16 · **Context length:** 32k · **Published:** Feb 23, 2026 · **License:** other · **Architecture:** Transformer

CoconutEmb/SFT-Qwen2.5-1.5B-Instruct-TongSearch is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned by CoconutEmb from Qwen/Qwen2.5-1.5B-Instruct. It has a context length of 32768 tokens and was specialized through supervised fine-tuning on the TongSearch_Coconut@16_v2 dataset. It is intended for applications that need a compact yet capable model for tasks aligned with that training data.


Model Overview

This model, CoconutEmb/SFT-Qwen2.5-1.5B-Instruct-TongSearch, is a supervised fine-tuned (SFT) version of the Qwen/Qwen2.5-1.5B-Instruct base model. It features 1.5 billion parameters and supports a substantial context length of 32768 tokens, making it suitable for processing longer inputs.
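The card does not include usage code; as a minimal sketch, the model should load like any Qwen2.5-Instruct checkpoint via Hugging Face `transformers`. The repository name comes from this card, while the prompt and generation settings below are purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CoconutEmb/SFT-Qwen2.5-1.5B-Instruct-TongSearch"

# Load tokenizer and model (BF16, matching the card's precision metadata).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Qwen2.5-Instruct models ship a chat template; the prompt is illustrative.
messages = [{"role": "user", "content": "Summarize the idea of supervised fine-tuning."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```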

Key Characteristics

  • Base Model: Qwen2.5-1.5B-Instruct, a robust causal language model.
  • Fine-tuning Dataset: Trained on the TongSearch_Coconut@16_v2 dataset, indicating specialization toward tasks in that domain.
  • Training Configuration: Learning rate of 5e-05, total batch size of 256 across 64 devices, 3 epochs, cosine learning rate scheduler (see the sketch below).
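
The card does not say which training framework was used. As a hypothetical illustration only, the reported hyperparameters map onto a TRL `SFTConfig` roughly as follows; the per-device batch size of 4 (× 64 devices = 256 total) is an assumption consistent with the stated totals, and the output directory name is invented.

```python
from trl import SFTConfig

# Hypothetical reconstruction of the reported hyperparameters; the original
# framework and exact arguments are not given in the model card.
config = SFTConfig(
    output_dir="sft-qwen2.5-1.5b-tongsearch",  # invented name
    learning_rate=5e-05,                # reported learning rate
    num_train_epochs=3,                 # reported number of epochs
    lr_scheduler_type="cosine",         # reported scheduler
    per_device_train_batch_size=4,      # assumption: 4 x 64 devices = 256 total
    bf16=True,                          # matches the BF16 precision metadata
)
```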

Intended Use Cases

Given its fine-tuning on a specific dataset, this model is best suited for:

  • Applications that align with the characteristics and domain of the TongSearch_Coconut@16_v2 dataset.
  • Scenarios requiring a compact (1.5B-parameter) instruction-following model with a large context window.

Further details regarding specific intended uses, limitations, and comprehensive training/evaluation data are not provided in the original model card.