CoconutEmb/SFT-Qwen2.5-1.5B-Instruct-TongSearch
CoconutEmb/SFT-Qwen2.5-1.5B-Instruct-TongSearch is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned by CoconutEmb from the Qwen/Qwen2.5-1.5B-Instruct architecture. This model has a context length of 32768 tokens and is specifically optimized through supervised fine-tuning on the TongSearch_Coconut@16_v2 dataset. It is intended for applications requiring a compact yet capable model for tasks aligned with its specialized training data.
Model Overview
This model, CoconutEmb/SFT-Qwen2.5-1.5B-Instruct-TongSearch, is a supervised fine-tuned (SFT) version of the Qwen/Qwen2.5-1.5B-Instruct base model. It features 1.5 billion parameters and supports a substantial context length of 32768 tokens, making it suitable for processing longer inputs.
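Since the model retains the Qwen2.5 instruction-tuned chat interface, it can presumably be loaded with the standard `transformers` causal-LM API. The sketch below assumes the Hub repository id shown above and the default Qwen2.5 chat template; it is illustrative, not an official usage snippet.

```python
MODEL_ID = "CoconutEmb/SFT-Qwen2.5-1.5B-Instruct-TongSearch"

def build_messages(user_prompt: str) -> list[dict]:
    # Standard Qwen2.5 chat format: an optional system turn plus the user turn.
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]

if __name__ == "__main__":
    # Downloading the weights requires network access and a few GB of disk,
    # so the heavy imports and calls are kept inside this guard.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    messages = build_messages("Summarize the idea of supervised fine-tuning.")
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The 32768-token context length means long prompts can be passed directly, though generation memory scales with input length.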
Key Characteristics
- Base Model: Qwen2.5-1.5B-Instruct, a robust causal language model.
- Fine-tuning Dataset: Specifically trained on the TongSearch_Coconut@16_v2 dataset, indicating a specialization towards tasks related to this data.
- Training Configuration: Utilized a learning rate of 5e-05, a total batch size of 256 across 64 devices, and trained for 3 epochs with a cosine learning rate scheduler.
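The reported hyperparameters can be expressed as a `TrainingArguments`-style configuration. This is a reconstruction, not the authors' actual training script: the per-device batch size of 4 is derived from the total batch size of 256 over 64 devices (gradient accumulation, if any, is not reported).

```python
# Hypothetical settings matching the reported configuration; only the four
# reported values (learning rate, total batch, epochs, scheduler) are sourced
# from the model card, and the per-device batch size is derived from them.
TOTAL_BATCH_SIZE = 256
NUM_DEVICES = 64

training_config = {
    "learning_rate": 5e-05,
    "num_train_epochs": 3,
    "lr_scheduler_type": "cosine",
    "per_device_train_batch_size": TOTAL_BATCH_SIZE // NUM_DEVICES,  # 256 / 64 = 4
}

# Sanity check: per-device batch times device count recovers the total.
assert training_config["per_device_train_batch_size"] * NUM_DEVICES == TOTAL_BATCH_SIZE
```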
Intended Use Cases
Given its fine-tuning on a specific dataset, this model is best suited for:
- Applications that align with the characteristics and domain of the TongSearch_Coconut@16_v2 dataset.
- Scenarios requiring a relatively small (1.5B parameters) yet instruction-following model with a large context window.
Further details regarding specific intended uses, limitations, and comprehensive training/evaluation data are not provided in the original model card.