Model Overview
sthenno-com/miscii-14b-0218 is a 14.8-billion-parameter language model fine-tuned from Qwen/Qwen2.5-14B-Instruct. Developed by Sthenno and Jiayu Wang, the model was produced by merging multiple fine-tuned checkpoints into a single set of weights.
Key Technical Details
- Base Model: Qwen/Qwen2.5-14B-Instruct, a robust foundation for general-purpose language tasks.
- Development Method: Built with Arcee's MergeKit using the Model Stock merge method described by Jang, Yun, and Han (2024). This method integrates multiple checkpoints from the tempesthenno-sft-0218 and tempesthenno-sft-0218-stage2 runs into a unified model.
- Parameter Count: 14.8 billion parameters, balancing capability and computational cost.
- Context Length: Supports a context length of 131,072 tokens, enabling processing of very long inputs.
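To illustrate how such a merge is typically specified, here is a sketch of a MergeKit configuration for the Model Stock method. This is an assumed example, not the authors' published config: the checkpoint paths, dtype, and exact options for this model may differ.

```yaml
# Hypothetical MergeKit config (sketch) for a Model Stock merge.
merge_method: model_stock
base_model: Qwen/Qwen2.5-14B-Instruct
models:
  - model: tempesthenno-sft-0218-ckpt60
  - model: tempesthenno-sft-0218-ckpt80
  - model: tempesthenno-sft-0218-stage2-ckpt40
  - model: tempesthenno-sft-0218-stage2-ckpt50
  - model: tempesthenno-sft-0218-stage2-ckpt60
dtype: bfloat16
```

A config like this would normally be run with MergeKit's `mergekit-yaml` command to produce the merged weights.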
Unique Aspects
The model's distinctiveness lies in its Model Stock merge method, which combines various fine-tuned checkpoints (tempesthenno-sft-0218-ckpt60, tempesthenno-sft-0218-ckpt80, tempesthenno-sft-0218-stage2-ckpt40, tempesthenno-sft-0218-stage2-ckpt50, tempesthenno-sft-0218-stage2-ckpt60). This approach aims to consolidate the strengths of different training stages and fine-tuning iterations, potentially leading to a more robust and versatile model for various applications.
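The core idea of Model Stock can be sketched numerically: average the fine-tuned checkpoints, then interpolate between that average and the base weights using a ratio derived from the angle between each checkpoint's delta from the base. The toy implementation below operates on flat NumPy arrays; real merges work layer by layer on model tensors, and the function name and simplifications here are illustrative, not MergeKit's actual code.

```python
import numpy as np

def model_stock_merge(base, checkpoints):
    """Toy sketch of the Model Stock merge (Jang, Yun, and Han, 2024).

    Averages the fine-tuned weights, then interpolates between that
    average and the base weights. The interpolation ratio t comes from
    the mean pairwise cosine similarity of the checkpoints' deltas
    from the base: t = k*cos(theta) / (1 + (k-1)*cos(theta)).
    """
    k = len(checkpoints)
    deltas = [ckpt - base for ckpt in checkpoints]

    # Mean pairwise cosine similarity between checkpoint deltas.
    cosines = []
    for i in range(k):
        for j in range(i + 1, k):
            num = float(np.dot(deltas[i].ravel(), deltas[j].ravel()))
            den = np.linalg.norm(deltas[i]) * np.linalg.norm(deltas[j])
            cosines.append(num / den)
    cos_theta = float(np.mean(cosines))

    # Interpolation ratio: t -> 1 as checkpoints agree (cos -> 1),
    # t -> 0 as their deltas become orthogonal (cos -> 0).
    t = k * cos_theta / (1 + (k - 1) * cos_theta)

    w_avg = sum(checkpoints) / k
    return t * w_avg + (1 - t) * base
```

Intuitively, checkpoints that moved in similar directions from the base pull the merge toward their average, while disagreeing checkpoints keep the merge closer to the base, which is why combining checkpoints from different training stages can yield a more robust result.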