Athkal/model-sft-dare

Text Generation · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Mar 22, 2026 · Architecture: Transformer · Concurrency Cost: 1

Athkal/model-sft-dare is a merged language model created by Athkal with the Linear DARE method, using Qwen/Qwen2.5-1.5B-Instruct as its base. It integrates a fine-tuned component from '/kaggle/working/model_sft_lora' into the base weights. The merge is intended to retain the base model's general instruction-following ability while folding in the fine-tune's task-specific behavior.
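The merged weights load like any other Qwen2.5-style checkpoint. A minimal sketch using the Hugging Face transformers library, assuming the repository ships standard transformers-format weights (the BF16 dtype matches the listing above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Athkal/model-sft-dare"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quant listed above
    device_map="auto",           # place layers on available GPU(s)/CPU
)
```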


Model Overview

Athkal/model-sft-dare is a merged language model developed by Athkal using the mergekit tool. It was constructed with the Linear DARE merge method; DARE (Drop And REscale) is introduced in the paper "Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch" (arXiv:2311.03099).
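To make the method concrete, here is a minimal per-tensor sketch of a linear DARE merge in PyTorch. The `drop_rate` and per-model `weights` are illustrative placeholders, not the values Athkal used; mergekit exposes comparable knobs, but this model's exact merge configuration is not stated in the card:

```python
import torch

def dare_linear_merge(base_sd: dict, finetuned_sds: list[dict],
                      weights: list[float], drop_rate: float = 0.9) -> dict:
    """Sketch of a linear DARE merge, applied tensor by tensor.

    For each fine-tuned model: take its delta from the base (the "task
    vector"), randomly Drop a fraction of its entries, And REscale the
    survivors by 1 / (1 - drop_rate) so the expected delta is unchanged.
    The rescaled deltas are then added linearly on top of the base.
    """
    merged = {}
    for name, base_w in base_sd.items():
        merged_w = base_w.clone()
        for ft_sd, w in zip(finetuned_sds, weights):
            delta = ft_sd[name] - base_w                   # task vector
            keep = torch.bernoulli(                        # random keep mask
                torch.full_like(delta, 1.0 - drop_rate))
            merged_w += w * keep * delta / (1.0 - drop_rate)
        merged[name] = merged_w
    return merged
```

Because each surviving delta entry is scaled up by 1 / (1 − drop_rate), the expected contribution of each fine-tune is preserved while most entries are left untouched, which is what lets DARE combine models without any retraining.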

Key Characteristics

  • Base Model: The merge starts from Qwen/Qwen2.5-1.5B-Instruct, a 1.5-billion-parameter instruction-tuned model from the Qwen team.
  • Merged Component: It incorporates a fine-tuned component identified by the local path /kaggle/working/model_sft_lora; the name suggests a LoRA adapter produced by supervised fine-tuning, though the training task is not documented.
  • Merge Method: Linear DARE randomly drops a fraction of each fine-tune's delta parameters and rescales the remainder before adding a weighted sum of the deltas to the base (see the sketch above), which reduces interference between merged weights without retraining.

Potential Use Cases

  • Specialized Instruction Following: Built on an instruction-tuned base with a supervised fine-tune merged in, it is likely strongest on the instruction-style tasks the fine-tune targeted (see the usage example after this list).
  • Research into Model Merging: This model serves as an example of applying the DARE merging technique, which can be valuable for researchers exploring efficient ways to combine LLMs without extensive retraining.
  • Compact deployments: At 1.5B parameters, the model's BF16 weights occupy roughly 3 GB, making it practical where compute or memory is constrained while still benefiting from the merged fine-tune.
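Since Qwen2.5-Instruct derivatives use a chat template, an end-to-end inference sketch looks as follows; the prompt and generation settings are illustrative, and the chat template is assumed to be inherited from the base model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Athkal/model-sft-dare"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "user",
     "content": "Explain model merging in two sentences."},
]
# Qwen2.5-Instruct checkpoints ship a chat template; assuming this
# merge inherits it from the base model's tokenizer config.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```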