Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties
Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties is a 1.5-billion-parameter language model created by Zachary1150. It is a merge of pre-trained language models built with the DARE TIES merge method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base and integrating two distinct actor models, so it targets tasks that benefit from the combined strengths of its components.
Model Overview
This model was constructed with the mergekit tool using the DARE TIES merge method, which combines multiple fine-tuned checkpoints on top of a shared base model.
Merge Details
This model's foundation is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It integrates two distinct actor models, each merged with a weight of 0.5 and a density of 0.5. The merge was computed in the bfloat16 data type with normalization enabled.
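For reference, a mergekit configuration matching these reported settings would look roughly like the sketch below. The two actor repositories are not named on this card, so the model entries are placeholders; this is a reconstruction under those assumptions, not the exact config used.

```python
# Hypothetical reconstruction of the merge config; "actor-1" and
# "actor-2" are placeholders, not the actual repositories used.
import yaml

config = {
    "merge_method": "dare_ties",
    "base_model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "models": [
        {"model": "actor-1", "parameters": {"weight": 0.5, "density": 0.5}},
        {"model": "actor-2", "parameters": {"weight": 0.5, "density": 0.5}},
    ],
    "parameters": {"normalize": True},  # normalization, as stated above
    "dtype": "bfloat16",
}

with open("merge_config.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# The merge itself would then be run with mergekit's CLI:
#   mergekit-yaml merge_config.yml ./merged-model
```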
Key Characteristics
- Merge Method: DARE TIES, which sparsifies each component's task vector (DARE) and resolves per-parameter sign conflicts between components (TIES), letting models be combined with little interference and minimal loss of performance; a toy sketch follows this list.
- Base Model: Built upon deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, providing a strong foundational architecture.
- Component Models: Incorporates two specific actor models, suggesting an optimization for tasks where their combined expertise is beneficial.
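To make the method concrete, below is a toy sketch of the two ideas behind DARE TIES: DARE randomly drops and rescales each model's task vector (its delta from the base weights), and TIES elects a per-parameter sign to discard conflicting updates. This is an illustration only, not mergekit's implementation.

```python
import torch

def dare(delta: torch.Tensor, density: float) -> torch.Tensor:
    # Drop And REscale: keep roughly `density` of the task vector's
    # entries, rescaling survivors by 1/density to preserve the
    # expected magnitude of the update.
    mask = torch.bernoulli(torch.full_like(delta, density))
    return delta * mask / density

def ties_sum(deltas: list[torch.Tensor], weights: list[float]) -> torch.Tensor:
    # TIES-style sign election: per parameter, take the sign of the
    # weighted sum of deltas, then keep only the deltas that agree
    # with the elected sign before summing.
    stacked = torch.stack([w * d for w, d in zip(weights, deltas)])
    elected = torch.sign(stacked.sum(dim=0))
    agree = (torch.sign(stacked) == elected).float()
    return (stacked * agree).sum(dim=0)

# Toy merge of two "actor" task vectors onto a base parameter tensor,
# mirroring this card's equal weight (0.5) and density (0.5) settings.
base = torch.zeros(8)
deltas = [torch.randn(8), torch.randn(8)]
sparse = [dare(d, density=0.5) for d in deltas]
merged = base + ties_sum(sparse, weights=[0.5, 0.5])
print(merged)
```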
Potential Use Cases
Given its merged nature, this model should suit applications that draw on a blend of capabilities from its constituent models, and it may outperform the individual components on tasks where their strengths overlap. At 1.5B parameters it is small enough to deploy efficiently on modest hardware while still offering solid language understanding and generation; a loading example follows.
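Assuming the repository follows the standard Hugging Face layout for a Qwen2-based checkpoint (the architecture inherited from its DeepSeek-R1-Distill-Qwen-1.5B base, which this card does not state explicitly), it should load with the stock transformers API; the prompt below is only an example:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id from this model card; loading in bfloat16 to match
# the dtype used during the merge.
model_id = "Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Explain what a model merge is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```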