Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties_density0.2
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties_density0.2 is a 1.5 billion parameter language model merged using the TIES method, based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It combines two fine-tuned actors, acc_MRL4096_ROLLOUT4_LR2e-6 and accfmt_MRL4096_ROLLOUT4_LR2e-6, and supports a context length of 131072 tokens. The merge is intended to combine the strengths of both actors in a single checkpoint.
Model Overview
This model, merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties_density0.2, is a 1.5 billion parameter language model created by Zachary1150. It was produced with the TIES merge method from mergekit, which combines several fine-tuned models into a single checkpoint. The base model for the merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, providing the foundation for the merged architecture.
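Based on the parameters described on this card, the mergekit configuration for this model likely resembled the sketch below. Only the merge method, base model, weights, and density are stated on the card; the repository paths for the two actor models and the dtype are assumptions.

```yaml
# Hypothetical reconstruction of the mergekit config; actor paths are guesses.
models:
  - model: acc_MRL4096_ROLLOUT4_LR2e-6       # assumed local/hub path
    parameters:
      weight: 0.5
      density: 0.2
  - model: accfmt_MRL4096_ROLLOUT4_LR2e-6    # assumed local/hub path
    parameters:
      weight: 0.5
      density: 0.2
merge_method: ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
dtype: bfloat16                              # assumption
```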
Key Characteristics
- Merge Method: Utilizes the TIES (TrIm, Elect Sign & Merge) technique, which trims low-magnitude parameter changes, resolves sign conflicts between models, and merges the surviving deltas to reduce interference.
- Base Model: Built upon deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, inheriting its foundational capabilities.
- Constituent Models: The merge incorporates two fine-tuned actor models, acc_MRL4096_ROLLOUT4_LR2e-6 and accfmt_MRL4096_ROLLOUT4_LR2e-6, each contributing with a weight of 0.5 and a density of 0.2.
- Context Length: Supports a context window of 131072 tokens, allowing longer sequences of text to be processed and generated.
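The TIES procedure named above can be sketched on plain NumPy arrays. This is an illustrative reimplementation, not mergekit's code; the function name and the toy vectors are invented for the example (with density 0.5 so the trimming step is visible, rather than this model's 0.2).

```python
import numpy as np

def ties_merge(base, finetuned, weights, density):
    """Sketch of TIES merging: trim, elect sign, merge.

    base      -- 1-D array of base-model parameters
    finetuned -- list of 1-D arrays, one per fine-tuned model
    weights   -- per-model merge weights (0.5 each for this card)
    density   -- fraction of task-vector entries kept (0.2 for this card)
    """
    # Trim: keep only the top-`density` fraction of each task vector by magnitude.
    trimmed = []
    for ft in finetuned:
        tv = ft - base
        k = max(1, int(density * tv.size))
        cutoff = np.sort(np.abs(tv))[-k]
        trimmed.append(np.where(np.abs(tv) >= cutoff, tv, 0.0))

    # Elect: pick one sign per parameter from the weighted sum of trimmed deltas.
    stacked = np.stack([w * t for w, t in zip(weights, trimmed)])
    elected = np.sign(stacked.sum(axis=0))

    # Merge: average only the entries whose sign agrees with the elected one.
    agree = np.sign(stacked) == elected
    num = np.where(agree, stacked, 0.0).sum(axis=0)
    den = (agree * np.asarray(weights)[:, None]).sum(axis=0)
    merged_tv = np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)
    return base + merged_tv

# Toy example with invented numbers:
base = np.zeros(4)
a = np.array([1.0, -2.0, 0.1, 0.0])
b = np.array([1.0,  3.0, 0.0, 0.2])
print(ties_merge(base, [a, b], weights=[0.5, 0.5], density=0.5))  # -> [1. 3. 0. 0.]
```

Note how the second coordinate resolves to 3.0: the two actors disagree in sign there, the positive delta wins the election by magnitude, and the conflicting negative delta is dropped instead of being averaged in.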
Potential Use Cases
Given its merge-based construction from specialized actor models, this model is likely suitable for applications that benefit from the combined expertise of its components. Developers interested in exploring the effects of TIES merging on specific fine-tuned models, particularly those derived from the DeepSeek-R1-Distill-Qwen-1.5B family, may find this model valuable for research and development.