Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties is a 1.5-billion-parameter language model created by Zachary1150. It was produced by merging two pre-trained models with the TIES method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base model. It supports a context length of 131,072 tokens, making it suitable for tasks that require processing very long inputs.
Model Overview
This model, merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties, is a 1.5-billion-parameter language model developed by Zachary1150. It was created with the mergekit tool, using the TIES merge method.
Merge Details
The base model for this merge was deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. Two distinct pre-trained models were combined:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
Each merged model contributed with a weight of 0.5 and a density of 0.5, and normalization was applied during the merge. The merge was performed in the bfloat16 data type.
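For intuition, the per-tensor arithmetic of a TIES merge with these settings can be sketched in plain PyTorch. This is a simplified illustration, not mergekit's actual implementation, which differs in details such as trimming granularity and normalization:

```python
import torch

def ties_merge_tensor(base: torch.Tensor,
                      finetuned: list[torch.Tensor],
                      weight: float = 0.5,
                      density: float = 0.5) -> torch.Tensor:
    """Merge one parameter tensor via TIES: trim, elect sign, disjoint merge."""
    # 1. Task vectors: how each fine-tuned model differs from the base.
    deltas = [ft - base for ft in finetuned]

    # 2. Trim: keep only the top `density` fraction of entries by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.numel()))
        threshold = d.abs().flatten().kthvalue(d.numel() - k + 1).values
        trimmed.append(torch.where(d.abs() >= threshold, d, torch.zeros_like(d)))

    # 3. Elect sign: per-entry majority sign of the weighted trimmed deltas.
    weighted = torch.stack([weight * t for t in trimmed])
    elected = weighted.sum(dim=0).sign()

    # 4. Disjoint merge: average only entries that agree with the elected sign,
    #    normalizing by the total weight of the contributing entries.
    mask = torch.stack([t.sign() == elected for t in trimmed])
    total_weight = (mask.to(weighted.dtype) * weight).sum(dim=0).clamp(min=1e-8)
    merged_delta = (weighted * mask).sum(dim=0) / total_weight

    return base + merged_delta

# Toy example: merge two perturbed copies of a shared base tensor.
base = torch.randn(4, 4)
merged = ties_merge_tensor(base, [base + 0.1 * torch.randn(4, 4),
                                  base + 0.1 * torch.randn(4, 4)])
```

Applied tensor by tensor over the two checkpoints' state dicts, this yields the merged weights.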
Key Characteristics
- Merge Method: Uses TIES (Trim, Elect Sign & Merge), which trims low-magnitude parameter changes, resolves sign conflicts between models, and merges only the parameters that agree, reducing interference between the merged models.
- Base Model: Built upon the DeepSeek-R1-Distill-Qwen-1.5B architecture, inheriting its foundational capabilities.
- Context Length: Supports a context length of 131,072 tokens, allowing it to process and reason over very long inputs.
Potential Use Cases
This model is suited to applications that benefit from a compact yet capable language model with a very large context window, particularly where the strengths of both merged checkpoints are advantageous.
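Assuming the model is published under the repository name above, it should load like any other Qwen2-family causal language model via transformers. A minimal usage sketch (the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
    device_map="auto",           # requires the accelerate package
)

prompt = "Briefly explain what model merging is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```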