Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties
Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties is a 1.5-billion-parameter language model produced by merging two fine-tuned checkpoints with the DARE TIES method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base model. It is intended for applications that need a compact model combining the capabilities of its constituent checkpoints.
Model Overview
This model, merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties, is a 1.5-billion-parameter language model merge built with the DARE TIES method. DARE TIES combines two ideas: DARE (Drop And REscale), which randomly sparsifies each model's parameter differences from the base and rescales the survivors, and TIES, which resolves sign conflicts between models before combining their contributions. The base model for this merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
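As a rough illustration of what DARE TIES does, the sketch below applies the method to a single parameter tensor. It is a simplified toy implementation, not mergekit's actual code: the density, weights, and normalization mirror the settings described in this card, but details such as how mergekit performs sign election may differ.

```python
import torch

def dare(delta: torch.Tensor, density: float) -> torch.Tensor:
    """Drop And REscale: randomly zero out (1 - density) of the task
    vector's entries, then rescale survivors by 1/density so the
    expected contribution of the task vector is preserved."""
    mask = (torch.rand_like(delta) < density).to(delta.dtype)
    return delta * mask / density

def dare_ties(base: torch.Tensor, finetuned: list[torch.Tensor],
              weights: list[float], density: float) -> torch.Tensor:
    """Merge fine-tuned tensors into `base` via DARE sparsification
    followed by TIES-style sign election."""
    # Task vectors: weighted, sparsified differences from the base.
    deltas = torch.stack([w * dare(ft - base, density)
                          for ft, w in zip(finetuned, weights)])
    # Sign election: for each parameter, keep the sign whose total
    # weighted mass across models is larger.
    elected = torch.sign(deltas.sum(dim=0))
    agree = (torch.sign(deltas) == elected).to(deltas.dtype)
    merged = (deltas * agree).sum(dim=0)
    # Divide by the total weight of agreeing models, mirroring
    # mergekit's `normalize: true` behavior.
    w = torch.tensor(weights).view(-1, *([1] * base.dim()))
    total = (agree * w).sum(dim=0).clamp(min=1e-8)
    return base + merged / total
```

For this model, the call would resemble `dare_ties(base, [m1, m2], weights=[0.5, 0.5], density=0.5)`, applied tensor by tensor across the state dicts.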
Merge Details
The merge combined two fine-tuned checkpoints of the base model:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
Each checkpoint was merged with a weight of 0.5 and a density of 0.5, with parameter normalization enabled. The merge was performed with mergekit.
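For reference, a mergekit configuration consistent with these settings would look roughly like the following. This is a reconstruction from the details above, not the exact file used to produce this model; the `dtype` in particular is an assumption.

```yaml
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
parameters:
  normalize: true
dtype: bfloat16  # assumed; not stated in the original card
```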
Potential Use Cases
Given its origin as a merge of specialized models, this model could be suitable for:
- Applications where a compact 1.5B parameter model is preferred.
- Scenarios benefiting from the combined capabilities of its merged components.
- Research into model merging techniques, particularly DARE TIES.
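Usage
The merged model can be loaded like any Hugging Face causal language model via the transformers library. This is the standard loading pattern rather than a snippet from the original card; the prompt is purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_dare_ties"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Briefly explain what model merging is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```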