Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties
Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties is a 1.5-billion-parameter language model merged using the TIES method, based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It combines two distinct checkpoints, 'cos_MRL4096_ROLLOUT4_LR1e-6' and 'accfmt_MRL4096_ROLLOUT4_LR1e-6', each contributing a weight and density of 0.5. The merge is intended to fold the strengths of both checkpoints into a single model for general language tasks.
Model Overview
This model, merge_cosfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties, is a 1.5 billion parameter language model created by Zachary1150. It was developed using the TIES merge method from the mergekit framework, building upon the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its base model.
Merge Details
The model integrates two distinct pre-trained checkpoints, each contributing equally with a weight and density of 0.5. The merged components are:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
This merging strategy aims to combine the learned representations of the two checkpoints into a single, more capable language model. The merge was configured to normalize parameters and to use the bfloat16 data type for efficiency.
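For reference, mergekit merges of this kind are driven by a YAML configuration passed to the mergekit-yaml CLI. The sketch below reconstructs a plausible TIES configuration from the parameters stated above (equal 0.5 weight and density per checkpoint, parameter normalization, bfloat16). The exact configuration file used for this model was not published, so the config file name and output directory here are illustrative.

```python
import subprocess
import textwrap

# Hypothetical reconstruction of the TIES merge config from the stated
# parameters; the actual config used for this model may differ.
config = textwrap.dedent("""\
    merge_method: ties
    base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
    models:
      - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
        parameters:
          weight: 0.5
          density: 0.5
      - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
        parameters:
          weight: 0.5
          density: 0.5
    parameters:
      normalize: true
    dtype: bfloat16
""")

with open("ties_config.yaml", "w") as f:
    f.write(config)

# mergekit's CLI reads the YAML and writes the merged model to the
# given output directory ("./merged-model" is a placeholder).
subprocess.run(["mergekit-yaml", "ties_config.yaml", "./merged-model"], check=True)
```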
Potential Use Cases
Given its DeepSeek-R1-Distill-Qwen-1.5B foundation and the TIES merge of two actor checkpoints, this model is likely suitable for a range of general-purpose language generation and understanding tasks where a 1.5B-parameter model with a 131,072-token context length is appropriate.
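As a minimal usage sketch, assuming the model is published under the repository id above and that transformers, torch, and accelerate are installed, it can be loaded like any other Qwen-based causal LM:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 matches the dtype the merge was performed in;
# device_map="auto" places the model on GPU if one is available.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The base model is chat-tuned, so format the prompt with its chat template.
messages = [{"role": "user", "content": "Explain the TIES merge method in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```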