Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2
Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2 is a 1.5 billion parameter language model merged with the TIES method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base. It combines two fine-tuned 'actor' checkpoints, 'cos_MRL4096_ROLLOUT4_LR5e-7' and 'accfmt_MRL4096_ROLLOUT4_LR5e-7', each contributing with a weight of 0.5 and a density of 0.2, and supports a context length of 131072 tokens. It is intended for tasks that benefit from the combined strengths of its constituent models.
Model Overview
This model, merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2, is a 1.5 billion parameter language model created by Zachary1150. It was produced with the TIES (TrIm, Elect Sign & Merge) merge method, which resolves parameter interference when combining multiple fine-tuned models into a single, more capable model. The base model for this merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, placing its foundation in the Qwen architecture.
Merge Details
The merge process involved two specific fine-tuned models, both identified as 'actor' checkpoints from distinct training runs:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
Each of these models contributed with a weight of 0.5 and a density of 0.2 during the TIES merge, as configured in the provided YAML; in TIES, the density controls what fraction of each model's parameter deltas is retained before sign election, so a density of 0.2 keeps only the largest-magnitude 20%. This strategy aims to consolidate the strengths of the individual models while limiting interference between them. The merged model supports a substantial context length of 131072 tokens.
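The original YAML configuration is not reproduced in this card; a mergekit-style sketch consistent with the stated parameters (base model, TIES method, weight 0.5, density 0.2 per model) would look roughly like the following. This is a reconstruction for illustration, not the exact file used:

```yaml
# Hypothetical mergekit config matching the parameters described above.
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5    # equal contribution from each actor
      density: 0.2   # keep top 20% of parameter deltas before sign election
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.2
merge_method: ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
dtype: bfloat16
```

Field names follow mergekit's published schema; the dtype is an assumption, as it is not stated in this card.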
Potential Use Cases
Given its origin as a merge of fine-tuned 'actor' models, this model is likely suitable for tasks where the combined capabilities of its constituent models are beneficial. Developers looking for a compact yet capable model derived from the DeepSeek-R1-Distill-Qwen-1.5B base, enhanced through a TIES merge, may find this model useful for various language generation and understanding tasks.
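For completeness, a minimal usage sketch with the Hugging Face `transformers` library is shown below. It assumes the standard `AutoModelForCausalLM`/`AutoTokenizer` loading path and that the checkpoint ships a chat template (inherited from its DeepSeek-R1-Distill-Qwen-1.5B base); running it downloads the full model weights:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties_density0.2"

# Load tokenizer and model; device_map="auto" requires the `accelerate` package.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build a chat-formatted prompt and generate a completion.
messages = [{"role": "user", "content": "Briefly explain model merging."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Generation parameters (sampling temperature, max tokens) are illustrative defaults, not recommendations from the model's authors.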