Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties_density0.2
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Jan 1, 2026 · Architecture: Transformer · Warm

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties_density0.2 is a 1.5-billion-parameter language model merged with the DARE TIES method on top of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It combines two specialized checkpoints, 'len_MRL4096_ROLLOUT4_LR5e-7' and 'accfmt_MRL4096_ROLLOUT4_LR5e-7', each weighted 0.5 with a density of 0.2. The model supports a context length of 131072 tokens, suiting it to tasks that require long-range contextual understanding.


Model Overview

This model, developed by Zachary1150, is a 1.5-billion-parameter language model created by merging pre-trained checkpoints with the DARE TIES method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base model.
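
Example Usage

The model should load with the standard Hugging Face transformers API. The sketch below is illustrative rather than taken from the card; it assumes the merged weights are published under the repository ID in the title and loads them in bfloat16, the dtype used during the merge.

```python
# Minimal loading sketch (illustrative; not from the original card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties_density0.2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # the merge was performed in bfloat16
    device_map="auto",
)

prompt = "Summarize the key ideas of model merging in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```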

Merge Details

The merge process combined two distinct checkpoints:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface

Both checkpoints were assigned a weight of 0.5 and a density of 0.2 in the DARE TIES merge, with normalization enabled. A density of 0.2 means only about 20% of each checkpoint's parameter deltas relative to the base model are retained (the kept deltas are rescaled to compensate) before the TIES sign-consensus and averaging steps. The merged model supports a context length of 131072 tokens and was produced in bfloat16.
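
The card does not name the tool used for the merge; DARE TIES merges are most commonly produced with mergekit, so the following sketch reconstructs a plausible, equivalent configuration from the parameters above. The config keys follow mergekit's schema, the checkpoint paths are the ones listed in the card, and the output directory is a placeholder.

```python
# Hypothetical reconstruction of the merge (mergekit is assumed; the card
# does not name its tool). Parameters mirror those stated above.
import torch
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

config_yaml = """
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.2
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.2
parameters:
  normalize: true
dtype: bfloat16
"""

merge_config = MergeConfiguration.model_validate(yaml.safe_load(config_yaml))
run_merge(
    merge_config,
    "./merged-model",  # placeholder output directory
    options=MergeOptions(cuda=torch.cuda.is_available(), copy_tokenizer=True),
)
```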

Potential Use Cases

Given its merged nature and substantial context window, this model is likely optimized for tasks that benefit from:

  • Extended context understanding: Processing and generating text over very long documents or conversations.
  • Specialized capabilities: the names of the merged checkpoints ('len' and 'accfmt') hint at tuning for particular aspects of generation (plausibly output length and accuracy/formatting), though the README does not document what these checkpoints optimize.