Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties is a 1.5 billion parameter language model created by Zachary1150, built upon the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. This model was developed using the DARE TIES merge method, combining two specific pre-trained language models. It features a substantial context length of 131072 tokens, making it suitable for tasks requiring extensive contextual understanding.
Model Overview
This model, merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using mergekit and specifically employs the DARE TIES merge method, as detailed in the DARE TIES paper.
Merge Details
The base model for this merge was deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. Two distinct pre-trained language models were combined to create this merged model:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
Each of these constituent models was assigned a weight of 0.5 and a density of 0.5 during the merging process. The configuration also specified normalize: true and dtype: bfloat16.
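Based on the parameters listed above, the mergekit configuration likely looked approximately like the following sketch (reconstructed from the stated settings, not the author's actual config file):

```yaml
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
dtype: bfloat16
```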
Key Characteristics
- Parameter Count: 1.5 billion parameters.
- Context Length: Supports a context window of 131072 tokens.
- Merge Method: Utilizes DARE TIES, which randomly prunes and rescales each fine-tuned model's parameter deltas (DARE) and resolves sign conflicts between the contributing models (TIES), reducing interference when the weights are combined.
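The DARE TIES mechanics can be sketched on flat parameter vectors. This is an illustration of the general technique only, not mergekit's actual implementation; the `normalize` step and per-tensor handling are omitted for brevity, and all names here are hypothetical.

```python
import numpy as np

def dare_ties_merge(base, tuned_models, density=0.5, weight=0.5, seed=0):
    """Toy DARE TIES merge over flat parameter vectors.

    DARE: drop each task-vector entry with probability (1 - density)
    and rescale survivors by 1/density.
    TIES: elect a majority sign per parameter and keep only the
    contributions that agree with it.
    """
    rng = np.random.default_rng(seed)
    deltas = []
    for tuned in tuned_models:
        delta = tuned - base                      # task vector vs. base model
        keep = rng.random(delta.shape) < density  # DARE drop mask
        deltas.append(np.where(keep, delta / density, 0.0) * weight)
    stacked = np.stack(deltas)
    elected = np.sign(stacked.sum(axis=0))        # TIES sign election
    agree = np.where(np.sign(stacked) == elected, stacked, 0.0)
    return base + agree.sum(axis=0)
```

With `density=1.0` nothing is dropped, so two identical fine-tunes at weight 0.5 each reconstruct the full shared delta; at lower densities the sparsification is stochastic, which is why mergekit fixes a seed for reproducibility.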
Potential Use Cases
Given its DeepSeek-R1-Distill base and the DARE TIES merge of two closely related fine-tuned checkpoints, this model is likely best suited to the reasoning-oriented tasks those checkpoints were trained for. Its 131072-token context window also makes it a reasonable choice for applications that process long inputs.