Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Dec 25, 2025 | Architecture: Transformer | Warm

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties is a 1.5-billion-parameter language model created by Zachary1150 by merging two fine-tuned models with the TIES method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base model. It targets tasks that call for a compact yet capable language model, with the merge intended to combine the strengths of its constituent fine-tunes in their respective domains.


Model Overview

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties is a 1.5 billion parameter language model developed by Zachary1150. It was constructed with the TIES merge method from mergekit, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its base model.
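
Because the base model is a standard Qwen2-style causal language model, the merge should load with the Hugging Face transformers library in the usual way. The snippet below is a minimal inference sketch, not an official usage example from the author; the prompt and generation settings are illustrative assumptions.

```python
# Minimal inference sketch (assumed usage; not provided by the model author).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The DeepSeek-R1-Distill base ships with a chat template; we assume the merge keeps it.
messages = [{"role": "user", "content": "Explain the TIES merge method in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```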

Merge Details

This model is a composite of two distinct fine-tuned models, each contributing a weight of 0.5 and a density of 0.5. The merge was normalized and performed in the bfloat16 data type; a hypothetical mergekit configuration reflecting these settings is sketched after the list below. The specific models merged were:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
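
The exact mergekit configuration is not published with the model, so the sketch below only reconstructs it from the details above (TIES, weight and density 0.5, normalization, bfloat16); the file name and output directory are hypothetical.

```python
# Hypothetical reconstruction of the mergekit config for this TIES merge.
from pathlib import Path

ties_config = """\
merge_method: ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
parameters:
  normalize: true
dtype: bfloat16
"""

# Write the config and run mergekit's CLI on it, e.g.:
#   mergekit-yaml ties_merge.yaml ./merged-model
Path("ties_merge.yaml").write_text(ties_config)
```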

Potential Use Cases

Given its architecture as a merged model, it is likely optimized for:

  • Resource-constrained environments: Its 1.5B parameter count makes it suitable for deployment where computational resources are limited (see the quantized-loading sketch after this list).
  • Specific domain tasks: The underlying merged models suggest potential specialization, making it a candidate for tasks aligned with their original fine-tuning objectives.
  • Experimentation with merged architectures: Developers interested in the performance characteristics of TIES-merged models built on DeepSeek's distilled Qwen architecture may find it a useful test case.
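
For the resource-constrained case, one common option is 4-bit quantized loading via bitsandbytes. The snippet below is a sketch that assumes a CUDA GPU and the bitsandbytes package are available; it is not a recommendation from the model author.

```python
# Sketch: load the merge in 4-bit to reduce memory use (assumes CUDA + bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_ties"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)
```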