Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties
The Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties model is a 1.5-billion-parameter language model merged from DeepSeek-R1-Distill-Qwen-1.5B using the DARE TIES method. It integrates capabilities from two specialized fine-tuned checkpoints and supports a 131072-token context length, making it suited to applications that require a compact yet capable language model with merged expertise.
Model Overview
This model, merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties, is a 1.5 billion parameter language model created by Zachary1150. It was developed using the DARE TIES merge method, a technique designed to combine the strengths of multiple pre-trained models efficiently. The base model for this merge is deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
Merge Details
The model integrates two distinct components, each merged with a weight of 0.5 and a density of 0.5, giving a balanced combination of their learned representations. The components merged were:
- A model from /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
- A model from /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
This approach aims to combine the specialized capabilities of each source model into a single, more versatile model. The merged model supports a substantial context length of 131072 tokens and is configured to use bfloat16 precision; a reconstruction of the merge configuration is sketched below.
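The merge was most likely produced with mergekit, the common tooling for DARE TIES merges, though the card does not say so explicitly. The sketch below reconstructs a plausible configuration from the details stated above (the two checkpoint paths, weight 0.5 and density 0.5 per component, bfloat16, and the DeepSeek base model); the file names, the output directory, and the assumption of mergekit itself are illustrative, not the author's confirmed recipe.

```python
# Hypothetical reconstruction of the DARE TIES merge using mergekit.
# Weights, densities, dtype, and checkpoint paths come from this card;
# file names and the output directory are illustrative.
import pathlib
import subprocess
import textwrap

CONFIG = textwrap.dedent("""\
    merge_method: dare_ties
    base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
    dtype: bfloat16
    models:
      - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
        parameters:
          weight: 0.5
          density: 0.5
      - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface
        parameters:
          weight: 0.5
          density: 0.5
""")

pathlib.Path("dare_ties_config.yml").write_text(CONFIG)

# mergekit-yaml is mergekit's CLI entry point: config in, merged model out.
subprocess.run(
    ["mergekit-yaml", "dare_ties_config.yml", "./merged-model"],
    check=True,
)
```

In DARE TIES, density sets the fraction of each component's delta parameters retained after DARE's random drop-and-rescale step, while weight scales each component's contribution in the TIES-style sign-consensus merge; 0.5 for both splits the two components evenly.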
Key Characteristics
- Architecture: Merged from DeepSeek-R1-Distill-Qwen-1.5B base.
- Parameter Count: 1.5 billion parameters, offering a balance between performance and computational efficiency.
- Context Length: Supports an extended context of 131072 tokens, suitable for tasks requiring processing long inputs.
- Merge Method: Utilizes the DARE TIES method, known for its effectiveness in combining models while preserving performance.
Potential Use Cases
This model is suited to applications where a compact yet capable language model with merged expertise is beneficial, particularly tasks that can exploit its extended context window. A minimal loading example is sketched below.
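Since the merged checkpoint is published as a standard Hugging Face model, it can be loaded with the transformers library like any other causal language model. The snippet below is a minimal sketch: the model id comes from this card, while the prompt and generation settings are purely illustrative.

```python
# Minimal usage sketch; the model id comes from this card, the prompt and
# generation parameters are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.5_dare_ties"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists bfloat16 as the configured precision
    device_map="auto",
)

prompt = "Summarize the key ideas of model merging in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in bfloat16 matches the precision the card lists and roughly halves memory relative to float32, which matters when approaching the 131072-token context limit.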