Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties is a 1.5 billion parameter language model created by Zachary1150, based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. This model was developed using the DARE TIES merge method, combining two specialized pre-trained models. It features a substantial context length of 131072 tokens, making it suitable for tasks requiring extensive contextual understanding and processing.
Model Overview
This model, merge_lenfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_dare_ties, is a 1.5 billion parameter language model developed by Zachary1150. It is built upon the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model and supports a context length of 131072 tokens.
Key Characteristics
- Merge Method: Created using the DARE TIES merge method, which sparsifies each fine-tuned model's parameter deltas (DARE) and resolves sign conflicts between them (TIES) before combining, allowing the merge to retain the individual strengths of its constituent models.
- Base Model: Derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, indicating a foundation in a Qwen-based architecture.
- Merged Components: Integrates two distinct models, /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface and /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface, each contributing with a weight of 0.5.
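The DARE TIES procedure behind this merge can be illustrated on toy tensors. The following is a minimal NumPy sketch of the general technique (drop deltas, rescale, elect signs, average agreeing deltas), not the exact code used to produce this model; the function name dare_ties_merge and the default drop rate are assumptions for demonstration.

```python
import numpy as np

def dare(delta, drop_rate, rng):
    # DARE: randomly drop a fraction of delta weights,
    # rescale survivors by 1/(1 - drop_rate) to preserve expectation.
    mask = rng.random(delta.shape) >= drop_rate
    return delta * mask / (1.0 - drop_rate)

def dare_ties_merge(base, finetuned, weights, drop_rate=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Task vectors = finetuned - base, sparsified via DARE and scaled
    # by each model's merge weight (0.5 each in this model's case).
    deltas = [dare(ft - base, drop_rate, rng) * w
              for ft, w in zip(finetuned, weights)]
    stacked = np.stack(deltas)
    # TIES sign election: pick the dominant sign per parameter.
    elected = np.sign(stacked.sum(axis=0))
    # Keep only deltas agreeing with the elected sign, then average them.
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    summed = np.where(agree, stacked, 0.0).sum(axis=0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return base + summed / counts
```

With a drop rate of 0 and two identical fine-tunes, the merge simply recovers the shared delta; with conflicting signs, only the majority direction survives.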
Potential Use Cases
Given its merged nature and large context window, this model is likely optimized for:
- Tasks requiring the synthesis of capabilities from its constituent models.
- Applications benefiting from processing and understanding very long texts or complex documents.
- Research into model merging techniques and their impact on performance.
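Merges of this kind are commonly produced with a tool such as mergekit, which supports a dare_ties merge method. A hypothetical configuration matching the setup described above might look like the sketch below; the density value (the fraction of deltas kept by DARE) is an assumption, as it is not stated on this page.

```yaml
# Illustrative mergekit config; density is assumed, not documented here.
merge_method: dare_ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
dtype: bfloat16
```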