Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties
Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties is a 1.5 billion parameter language model created by Zachary1150, merged using the TIES method with deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its base. It combines two fine-tuned checkpoints of that base into a single model. With a context length of 131072 tokens, it is designed for tasks requiring extensive contextual awareness over very long inputs.
Model Overview
Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR2e-6_w0.5_ties is a 1.5 billion parameter language model developed by Zachary1150. It was constructed using the TIES merge method from mergekit, leveraging deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as its foundational base model. This merging technique combines the strengths of multiple pre-trained models into a single, more capable entity.
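To make the TIES procedure concrete, the sketch below shows its three core steps (trim, elect sign, disjoint merge) on toy flat parameter vectors. This is an illustrative simplification, not mergekit's actual implementation; the function name and the toy tensors are invented for the example.

```python
import numpy as np

def ties_merge(base, task_models, weight=0.5, density=0.5):
    """Toy TIES merge on flat parameter vectors (illustrative only)."""
    # 1. Task vectors: each fine-tuned model's delta from the base.
    deltas = [m - base for m in task_models]
    # 2. Trim: keep only the top-`density` fraction of each delta by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, int(round(density * d.size)))
        thresh = np.sort(np.abs(d))[::-1][k - 1]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    # 3. Elect sign: per-parameter majority sign across the trimmed deltas.
    stacked = np.stack(trimmed)
    sign = np.sign(np.sum(stacked, axis=0))
    sign[sign == 0] = 1.0  # break ties arbitrarily toward positive
    # 4. Disjoint merge: weighted average of only the deltas that agree
    #    with the elected sign (normalized by the agreeing weights).
    agree = (np.sign(stacked) == sign) & (stacked != 0)
    weighted_sum = (weight * stacked * agree).sum(axis=0)
    denom = (weight * agree).sum(axis=0)
    merged_delta = np.where(denom > 0, weighted_sum / np.maximum(denom, 1e-9), 0.0)
    return base + merged_delta

# Toy example: two "fine-tuned" vectors diverging from a zero base.
base = np.zeros(4)
m1 = np.array([1.0, -2.0, 0.1, 0.0])
m2 = np.array([0.5, 2.0, 0.0, -0.3])
merged = ties_merge(base, [m1, m2], weight=0.5, density=0.5)
# → [0.75, 2.0, 0.0, 0.0]: param 0 averages the agreeing deltas,
#   param 1 keeps only the delta matching the elected sign.
```

Note how sign conflicts (parameter 1 above) are resolved by dropping the minority-sign delta rather than averaging it away, which is the key difference between TIES and a plain linear merge.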
Merge Details
The model integrates two specific pre-trained components, each contributing to its overall capabilities:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
The merge configuration utilized a weight of 0.5 and a density of 0.5 for each component, with normalization enabled and bfloat16 as the data type. This precise merging strategy aims to balance the contributions of the constituent models.
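A mergekit configuration matching the parameters described above might look like the following. This is a hypothetical reconstruction (the original config file is not published here); the checkpoint paths are those listed in the merge details, and the weight, density, normalize, and dtype values are taken from the description above.

```yaml
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
    parameters:
      weight: 0.5
      density: 0.5
merge_method: ties
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
parameters:
  normalize: true
dtype: bfloat16
```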
Key Characteristics
- Architecture: Merged model based on DeepSeek-R1-Distill-Qwen-1.5B.
- Parameter Count: 1.5 billion parameters.
- Context Length: Supports an extensive context window of 131072 tokens, enabling processing of very long inputs and generating coherent, contextually relevant outputs over extended sequences.
Potential Use Cases
Given its large context window and merged nature, this model is suitable for applications requiring:
- Long-form content generation: Summarization, document analysis, or creative writing over extensive texts.
- Complex reasoning: Tasks that benefit from processing a broad range of information simultaneously.
- Specialized language tasks: Where the specific characteristics inherited from its merged components provide an advantage.