Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties
Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties is a 1.5-billion-parameter language model created by Zachary1150. It was produced by merging two pre-trained models with the TIES method, using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B as the base model. It supports a context length of 131,072 tokens, making it suitable for tasks that require processing very long inputs.
Model Overview
This model, merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties, is a 1.5-billion-parameter language model developed by Zachary1150. It was created with the mergekit tool, using the TIES merge method.
Merge Details
The base model for this merge was deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. Two distinct pre-trained models were combined:
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
- /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR1e-6/global_step_50/actor/huggingface
Each merged model contributed with a weight of 0.5 and a density of 0.5, and normalization was applied during the merge. The merge was performed in the bfloat16 data type.
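For intuition, the per-tensor arithmetic of a TIES merge with these settings can be sketched in plain PyTorch. This is a simplified illustration, not mergekit's actual implementation, which differs in details such as trimming granularity and normalization:

```python
import torch

def ties_merge_tensor(base: torch.Tensor,
                      finetuned: list[torch.Tensor],
                      weight: float = 0.5,
                      density: float = 0.5) -> torch.Tensor:
    """Merge one parameter tensor via TIES: trim, elect sign, disjoint merge."""
    # 1. Task vectors: how each fine-tuned model differs from the base.
    deltas = [ft - base for ft in finetuned]

    # 2. Trim: keep only the top `density` fraction of entries by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.numel()))
        threshold = d.abs().flatten().kthvalue(d.numel() - k + 1).values
        trimmed.append(torch.where(d.abs() >= threshold, d, torch.zeros_like(d)))

    # 3. Elect sign: per-entry majority sign of the weighted trimmed deltas.
    weighted = torch.stack([weight * t for t in trimmed])
    elected = weighted.sum(dim=0).sign()

    # 4. Disjoint merge: average only entries that agree with the elected sign,
    #    normalizing by the total weight of the contributing entries.
    mask = torch.stack([t.sign() == elected for t in trimmed])
    total_weight = (mask.to(weighted.dtype) * weight).sum(dim=0).clamp(min=1e-8)
    merged_delta = (weighted * mask).sum(dim=0) / total_weight

    return base + merged_delta

# Toy example: merge two perturbed copies of a shared base tensor.
base = torch.randn(4, 4)
merged = ties_merge_tensor(base, [base + 0.1 * torch.randn(4, 4),
                                  base + 0.1 * torch.randn(4, 4)])
```

Applied tensor by tensor over the two checkpoints' state dicts, this yields the merged weights.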
Key Characteristics
- Merge Method: Uses TIES (Trim, Elect Sign & Merge), which trims low-magnitude parameter changes, resolves sign conflicts between models, and merges only the parameters that agree, reducing interference between the merged models.
- Base Model: Built upon the DeepSeek-R1-Distill-Qwen-1.5B architecture, inheriting its foundational capabilities.
- Context Length: Supports a context length of 131,072 tokens, allowing it to process and reason over very long inputs.
Potential Use Cases
This model is suited to applications that benefit from a compact yet capable language model with a very large context window, particularly where the strengths of both merged checkpoints are advantageous.
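Assuming the model is published under the repository name above, it should load like any other Qwen2-family causal language model via transformers. A minimal usage sketch (the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR1e-6_w0.5_ties"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
    device_map="auto",           # requires the accelerate package
)

prompt = "Briefly explain what model merging is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```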