Name: Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.3_linear API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Zachary1150

Model Overview

This model, developed by Zachary1150, is a 1.5-billion-parameter language model produced by a linear merge with the mergekit framework. It combines two source checkpoints, identified in the model card only by their local paths: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface and /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface.

Merge Details

The linear merge method was applied with specific weighting parameters:

The first model (cos_MRL4096_ROLLOUT4_LR2e-6, global_step_40) was given a weight of 0.3.
The second model (accfmt_MRL4096_ROLLOUT4_LR2e-6, global_step_30) was given a weight of 0.7.

This configuration weights the merged parameters toward the second model. The merge was computed in the bfloat16 data type with weight normalization enabled.
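The weighting described above can be illustrated with a minimal sketch. This is not mergekit's implementation: the function name linear_merge, the toy parameter name "layer.weight", and the use of flat Python lists in place of tensors are all assumptions made for illustration, and bfloat16 precision is not modeled. The sketch only shows the arithmetic of a normalized weighted average.

```python
def linear_merge(state_dicts, weights, normalize=True):
    """Weighted average of parameters that share the same names and shapes.

    Hypothetical helper for illustration; flat lists of floats stand in
    for real tensors.
    """
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if normalize:
        if total == 0:
            raise ValueError("weights must not sum to zero when normalizing")
        # Rescale the weights so they sum to 1.
        scale = [w / total for w in weights]
    else:
        scale = list(weights)

    merged = {}
    for name in state_dicts[0]:
        # Group the i-th value of this parameter across all models.
        columns = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [
            sum(s * v for s, v in zip(scale, values)) for values in columns
        ]
    return merged


# Toy stand-ins for the two source models (values are made up).
model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [3.0, 4.0]}

merged = linear_merge([model_a, model_b], weights=[0.3, 0.7])
print(merged)
```

Because 0.3 and 0.7 already sum to 1, normalization leaves these particular weights unchanged; it would matter for weights such as 3 and 7, which would be rescaled to the same 0.3 and 0.7.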

Key Characteristics

Merged Architecture: Combines two distinct pre-trained models to potentially inherit diverse capabilities.
Linear Merge Method: Uses a weighted average of the two models' parameters, a simple merging technique.
Large Context Window: Features a context length of 131072 tokens, suitable for processing very long sequences of text.

Potential Use Cases

Given its merged nature and large context, this model could be beneficial for applications requiring:

Comprehensive document analysis.
Long-form content generation or summarization.
Tasks benefiting from a blend of different model specializations.
