Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Dec 20, 2025 · Architecture: Transformer

Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear is a 1.5 billion parameter language model created by Zachary1150, featuring an extended context length of 131072 tokens. This model is a linear merge of two pre-trained models, specifically designed to combine their respective strengths. It is optimized for tasks benefiting from a very long context window, making it suitable for applications requiring extensive textual understanding or generation.


Model Overview

This model, merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear, is a 1.5 billion parameter language model developed by Zachary1150. It stands out due to its exceptionally long context length of 131072 tokens, enabling it to process and generate very extensive texts.
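If the model is published on the Hugging Face Hub under the repository name above, loading it should follow the standard transformers pattern. The sketch below is a minimal, assumed setup (the `device_map` and dtype choices are typical defaults, not documented requirements of this model):

```python
# Minimal loading sketch, assuming the standard Hugging Face transformers API
# and that the repository id matches the model name shown above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_lenfmt_MRL4096_ROLLOUT4_LR5e-7_w0.7_linear"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 data type used for the merge
    device_map="auto",           # assumption: place weights automatically on available devices
)
```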

Merge Details

The model was constructed using the Linear merge method via mergekit. This technique combines the weights of multiple pre-trained models to create a new, unified model. Specifically, this merge integrated two distinct base models:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/len_MRL4096_ROLLOUT4_LR5e-7/global_step_30/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR5e-7/global_step_54/actor/huggingface

The merge configuration applied a weight of 0.7 to the first model and 0.3 to the second, with normalization enabled and bfloat16 as the data type. This specific weighting suggests an emphasis on the characteristics of the len_MRL4096 base model.
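To make the arithmetic concrete, the following is a simplified sketch of what a weighted linear merge with normalization computes. It is an illustrative stand-in rather than mergekit's actual implementation, and the checkpoint paths are placeholders for the two local checkpoints listed above:

```python
# Illustrative sketch of a weighted linear merge with normalization
# (a simplified stand-in for mergekit's Linear method, not its actual code).
import torch
from transformers import AutoModelForCausalLM

# Placeholder paths standing in for the two checkpoints listed above,
# paired with the 0.7 / 0.3 weights from the merge configuration.
sources = [
    ("/path/to/len_MRL4096_ROLLOUT4_LR5e-7_checkpoint", 0.7),
    ("/path/to/accfmt_MRL4096_ROLLOUT4_LR5e-7_checkpoint", 0.3),
]

merged = None
total_weight = sum(w for _, w in sources)  # 1.0 for this configuration

for path, weight in sources:
    state = AutoModelForCausalLM.from_pretrained(
        path, torch_dtype=torch.bfloat16
    ).state_dict()
    if merged is None:
        # Accumulate in float32 for numerical stability.
        merged = {k: torch.zeros_like(v, dtype=torch.float32) for k, v in state.items()}
    for name, tensor in state.items():
        merged[name] += weight * tensor.float()

# With normalization enabled, divide by the sum of the weights,
# then cast back to bfloat16 as specified in the merge configuration.
merged = {k: (v / total_weight).to(torch.bfloat16) for k, v in merged.items()}
```

Because the weights already sum to 1.0, normalization leaves the result unchanged here; it matters when the configured weights do not sum to one.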

Key Characteristics

  • Extended Context Window: A primary feature is its 131072-token context length, significantly larger than many models of its size.
  • Merged Architecture: Benefits from the combined knowledge and capabilities of its constituent base models through a linear merge.

Potential Use Cases

This model is particularly well-suited for applications requiring:

  • Long-document analysis: Summarization, question-answering, or information extraction from very long texts (see the usage sketch after this list).
  • Code generation/understanding: Processing large codebases or complex programming logic.
  • Creative writing: Generating extensive narratives or detailed scenarios where context retention is crucial.
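
For the long-document analysis scenario, a minimal sketch is shown below. It reuses the `model` and `tokenizer` objects from the earlier loading sketch; the input file name and prompt wording are hypothetical placeholders:

```python
# Long-document summarization sketch; assumes `model` and `tokenizer`
# were created as in the loading sketch above. File name and prompt
# wording are hypothetical.
with open("long_report.txt") as f:
    document = f.read()

prompt = f"{document}\n\nSummarize the key points of the document above."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(f"Prompt length: {inputs.input_ids.shape[1]} tokens")

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
summary = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(summary)
```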