Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.1_linear

Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Ctx length: 32k · Published: Dec 24, 2025 · Architecture: Transformer

Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.1_linear is a 1.5 billion parameter language model created by Zachary1150 through a linear merge of two pre-trained models using the mergekit framework. Merging averages the parameters of the constituent models rather than altering the architecture, so the result blends their capabilities in a single checkpoint. With a context length of 131072 tokens, it targets applications that require extensive contextual understanding. Its primary differentiator is this creation method: a weighted combination of two specialized checkpoints.


Overview

This model, merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.1_linear, is a 1.5 billion parameter language model developed by Zachary1150. It was constructed with the mergekit tool using the Linear merge method, which combines two pre-trained language models by taking a weighted average of their parameters. In this merge, one base model contributes 10% and the other 90% of the average, and the merged weights are stored in bfloat16.
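
mergekit model cards normally embed the YAML configuration that produced the merge; it is not reproduced on this page, so the block below is a hypothetical reconstruction from the details stated in this card (method, weights, normalization, dtype, and the two checkpoint paths listed under Key Characteristics below):

```yaml
# Hypothetical reconstruction of the merge config; not the author's actual file.
merge_method: linear
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface
    parameters:
      weight: 0.1
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
    parameters:
      weight: 0.9
parameters:
  normalize: true
dtype: bfloat16
```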

Key Characteristics

  • Merge Method: Uses the Linear merge technique, a weighted average of corresponding parameter tensors (the method mergekit's documentation attributes to the Model Soups paper, arXiv:2203.05482); a sketch of the arithmetic follows this list.
  • Base Models: Merged from two locally stored checkpoints: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/cos_MRL4096_ROLLOUT4_LR2e-6/global_step_40/actor/huggingface and /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface.
  • Parameter Weighting: The merge applied a weight of 0.1 to the first checkpoint and 0.9 to the second, with normalization enabled so the weights sum to 1 (see the configuration sketch above).
  • Context Length: Supports a context window of 131072 tokens, allowing very long inputs to be processed in a single pass.
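
For intuition, a normalized linear merge computes, for each parameter tensor, the weighted average (0.1 · θ_cos + 0.9 · θ_accfmt) / (0.1 + 0.9). The following is a minimal sketch of that arithmetic, not mergekit's actual implementation; the state dicts are assumed to come from the two checkpoints listed above:

```python
import torch

def linear_merge(state_dicts, weights, normalize=True):
    """Weighted average of corresponding parameter tensors (linear merge)."""
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    merged = {}
    for name, ref in state_dicts[0].items():
        # Accumulate in float32 for numerical stability, then store in
        # bfloat16 to match the dtype this card reports.
        acc = torch.zeros_like(ref, dtype=torch.float32)
        for w, sd in zip(weights, state_dicts):
            acc += w * sd[name].to(torch.float32)
        merged[name] = acc.to(torch.bfloat16)
    return merged

# With this card's weights: 0.1 for the cos checkpoint, 0.9 for accfmt.
# merged = linear_merge([cos_state_dict, accfmt_state_dict], [0.1, 0.9])
```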

Potential Use Cases

Given its merged nature and large context window, this model is likely suitable for applications that benefit from a blend of capabilities from its constituent models and require extensive contextual understanding, such as:

  • Long-form content generation and summarization.
  • Advanced reasoning over large documents.
  • Tasks where combining specific strengths of different models is advantageous.
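
As a usage sketch (not from the model card itself): assuming the checkpoint follows the standard Hugging Face causal-LM layout, as 1.5B mergekit outputs typically do, it can be loaded and run with the transformers library. The prompt and generation settings below are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zachary1150/merge_cosfmt_MRL4096_ROLLOUT4_LR2e-6_w0.1_linear"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge dtype reported above
    device_map="auto",           # requires the accelerate package
)

# Long-document summarization, one of the use cases listed above.
prompt = "Summarize the key points of the following report:\n\n" + open("report.txt").read()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```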