Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear

Hugging Face
Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Dec 24, 2025 · Architecture: Transformer

The Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear model is a 1.5-billion-parameter language model created by Zachary1150 as a linear merge of two pre-trained checkpoints, built with the MergeKit framework. Its main differentiator is the merging methodology itself: the specific checkpoints chosen and the precise weighting applied to each, which makes it suitable for tasks that benefit from this particular combination of underlying model strengths. The model has a context length of 131,072 tokens.


Model Overview

Zachary1150/merge_accfmt_MRL4096_ROLLOUT4_LR2e-6_w0.7_linear is a 1.5 billion parameter language model developed by Zachary1150. It was created using the MergeKit framework, which allows for the combination of multiple pre-trained language models into a single, unified model. This particular model utilizes a Linear merge method to blend its components.
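A linear merge is simply a weighted average of the two models' parameters, taken tensor by tensor. The sketch below illustrates the idea on toy "state dicts" (plain Python lists standing in for weight tensors). It is a simplified illustration of the technique, not MergeKit's actual implementation, and the `linear_merge` helper name is hypothetical:

```python
def linear_merge(state_dict_a, state_dict_b, weight_a=0.7, weight_b=0.3,
                 normalize=True):
    """Element-wise linear combination of two models' parameters.

    Computes weight_a * A + weight_b * B for every parameter tensor.
    With normalize=True, the weights are rescaled to sum to 1, matching
    MergeKit's `normalize` option for the linear method.
    """
    if normalize:
        total = weight_a + weight_b
        weight_a, weight_b = weight_a / total, weight_b / total
    merged = {}
    for name, params_a in state_dict_a.items():
        params_b = state_dict_b[name]  # both models must share parameter names
        merged[name] = [weight_a * a + weight_b * b
                        for a, b in zip(params_a, params_b)]
    return merged


# Toy example: parameter name -> flat list of weights.
model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [3.0, 4.0]}
merged = linear_merge(model_a, model_b)  # layer.weight ≈ [1.6, 2.6]
```

In a real merge the same arithmetic is applied to full `torch` tensors loaded from each checkpoint, with the result cast to the configured dtype (here, bfloat16).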

Merge Details

The model is a result of merging two specific pre-trained checkpoints:

  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
  • /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR2e-6/global_step_54/actor/huggingface

During the merge process, a specific weighting was applied to each constituent model: the acc_MRL4096_ROLLOUT4_LR2e-6 checkpoint received a weight of 0.7, while the accfmt_MRL4096_ROLLOUT4_LR2e-6 checkpoint received a weight of 0.3. The merge configuration also specified bfloat16 as the dtype and enabled weight normalization. This weighted linear combination aims to retain the strengths of both base models in a single model.
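Put together, the details above correspond to a MergeKit configuration along these lines. This YAML is a reconstruction from the stated parameters, not the author's actual config file:

```yaml
merge_method: linear
models:
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/acc_MRL4096_ROLLOUT4_LR2e-6/global_step_54/actor/huggingface
    parameters:
      weight: 0.7
  - model: /local/scratch/zli2255/workspace/MergeExpert/checkpoints/baselines_openrs/accfmt_MRL4096_ROLLOUT4_LR2e-6/global_step_30/actor/huggingface
    parameters:
      weight: 0.3
dtype: bfloat16
parameters:
  normalize: true
```

A config like this would be run with `mergekit-yaml config.yml ./output-dir` to produce the merged checkpoint.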

Potential Use Cases

Given its origin as a merge of specialized checkpoints, this model is likely suitable for:

  • Research into model merging techniques: Understanding the impact of specific weighting and linear merging on performance.
  • Applications requiring a blend of capabilities: Where the individual strengths of the merged models are complementary.
  • Experiments with custom model architectures: For developers looking to fine-tune or build upon a uniquely merged base.