BryanSwk/LaserPipe-7B-SLERP

Text Generation · Concurrency cost: 1 · Model size: 7B · Quant: FP8 · Context length: 4K · Published: Feb 7, 2024 · License: cc-by-nc-4.0 · Architecture: Transformer · Open weights

BryanSwk/LaserPipe-7B-SLERP is a 7 billion parameter language model created by BryanSwk through a SLERP merge of OpenPipe/mistral-ft-optimized-1218 and macadeliccc/WestLake-7B-v2-laser-truthy-dpo. The merge aims to combine the strengths of its constituent models into a single model for general language tasks. It is designed for experimentation with merged model architectures, and a Q4_K_M GGUF quantization is provided for CPU inference.


Model Overview

BryanSwk/LaserPipe-7B-SLERP is a 7 billion parameter language model developed by BryanSwk. This model is a product of a SLERP (Spherical Linear Interpolation) merge using the mergekit tool, combining two distinct pre-trained models: OpenPipe/mistral-ft-optimized-1218 and macadeliccc/WestLake-7B-v2-laser-truthy-dpo. The primary purpose of this repository is to serve as a learning and experimentation platform for merged models and GGUF conversions.
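
For intuition, SLERP interpolates along the great-circle arc between two weight vectors rather than along the straight line between them, which better preserves the geometry of the parent weights. The sketch below illustrates the standard SLERP formula in PyTorch; it is a conceptual reference only, and mergekit's actual implementation differs in details such as per-layer t schedules, dtype handling, and degenerate-case thresholds.

```python
import torch

def slerp(t: float, w0: torch.Tensor, w1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    v0 = w0.flatten().float()
    v1 = w1.flatten().float()
    # Angle between the two weight vectors.
    cos_theta = torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps)
    theta = torch.acos(cos_theta.clamp(-1.0, 1.0))
    if theta.abs() < 1e-4:
        # Nearly colinear vectors: fall back to plain linear interpolation.
        merged = (1.0 - t) * v0 + t * v1
    else:
        sin_theta = torch.sin(theta)
        merged = (torch.sin((1.0 - t) * theta) / sin_theta) * v0 \
               + (torch.sin(t * theta) / sin_theta) * v1
    return merged.reshape(w0.shape).to(w0.dtype)
```

With t = 0 the result is the first model's weights, with t = 1 the second's; intermediate values trace the arc between them, which is why mergekit exposes per-tensor t values rather than a single global blend ratio.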

Key Characteristics

  • Merged Architecture: Utilizes the SLERP method to blend the weights of two base models, aiming to combine their respective strengths.
  • Constituent Models: Merges OpenPipe/mistral-ft-optimized-1218 and macadeliccc/WestLake-7B-v2-laser-truthy-dpo across all 32 layers.
  • Parameter Configuration: The merge uses distinct interpolation factors (t) for the self_attn and mlp tensors, a nuanced blending strategy rather than a uniform average.
  • CPU Inference Support: A Q4_K_M-quantized .gguf file is provided, enabling efficient inference on CPU hardware (see the sketch after this list).
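
As a sketch of CPU inference with the quantized file, the snippet below uses the llama-cpp-python bindings. The local filename is hypothetical; substitute the actual Q4_K_M .gguf path downloaded from the repository.

```python
from llama_cpp import Llama

# Hypothetical local path; use the actual Q4_K_M GGUF file from the repo.
llm = Llama(
    model_path="laserpipe-7b-slerp.Q4_K_M.gguf",
    n_ctx=4096,   # matches the model's 4K context length
    n_threads=8,  # tune to the number of physical CPU cores
)

output = llm(
    "Q: What does a SLERP merge do? A:",
    max_tokens=128,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```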

Use Cases

  • Experimentation: Ideal for researchers and developers interested in exploring the effects and performance of merged language models.
  • CPU-constrained Environments: The included GGUF version makes it suitable for deployment in environments where GPU resources are limited.
  • General Language Tasks: Given its foundation in Mistral-based models, it is expected to perform well across a range of common NLP applications (a minimal usage sketch follows).
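
Assuming the repository hosts standard Hugging Face weights (typical for mergekit outputs), a minimal text-generation sketch with transformers might look like the following; the prompt and generation parameters are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BryanSwk/LaserPipe-7B-SLERP"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package; fp16 keeps memory modest.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Explain model merging in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```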