ryzen88/Llama-3-70b-Uncensored-Lumi-Tess-gradient

Hugging Face
TEXT GENERATIONConcurrency Cost:4Model Size:70BQuant:FP8Ctx Length:8kPublished:May 10, 2024Architecture:Transformer0.0K Warm

ryzen88/Llama-3-70b-Uncensored-Lumi-Tess-gradient is a 70 billion parameter language model developed by ryzen88, created by merging Llama-3-70B-Instruct-Gradient-262k, Tess-2.0-Llama-3-70B-v0.2, and Llama-3-Lumimaid-70B-v0.1-OAS using the breadcrumbs_ties method. This model is designed to be an uncensored Llama 3 variant with an emphasis on long context capabilities. It offers a wide optimal sampler range, making it versatile across various inference settings.

Loading preview...

Model Overview

ryzen88/Llama-3-70b-Uncensored-Lumi-Tess-gradient is a 70 billion parameter language model developed by ryzen88, specifically engineered to provide an uncensored Llama 3 experience with enhanced long context capabilities. This model was constructed using the breadcrumbs_ties merge method, combining several base models to achieve its characteristics.

Merge Details

This model is a merge of three distinct Llama 3-based models:

  • Base Model: I:\Llama-3-70B-Instruct-Gradient-262k (weighted at 20%)
  • Merged Component 1: I:\Tess-2.0-Llama-3-70B-v0.2 (weighted at 20%)
  • Merged Component 2: E:\Llama-3-Lumimaid-70B-v0.1-OAS (weighted at 60%)

The merge utilized specific parameters for weight, density, and gamma for each component, with bfloat16 as the dtype.

Key Characteristics

  • Uncensored Output: Designed to provide responses without typical content restrictions.
  • Long Context Focus: A primary goal during its creation was to ensure good performance with extended context lengths.
  • Sampler Versatility: The model exhibits a "very wide optimal" sampler range, suggesting flexibility and robustness across various sampling configurations.

Good For

  • Use cases requiring an uncensored Llama 3-based model.
  • Applications benefiting from strong long-context understanding and generation.
  • Developers experimenting with different sampling settings due to its wide optimal range.