nbeerbower/Flammen-Trismegistus-7B

Text Generation · Model Size: 7B · Quant: FP8 · Context Length: 4k · Published: Mar 9, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Flammen-Trismegistus-7B by nbeerbower is a 7 billion parameter language model created by merging nbeerbower/flammen3X and teknium/Mistral-Trismegistus-7B using the SLERP method. This model combines the characteristics of its constituent models, offering a versatile base for various natural language processing tasks. Its 4096-token context length supports moderate input sequences, making it suitable for applications requiring a balance of performance and context handling.


Model Overview

Flammen-Trismegistus-7B is a 7 billion parameter language model developed by nbeerbower. It was created through a strategic merge of two pre-existing models: nbeerbower/flammen3X and teknium/Mistral-Trismegistus-7B. This merging process utilized the SLERP (Spherical Linear Interpolation) method, a technique often employed to combine the strengths of different models while maintaining coherence.
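SLERP interpolates between two weight tensors along the great-circle arc of the unit hypersphere rather than along a straight line, which tends to preserve the geometric structure of the weights better than plain averaging. The sketch below is a minimal NumPy illustration of the technique, not the actual mergekit implementation; the function name and the linear-interpolation fallback for near-parallel tensors are standard conventions, assumed here for clarity.

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    Interpolates along the arc between v0 and v1 with blend factor t
    (t=0 returns v0, t=1 returns v1). Falls back to linear
    interpolation when the tensors are nearly parallel, where the
    spherical formula is numerically unstable.
    """
    v0 = np.asarray(v0, dtype=np.float64)
    v1 = np.asarray(v1, dtype=np.float64)
    # Normalize copies to measure the angle between the two tensors.
    u0 = v0 / (np.linalg.norm(v0) + eps)
    u1 = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(u0.ravel(), u1.ravel()), -1.0, 1.0)
    if abs(dot) > 1.0 - 1e-6:
        # Nearly parallel: plain linear interpolation is fine.
        return (1 - t) * v0 + t * v1
    theta = np.arccos(dot)        # angle between the tensors
    sin_theta = np.sin(theta)
    return (np.sin((1 - t) * theta) / sin_theta) * v0 \
         + (np.sin(t * theta) / sin_theta) * v1
```

At `t = 0.5` the result lies midway along the arc, which is why SLERP is often described as a "balanced" blend of the two parents.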

Merge Details

The model integrates the full layer ranges (0 to 32) from both nbeerbower/flammen3X and teknium/Mistral-Trismegistus-7B. The SLERP interpolation factor was weighted separately for the self_attn and mlp tensors, so the blend can favor one parent model in attention layers and the other in feed-forward layers. The merge was performed in bfloat16, reducing memory footprint while retaining adequate numerical range.
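Merges like this are typically expressed as a mergekit YAML config. The fragment below is an illustrative sketch: the model names, layer ranges, method, and dtype match the description above, but the `base_model` choice and the per-layer `t` schedules are assumptions, not the actual merge recipe.

```yaml
slices:
  - sources:
      - model: nbeerbower/flammen3X
        layer_range: [0, 32]
      - model: teknium/Mistral-Trismegistus-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: nbeerbower/flammen3X     # assumed; the card does not state the base
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # illustrative per-layer schedule
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # illustrative per-layer schedule
    - value: 0.5                     # default blend for all other tensors
dtype: bfloat16
```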

Key Characteristics

  • Merged Architecture: Combines the features of flammen3X and Mistral-Trismegistus-7B.
  • SLERP Method: Interpolates weights spherically rather than linearly, preserving weight geometry during the blend.
  • 7 Billion Parameters: Offers a substantial capacity for complex language understanding and generation.
  • 4096 Token Context: A context window suited to prompts and documents of moderate length.

Potential Use Cases

Given its merged nature, Flammen-Trismegistus-7B is expected to be suitable for a range of general-purpose NLP tasks, including text generation, summarization, and question answering, leveraging the combined strengths of its parent models.