migtissera/Tess-M-v1.3

TEXT GENERATION
Concurrency Cost: 2
Model Size: 34B
Quant: FP8
Ctx Length: 32k
Published: Nov 24, 2023
License: yi-34b
Architecture: Transformer

Tess-M-v1.3 by migtissera is a 34-billion-parameter general-purpose large language model built on the Yi-34B-200K base model. This release fixes issues present in earlier versions, improving stability and performance. It is intended for general language tasks and has been tested at context lengths up to 32768 tokens, where it showed only minor repetition.

Overview

migtissera/Tess-M-v1.3 is a 34-billion-parameter general-purpose large language model. "Tess" is short for "Tesoro" (Treasure in Italian). The model is built on the Yi-34B-200K base model and is a stable release: issues found in versions 1.0, 1.1, and 1.2 have been fixed through further R&D.

Key Capabilities

  • General Purpose LLM: Designed for a wide range of language understanding and generation tasks.
  • Long Context Handling: Tested at very long context lengths, showing only minor repetition even at extended contexts. Users should test the model on their own use cases and limit the context length if repetition becomes a problem.
  • Stable Release: This version incorporates fixes and improvements from prior iterations, aiming for enhanced reliability.
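
One way to act on the long-context advice above is to cap the number of tokens sent to the model. The snippet below is a minimal sketch, not part of the model card: the function name `clip_to_context` and the `reserve_for_output` parameter are hypothetical, and it assumes the input has already been tokenized into a list of integer token IDs by whatever tokenizer the deployment uses. The 32768 limit is the tested context length reported on this page.

```python
from typing import List, Sequence

# Context length reported for this model on this page.
MAX_CONTEXT_TOKENS = 32768


def clip_to_context(
    token_ids: Sequence[int],
    max_tokens: int = MAX_CONTEXT_TOKENS,
    reserve_for_output: int = 512,  # hypothetical headroom for the reply
) -> List[int]:
    """Keep only the most recent tokens that fit within the budget."""
    budget = max_tokens - reserve_for_output
    if budget <= 0:
        raise ValueError("reserve_for_output must be smaller than max_tokens")
    # Negative slicing keeps the tail, i.e. the most recent context.
    return list(token_ids[-budget:])
```

Keeping the tail rather than the head is a design choice that favors recent conversation turns; a deployment that must preserve a leading system context would need to handle that part separately.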

Prompt Format

The model expects the following prompt format:

SYSTEM: <ANY SYSTEM CONTEXT>
USER: 
ASSISTANT:
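
The format above can be assembled with a small helper. This is an illustrative sketch only: the function name `build_prompt` and its parameters are hypothetical, and it assumes that the user's message follows `USER: ` on the same line and that the model continues the text after `ASSISTANT:`.

```python
def build_prompt(user_message: str, system_context: str = "") -> str:
    """Assemble a single-turn prompt in the SYSTEM/USER/ASSISTANT layout."""
    return (
        f"SYSTEM: {system_context}\n"
        f"USER: {user_message}\n"
        "ASSISTANT:"
    )


if __name__ == "__main__":
    # Example usage with placeholder text.
    print(build_prompt("Summarize the plot of Hamlet.", "You are a concise assistant."))
```

The returned string ends at `ASSISTANT:` so that the generated text is the assistant's reply.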

Learnings and Insights

The developer has documented what was learned during training between Tess-v1.0 and Tess-v1.3, giving some transparency into how the model was developed. Further details on testing and development are available in the developer's Substack articles.