allura-org/Teleut-7b

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Nov 24, 2024 · License: apache-2.0 · Architecture: Transformer

Teleut-7b by allura-org is a 7.6-billion-parameter language model, a replication attempt of the Tulu 3 recipe built on the Qwen 2.5 base architecture. It features a 131,072-token context length and demonstrates strong performance across various benchmarks, particularly in reasoning and instruction-following tasks. The model is optimized for general-purpose conversational AI and complex instruction adherence, making it suitable for applications that require robust understanding and generation.


Teleut-7b: Tulu 3 Replication on Qwen 2.5 Base

Teleut-7b is a 7.6-billion-parameter language model developed by allura-org. It is a replication effort of the Tulu 3 model, built on the Qwen 2.5 base model series, and supports a substantial 131,072-token context length, enabling it to process and generate longer, more coherent texts.
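As a sketch, the model can be loaded through the Hugging Face `transformers` API under its published repo id. The prompt, dtype, and generation settings below are illustrative assumptions, not values from the model card; loading is gated behind an environment variable so the snippet can be run without triggering a multi-gigabyte download.

```python
import os

MODEL_ID = "allura-org/Teleut-7b"

# Example chat in the standard messages format consumed by apply_chat_template.
messages = [
    {"role": "user", "content": "Summarize the key ideas of the Tulu 3 training recipe."},
]

# Gated so importing or dry-running this snippet does not download the weights.
if os.environ.get("RUN_TELEUT_DEMO"):
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```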

Key Capabilities & Performance

This model demonstrates competitive performance across several benchmarks, often outperforming or closely matching other models in its class. Notable results include:

  • BBH (3-shot, CoT): Achieves 64.4%, indicating strong multi-hop reasoning abilities.
  • MMLU (0-shot, CoT): Scores 73.2%, showcasing its general knowledge and understanding.
  • IFEval (prompt loose): Reaches 66.3%, highlighting its instruction-following capabilities.
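To make the 0-shot chain-of-thought setting above concrete, here is a minimal sketch of how such a multiple-choice prompt might be assembled. The helper name, question, and instruction wording are illustrative assumptions, not taken from any benchmark harness.

```python
def build_zero_shot_cot_prompt(question: str, choices: list[str]) -> str:
    """Assemble a 0-shot CoT multiple-choice prompt: no worked examples,
    just the question plus an explicit step-by-step reasoning instruction."""
    letters = "ABCD"
    lines = [question]
    lines += [f"{letters[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Let's think step by step, then answer with a single letter.")
    return "\n".join(lines)

prompt = build_zero_shot_cot_prompt(
    "Which gas makes up most of Earth's atmosphere?",
    ["Oxygen", "Nitrogen", "Carbon dioxide", "Argon"],
)
print(prompt)
```

A few-shot variant (as in the 3-shot BBH setting) would simply prepend worked question/answer pairs before the final question.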

While it shows robust performance in reasoning and instruction adherence, note that on some tasks, such as GSM8K and MMLU, the original Qwen 2.5 7B Instruct model reports higher scores. Training ran for a single epoch at a learning rate of 3.5e-06 on an 8xH100 polycule, which was used for both training and testing.
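The reported training setup can be captured as a small config sketch. Only the epoch count, learning rate, and hardware come from the card; the field names and the base-model repo id are my own assumptions, and unreported settings (batch size, scheduler, and so on) are deliberately left out.

```python
# Hyperparameters reported for Teleut-7b's training run.
TRAINING_CONFIG = {
    "base_model": "Qwen/Qwen2.5-7B",  # assumed repo id for the Qwen 2.5 7B base
    "num_train_epochs": 1,
    "learning_rate": 3.5e-06,
    "hardware": "8xH100",
}
```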

Good For

  • General-purpose conversational AI: Its strong instruction-following and reasoning make it suitable for chatbots and virtual assistants.
  • Complex instruction adherence: Excels in scenarios where precise understanding and execution of user prompts are critical.
  • Research and development: Provides a strong base for further fine-tuning or experimentation, particularly for those interested in Tulu 3's architecture on a Qwen 2.5 foundation.