Teleut-7b: Tulu 3 Replication on Qwen 2.5 Base
Teleut-7b is a 7.6 billion parameter language model developed by allura-org as a replication of the Tulu 3 training recipe on the Qwen 2.5 base model series. It supports a 131,072-token context length, enabling it to process and generate long, coherent texts.
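For orientation, a minimal usage sketch with Hugging Face Transformers is shown below. The repository id `allura-org/Teleut-7b` and the generation settings are assumptions based on the naming above, not details stated in this card.

```python
# Minimal chat sketch with Hugging Face Transformers.
# Assumes the model is published as "allura-org/Teleut-7b" on the Hub
# and ships a chat template (as Qwen 2.5 derivatives typically do).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allura-org/Teleut-7b"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "Summarize the Tulu 3 recipe in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```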
Key Capabilities & Performance
This model demonstrates competitive performance across several benchmarks, often matching or outperforming other models in its class; an evaluation sketch follows the list. Notable results include:
- BBH (3-shot, CoT): Achieves 64.4%, indicating strong multi-hop reasoning abilities.
- MMLU (0-shot, CoT): Scores 73.2%, showcasing its general knowledge and understanding.
- IFEval (prompt loose): Reaches 66.3%, highlighting its instruction-following capabilities.
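These numbers can in principle be checked with EleutherAI's lm-evaluation-harness. The sketch below is an assumption about how such a run might look; the card does not state which harness, task variants, or prompt settings produced its scores, so the task names here are illustrative and use the harness's default few-shot settings.

```python
# Illustrative evaluation sketch using EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Task names and settings are assumptions; the card
# does not specify the exact configuration behind its reported numbers.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=allura-org/Teleut-7b,dtype=auto",  # assumed repo id
    tasks=["bbh_cot_fewshot", "mmlu", "ifeval"],  # approximate task variants
    batch_size="auto",
)
for task, metrics in results["results"].items():
    print(task, metrics)
```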
While it shows robust performance in reasoning and instruction adherence, note that on some tasks, such as GSM8K and MMLU, the original Qwen 2.5 7B Instruct model reports higher scores. The training procedure was a single epoch at a learning rate of 3.5e-06, run on an 8xH100 polycule used for both training and testing.
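As a rough illustration, those stated hyperparameters could be wired into a supervised fine-tuning run like the one below. The choice of TRL as the trainer and of allenai/tulu-3-sft-mixture as the dataset are assumptions consistent with a Tulu 3 replication; only the epoch count and learning rate come from this card.

```python
# Hypothetical SFT setup mirroring the stated hyperparameters (1 epoch,
# lr 3.5e-6). TRL and the allenai/tulu-3-sft-mixture dataset are
# assumptions, not confirmed details of this model card.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("allenai/tulu-3-sft-mixture", split="train")

config = SFTConfig(
    output_dir="teleut-7b-sft",
    num_train_epochs=1,             # from the card
    learning_rate=3.5e-6,           # from the card
    per_device_train_batch_size=1,  # assumed; tune for an 8xH100 setup
    gradient_accumulation_steps=16, # assumed
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B",  # the Qwen 2.5 base model
    train_dataset=dataset,
    args=config,
)
trainer.train()
```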
Good For
- General-purpose conversational AI: Its strong instruction-following and reasoning make it suitable for chatbots and virtual assistants.
- Complex instruction adherence: Excels in scenarios where precise understanding and execution of user prompts are critical.
- Research and development: Provides a strong base for further fine-tuning or experimentation, particularly for those interested in applying Tulu 3's training recipe to a Qwen 2.5 foundation.