typhoon-ai/typhoon2-qwen2.5-7b-instruct

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Dec 16, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

scb10x/typhoon2-qwen2.5-7b-instruct is a 7.6-billion-parameter, instruction-tuned, decoder-only large language model developed by scb10x on the Qwen2.5 architecture. It is optimized for Thai language performance across instruction following, function calling, and domain-specific tasks such as math and coding, while also supporting English. The model offers a 128k context length, with YaRN scaling available to handle even longer texts.
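For Qwen2.5-family models, context beyond 32k tokens typically requires enabling YaRN explicitly. A hedged sketch of the `rope_scaling` entry conventionally added to the model's `config.json`, following Qwen2.5's documented example (the factor and base length are assumptions and may need adjustment for this checkpoint):

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

Note that serving frameworks such as vLLM apply this scaling statically, which, as the model card cautions, can degrade quality on shorter inputs.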


Overview

Typhoon2-Qwen2.5-7B-Instruct is a 7.6 billion parameter instruction-tuned large language model developed by scb10x, built upon the Qwen2.5 architecture. It is primarily designed for Thai language processing but also supports English, making it a strong choice for bilingual applications. The model demonstrates enhanced performance in Thai across various benchmarks, including instruction-following, function calling, and domain-specific tasks.

Key Capabilities

  • Bilingual Proficiency: Excels in Thai language tasks, outperforming the base Qwen2.5 7B Instruct model in Thai IFEval, MT-Bench TH, Thai Code-Switching, FC-TH, GSM8K-TH, and MATH-TH.
  • Function Calling: Shows strong capabilities in function calling for both Thai and English, with 74.24% and 75.44% accuracy respectively.
  • Long Context Handling: Supports a 128k context length, extendable via YaRN (Yet another RoPE extensioN method) for processing extremely long texts, though static YaRN in vLLM may impact performance on shorter inputs.
  • Domain-Specific Performance: Achieves notable results in Thai math (GSM8K-TH 79.07%, MATH-TH 55.42%) and coding (HumanEval-TH 73.2%, MBPP-TH 78.3%).
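The function-calling scores above are measured on tool-use prompts. The exact schema the Typhoon evaluation uses is not shown here, but an OpenAI-style tool definition (a common convention, assumed here purely for illustration, with a hypothetical `get_weather` function) looks like this:

```python
import json

# Hypothetical tool definition in the widely used OpenAI-style schema;
# whether Typhoon2's chat template expects exactly this format is an assumption.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a Thai province.",
        "parameters": {
            "type": "object",
            "properties": {
                "province": {
                    "type": "string",
                    "description": "Province name, e.g. Chiang Mai",
                },
            },
            "required": ["province"],
        },
    },
}

# A bilingual tool-use request: a Thai user turn plus the tool list,
# serialized as it would be sent to an OpenAI-compatible endpoint.
payload = {
    "model": "scb10x/typhoon2-qwen2.5-7b-instruct",
    "messages": [
        {"role": "user", "content": "อากาศที่เชียงใหม่เป็นอย่างไร"}  # "How is the weather in Chiang Mai?"
    ],
    "tools": [get_weather_tool],
}

serialized = json.dumps(payload, ensure_ascii=False)
```

The model is then expected to respond with a structured call naming `get_weather` and its arguments rather than free-form text.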

Intended Uses

This model is an instruction-tuned model suitable for a wide range of applications requiring strong Thai language understanding and generation, including chatbots, content creation, and code assistance. It is still under active development and may produce occasional inaccuracies or biases, so developers should assess risks for their specific use cases. For more technical details, refer to the arXiv paper.
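For chatbot-style use, Qwen2.5-family models (including this Typhoon2 fine-tune) conventionally use a ChatML-style prompt layout. A minimal sketch of that layout is below; in practice, prefer `tokenizer.apply_chat_template` from `transformers`, which reads the authoritative template shipped with the model:

```python
# Sketch of the ChatML-style prompt format used by Qwen2.5-family models.
# This hand-rolled builder is for illustration only; the template shipped
# with the checkpoint is the source of truth.
def build_chatml_prompt(messages: list) -> str:
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Trailing assistant header asks the model to generate the reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful Thai-English bilingual assistant."},
    {"role": "user", "content": "สวัสดีครับ"},  # "Hello" in Thai
])
```

The resulting string is what the tokenizer encodes before generation begins at the final `assistant` header.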

Popular Sampler Settings

The three parameter combinations most commonly used by Featherless users for this model tune the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
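These parameters map directly onto the sampling fields of an OpenAI-compatible completion request. The values below are arbitrary placeholders for illustration only, not the actual popular Featherless configurations (which are not listed on this page):

```python
import json

# Example sampling configuration using the parameters listed above.
# The values are illustrative placeholders, NOT the popular Featherless
# settings, which this page does not enumerate.
sampler_config = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.1,
    "min_p": 0.05,
}

# Merge the sampler settings into a completion request body.
request = {
    "model": "scb10x/typhoon2-qwen2.5-7b-instruct",
    "prompt": "แนะนำอาหารไทยหนึ่งอย่าง",  # "Recommend one Thai dish"
    **sampler_config,
}

body = json.dumps(request, ensure_ascii=False)
```

Lower `temperature`/`top_p` values favor deterministic output for tasks like function calling, while higher values suit creative generation; which trade-off the popular configs make is not stated here.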