Qwen/Qwen2.5-3B
Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32K · Published: Sep 15, 2024 · License: other · Architecture: Transformer

Qwen2.5-3B is a 3.09 billion parameter causal language model developed by Qwen, featuring a transformer architecture with RoPE, SwiGLU, and RMSNorm. This base model offers a 32,768 token context length and significantly improved capabilities in coding, mathematics, instruction following, and generating structured outputs like JSON. It also provides robust multilingual support for over 29 languages, making it suitable for further fine-tuning for specialized applications.


Qwen2.5-3B: An Enhanced Base Language Model

Qwen2.5-3B is a 3.09 billion parameter base causal language model from the Qwen2.5 series, developed by Qwen. It builds upon the Qwen2 architecture, incorporating improvements in several key areas. This model is designed for pretraining and is not recommended for direct conversational use without further fine-tuning (e.g., SFT, RLHF).

Key Capabilities & Features

  • Enhanced Knowledge & Reasoning: Significantly improved capabilities in coding and mathematics, drawing on data from the Qwen team's specialized expert models in those domains.
  • Instruction Following: Demonstrates substantial improvements in adhering to instructions and generating structured outputs, including JSON.
  • Long-Context Support: Capable of handling contexts up to 32,768 tokens, with the broader Qwen2.5 series supporting up to 128K tokens.
  • Multilingual Support: Offers robust support for over 29 languages, including Chinese, English, French, Spanish, German, and Japanese.
  • Structured Data Understanding: Better at understanding structured data like tables and generating structured outputs.
  • Architecture: Utilizes a transformer architecture with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.

When to Use This Model

Qwen2.5-3B is ideal for developers looking for a strong base model to:

  • Fine-tune for specific tasks: Apply Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining.
  • Develop specialized applications: Leverage its enhanced coding, mathematics, and structured output generation for domain-specific solutions.
  • Build multilingual applications: Utilize its broad language support for global deployments.
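
As a base model, Qwen2.5-3B performs plain text continuation rather than chat. The following is a minimal quickstart sketch (not taken from the official model card), assuming a recent `transformers` release with Qwen2 support (>= 4.37) and enough memory to load the 3.09B-parameter weights:

```python
# Minimal sketch: base-model text continuation with Hugging Face transformers.
# Qwen2.5-3B is a base model, so we feed a raw prompt and let it continue the
# text; no chat template is applied.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # loads in BF16 on supported hardware
    device_map="auto",    # places weights on GPU if available
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)

# The decoded output begins with the prompt, followed by the continuation.
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

For conversational behavior, fine-tune the base model first (e.g. SFT) or use an instruction-tuned variant instead.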