f0rc3ps/Qwen2.5-3B

Text Generation | Concurrency Cost: 1 | Model Size: 3.1B | Quant: BF16 | Ctx Length: 32k | Published: Apr 6, 2026 | License: qwen-research | Architecture: Transformer

The f0rc3ps/Qwen2.5-3B model is a 3.09-billion-parameter causal language model from the Qwen2.5 series, developed by Qwen. It is a pre-trained base model (not a chat model) with a 32,768-token context length. Compared with its predecessors, the Qwen2.5 series offers significant improvements in knowledge, coding, and mathematics, along with better instruction following, long-text generation, structured-data understanding, and multilingual support for more than 29 languages. These strengths make the model a good starting point for fine-tuning toward specialized applications.


Qwen2.5-3B: An Enhanced Base Language Model

Qwen2.5-3B is a 3.09-billion-parameter base causal language model in the Qwen2.5 series, developed by Qwen. It builds on Qwen2 with substantial improvements in the areas listed below, making it a solid foundation for a range of NLP tasks.

Key Capabilities & Enhancements

  • Expanded Knowledge & Specialized Skills: A significantly broader knowledge base and much stronger coding and mathematics capabilities, attributed to specialized expert models in these domains.
  • Instruction Following & Structured Output: Demonstrates significant improvements in following instructions, generating long texts (over 8K tokens), understanding structured data (like tables), and producing structured outputs, especially JSON.
  • Robustness: More resilient to diverse system prompts, which improves role-play and condition-setting for chatbots.
  • Long Context Support: Features a full 32,768-token context length and can generate up to 8K tokens.
  • Multilingual Support: Offers comprehensive support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
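Because the context window is shared between the prompt and the generated text, a request has to budget for both. The helper below is a minimal sketch that works purely in token counts; it assumes "8K" means 8,192 tokens, and the function names are hypothetical rather than part of any Qwen or Hugging Face API.

```python
# Minimal sketch: budget prompt vs. generated tokens in a 32,768-token window.
# Assumption: "up to 8K tokens" of generation is read as 8,192 tokens.
CONTEXT_LENGTH = 32_768  # full context length stated for Qwen2.5-3B
MAX_NEW_TOKENS = 8_192   # assumed generation budget (8K = 8 * 1024)


def max_prompt_tokens(max_new_tokens: int = MAX_NEW_TOKENS,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Largest prompt length that still leaves room for max_new_tokens."""
    if not 0 < max_new_tokens <= context_length:
        raise ValueError("max_new_tokens must be in (0, context_length]")
    return context_length - max_new_tokens


def fits(prompt_tokens: int, max_new_tokens: int = MAX_NEW_TOKENS,
         context_length: int = CONTEXT_LENGTH) -> bool:
    """True if the prompt plus the requested generation fit in the window."""
    return prompt_tokens + max_new_tokens <= context_length


def truncate_left(token_ids: list[int],
                  max_new_tokens: int = MAX_NEW_TOKENS) -> list[int]:
    """Hypothetical helper: keep only the most recent tokens that fit."""
    budget = max_prompt_tokens(max_new_tokens)
    if len(token_ids) <= budget:
        return token_ids
    # Slice by start index so a budget of 0 yields an empty list, not the whole list.
    return token_ids[len(token_ids) - budget:]


if __name__ == "__main__":
    history = list(range(30_000))  # stand-in token IDs
    kept = truncate_left(history)
    print(max_prompt_tokens(), len(kept), fits(len(kept)))
```

With the defaults, `max_prompt_tokens()` returns 24,576, so a 30,000-token history is cut to its most recent 24,576 tokens. Left truncation is only one option; dropping middle turns or summarizing older context may suit long conversations better.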

Architecture & Training

This is a pre-trained causal language model built on a Transformer architecture with rotary position embeddings (RoPE), SwiGLU activations, RMSNorm, bias terms on the attention QKV projections, and tied input/output word embeddings. It has 36 layers and uses grouped-query attention (GQA) with 16 query heads and 2 key/value heads.
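The grouped-query layout largely determines how much key/value cache the model needs during generation. The sketch below maps each query head to the KV head it shares and estimates the BF16 KV-cache size. It assumes a head dimension of 128 (a 2,048-wide hidden state split across 16 heads, as in the publicly released config), which this card does not state; the function names are illustrative only.

```python
# Sketch of GQA bookkeeping for the stated shape: 36 layers, 16 query heads, 2 KV heads.
NUM_LAYERS = 36
NUM_Q_HEADS = 16
NUM_KV_HEADS = 2
HEAD_DIM = 128       # assumption: hidden size 2048 / 16 heads
BYTES_PER_ELEM = 2   # BF16 stores each value in 2 bytes


def kv_head_for(q_head: int) -> int:
    """Index of the KV head whose keys/values query head `q_head` uses."""
    group_size = NUM_Q_HEADS // NUM_KV_HEADS  # 8 query heads share each KV head
    return q_head // group_size


def kv_cache_bytes(num_tokens: int, num_kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes of cached K and V for `num_tokens` tokens across all layers."""
    # Factor 2 covers the separate K and V tensors.
    per_token = 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_ELEM
    return num_tokens * per_token


if __name__ == "__main__":
    print([kv_head_for(h) for h in range(NUM_Q_HEADS)])
    gqa = kv_cache_bytes(32_768)
    mha = kv_cache_bytes(32_768, num_kv_heads=NUM_Q_HEADS)  # hypothetical: no sharing
    print(f"GQA: {gqa / 2**30:.3f} GiB, MHA: {mha / 2**30:.3f} GiB")
```

Under these assumptions the cache costs 36,864 bytes per token, or 1.125 GiB for a full 32,768-token context, one eighth of the 9 GiB that 16 independent KV heads would require.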

When to Use This Model

As a base model, Qwen2.5-3B is not recommended for direct conversational use. It is intended instead for developers who want to apply further training, such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), or continued pretraining, to adapt it to applications that draw on its strengths in coding, mathematics, structured data handling, or multilingual processing.
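A common first step in SFT is building labels so that the loss covers only the response, not the prompt. The sketch below assumes already-tokenized inputs with hypothetical token IDs, and follows the Hugging Face `transformers` convention in which labels are aligned with `input_ids` and shifted inside the model; positions set to `-100` are skipped by PyTorch's cross-entropy loss under its default `ignore_index`.

```python
# Minimal sketch of SFT label construction: supervise only the response tokens.
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(prompt_ids: list[int], response_ids: list[int],
                      eos_id: int, max_len: int = 32_768) -> dict[str, list[int]]:
    """Concatenate prompt + response + EOS and mask prompt positions in the labels."""
    input_ids = prompt_ids + response_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    # Truncate both lists to the same length so positions stay aligned.
    return {"input_ids": input_ids[:max_len], "labels": labels[:max_len]}


def supervised_token_count(labels: list[int]) -> int:
    """Number of positions that actually contribute to the loss."""
    return sum(1 for label in labels if label != IGNORE_INDEX)


if __name__ == "__main__":
    ex = build_sft_example([11, 12, 13], [21, 22], eos_id=99)  # hypothetical IDs
    print(ex["input_ids"], ex["labels"], supervised_token_count(ex["labels"]))
```

Here `build_sft_example([11, 12, 13], [21, 22], eos_id=99)` yields labels `[-100, -100, -100, 21, 22, 99]`, so only the response and the end-of-sequence token are trained on. Whether to also supervise the prompt is a design choice; masking it is the usual default for instruction tuning.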