Qwen/Qwen2.5-Coder-3B
Hugging Face
Text generation · Model size: 3.1B · Quantization: BF16 · Context length: 32k · Published: Nov 8, 2024 · License: other · Architecture: Transformer

Qwen2.5-Coder-3B is a 3.09 billion parameter causal language model developed by Qwen, part of the Qwen2.5-Coder series. Building on the Qwen2.5 foundation, it is designed and extensively pretrained for code generation, code reasoning, and code fixing. It features a 32,768-token context length and targets real-world coding applications while maintaining general competencies.


Qwen2.5-Coder-3B Overview

Qwen2.5-Coder-3B is a 3.09 billion parameter model from the Qwen2.5-Coder family, a series of code-specific large language models developed by Qwen. This model builds upon the strong Qwen2.5 foundation, with significant improvements in coding capabilities through extensive pretraining on 5.5 trillion tokens, including source code, text-code grounding, and synthetic data.

Key Capabilities

  • Enhanced Code Performance: Demonstrates significant advancements in code generation, code reasoning, and code fixing.
  • Comprehensive Foundation: Designed to support real-world applications like Code Agents, while also maintaining strong performance in mathematics and general language understanding.
  • Technical Specifications: A 3.09-billion-parameter transformer with a 32,768-token context length, using RoPE positional embeddings, SwiGLU activations, and RMSNorm normalization.
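As a rough illustration of the specifications above, the BF16 footprint of the weights alone can be estimated from the parameter count. This is a back-of-envelope sketch only: it ignores activations, the KV cache, and framework overhead, so actual runtime memory will be higher.

```python
# Back-of-envelope estimate of the BF16 weight footprint.
# BF16 stores each parameter in 2 bytes; runtime memory
# (activations, KV cache, framework overhead) is excluded.
params = 3.09e9          # parameter count from the model card
bytes_per_param = 2      # BF16 = 16 bits = 2 bytes
weight_bytes = params * bytes_per_param
weight_gib = weight_bytes / 1024**3

print(f"{weight_gib:.1f} GiB")  # weights alone: ~5.8 GiB
```

This is why the 3B variant fits comfortably on a single consumer GPU for inference, with headroom left for the KV cache that grows with context length.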

Intended Use

This base model is primarily intended for further post-training, such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining, to adapt it for specific conversational or fill-in-the-middle tasks. It is not recommended for direct conversational use without further fine-tuning.
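For the fill-in-the-middle adaptation mentioned above, the Qwen2.5-Coder series documents dedicated FIM special tokens. The sketch below only assembles such a prompt as a plain string; the token names follow the published Qwen2.5-Coder convention, and the actual model call is shown as a comment since it requires downloading the weights.

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt using the FIM special
    tokens documented for the Qwen2.5-Coder series. The model is
    expected to generate the missing middle after <|fim_middle|>."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = build_fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result\n",
)

# To complete the middle, pass `prompt` to the model, e.g.:
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-3B")
#   model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-3B")
#   out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=32)
print(prompt)
```

Because this is a base model, raw FIM prompting like this is more reliable than chat-style prompting, which only works well after instruction tuning.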