Qwen/Qwen2.5-Coder-32B

Warm
Public
32.8B
FP8
131072
Nov 8, 2024
License: apache-2.0
Hugging Face
Overview

Qwen2.5-Coder-32B: Advanced Code-Specific LLM

Qwen2.5-Coder-32B is the 32.5-billion-parameter variant of the Qwen2.5-Coder series, a family of large language models from the Qwen team engineered specifically for coding tasks. It improves substantially on its predecessor, CodeQwen1.5, by scaling training data to 5.5 trillion tokens spanning source code, text-code grounding data, and synthetic data.

Key Capabilities & Features

  • Enhanced Code Performance: Demonstrates significant improvements in code generation, code reasoning, and code fixing, with coding abilities reportedly matching GPT-4o.
  • Comprehensive Foundation: Designed to support real-world applications like Code Agents, while also maintaining strong performance in mathematics and general competencies.
  • Long Context Support: Features a context length of up to 131,072 tokens, with YaRN (Yet another RoPE extensioN) scaling available for handling extremely long inputs.
  • Architecture: Built on a transformer architecture incorporating RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
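
As a concrete illustration of the code-generation capability above, the sketch below loads the base model with Hugging Face transformers and continues a plain code prompt. The prompt and generation settings are illustrative assumptions, not tuned recommendations.

```python
# Minimal code-completion sketch with Hugging Face transformers (illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Base (non-instruct) model: continue the code prompt directly, no chat template.
prompt = "def quicksort(items):\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```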

Use Cases & Recommendations

This pre-trained base model is ideal for developers looking to build or fine-tune models for highly specialized coding applications. It is not recommended for direct conversational use. Instead, it serves as a robust foundation for tasks requiring advanced code understanding and generation, such as:

  • Developing sophisticated code agents.
  • Implementing code completion and debugging tools.
  • Fine-tuning for specific programming languages or domains.
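
For code-completion tooling in particular, fill-in-the-middle (FIM) prompting is a common pattern. The sketch below assumes the FIM control tokens published for the Qwen2.5-Coder family (`<|fim_prefix|>`, `<|fim_suffix|>`, `<|fim_middle|>`) and should be read as a sketch rather than a prescribed recipe.

```python
# Illustrative fill-in-the-middle completion; assumes Qwen2.5-Coder's FIM tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Prefix and suffix surround the gap to fill, e.g. an editor cursor position.
prefix = "def read_json(path):\n    with open(path) as f:\n"
suffix = "\n    return data\n"
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated middle segment.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```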

For optimal performance on long inputs, users can enable YaRN via the rope_scaling configuration. Note that static YaRN applies the same scaling factor regardless of input length, which may impact performance on shorter texts.
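
One way to do this, sketched below, is to override rope_scaling when loading the model with a recent version of Hugging Face transformers. The values follow the YaRN settings commonly published for Qwen2.5 models (a 4x factor over the native 32,768-token window) and should be treated as assumptions to verify against the model card, not a definitive configuration.

```python
# Illustrative YaRN rope_scaling override for long-context use.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen2.5-Coder-32B"
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,                              # assumed: 4x the original window
    "original_max_position_embeddings": 32768,  # assumed native context length
}

# Static YaRN applies the same scaling to every request, which can degrade quality
# on short inputs; enable it only when long contexts are actually needed.
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```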