unsloth/Qwen2.5-Coder-32B
Hugging Face
Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Nov 12, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

unsloth/Qwen2.5-Coder-32B is a 32.8 billion parameter causal language model from the Qwen2.5-Coder series, developed by Qwen. This model is specifically designed for code generation, code reasoning, and code fixing, building upon the Qwen2.5 architecture. It features a transformer architecture with RoPE, SwiGLU, RMSNorm, and Attention QKV bias, and supports a full 32,768 token context length. The Qwen2.5-Coder series is trained on 5.5 trillion tokens, including extensive source code and text-code grounding data, aiming for state-of-the-art performance among open-source code LLMs.
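Below is a minimal loading-and-completion sketch using Hugging Face transformers. The repo id comes from this page; the prompt, dtype, and device settings are illustrative assumptions, and a 32.8B model in 16-bit precision needs roughly 65 GB of GPU memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Qwen2.5-Coder-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # take bf16/fp16 from the checkpoint config
    device_map="auto",    # shard across available GPUs
)

# Base models are completion-style: give code, get a continuation.
prompt = "def fibonacci(n: int) -> int:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```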


unsloth/Qwen2.5-Coder-32B: Code-Specific Language Model

unsloth/Qwen2.5-Coder-32B is part of the latest Qwen2.5-Coder series, a family of code-specific large language models developed by Qwen. This particular model is the 32.8 billion parameter flagship of the series, designed to excel in various coding tasks.

Key Capabilities and Features

  • Enhanced Code Performance: Significant improvements in code generation, code reasoning, and code fixing compared to its predecessor, CodeQwen1.5.
  • Extensive Training Data: Trained on a massive 5.5 trillion tokens, including a substantial amount of source code, text-code grounding data, and synthetic data.
  • State-of-the-Art Coding: The Qwen2.5-Coder-32B model is positioned as a leading open-source code LLM, with coding abilities comparable to GPT-4o.
  • Real-World Applications: Provides a robust foundation for applications like Code Agents, while also maintaining strong performance in mathematics and general competencies.
  • Architectural Details: Utilizes a transformer architecture with RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
  • Context Length: Supports a full context length of 32,768 tokens (see the config check after this list).
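
The advertised specs can be sanity-checked against the checkpoint's configuration. This is a small sketch; the field names follow transformers' Qwen2 configuration class.

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("unsloth/Qwen2.5-Coder-32B")
print(cfg.max_position_embeddings)  # context length: 32768
print(cfg.hidden_act)               # "silu", the SwiGLU gate activation
print(cfg.rms_norm_eps)             # RMSNorm epsilon
```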

Intended Use

This base model is primarily intended for further post-training, such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining. It is also suitable for fill-in-the-middle tasks. The developers do not recommend using base language models like this for direct conversational applications without additional fine-tuning.
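
Two hedged sketches follow. First, fill-in-the-middle prompting: Qwen2.5-Coder documents the special tokens <|fim_prefix|>, <|fim_suffix|>, and <|fim_middle|>, but confirm them against the tokenizer's special-token map before relying on them; the function being completed is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Qwen2.5-Coder-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prefix = 'def remove_non_ascii(s: str) -> str:\n    """Remove non-ASCII characters."""\n'
suffix = "\n    return result\n"
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated middle span.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Second, a parameter-efficient SFT starting point using Unsloth (this repo's maintainer). Argument names follow Unsloth's published examples and may differ between versions; values such as r=16 and the target module list are illustrative.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-32B",
    max_seq_length=32768,
    load_in_4bit=True,   # 4-bit weights help a 32.8B model fit on one large GPU
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (illustrative)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# From here, train with e.g. trl's SFTTrainer on a code dataset.
```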