Qwen/Qwen2.5-Coder-7B

Task: Text Generation
Concurrency Cost: 1
Model Size: 7.6B
Quant: FP8
Ctx Length: 32k
Published: Sep 16, 2024
License: apache-2.0
Architecture: Transformer
0.1K, Open Weights, Warm

Qwen/Qwen2.5-Coder-7B is a 7.61 billion parameter causal language model developed by Qwen, part of the Qwen2.5-Coder series. This pre-trained model is specifically optimized for code generation, code reasoning, and code fixing, building upon the Qwen2.5 architecture. It features a transformer architecture with a full context length of 131,072 tokens, making it suitable for complex coding tasks and long-context applications.


Qwen2.5-Coder-7B Overview

Qwen2.5-Coder-7B is a 7.61 billion parameter causal language model from the Qwen2.5-Coder series, developed by Qwen. This is the pre-trained (base) variant. The series significantly improves on its predecessor, CodeQwen1.5, in coding capabilities. It is designed as a foundation for real-world applications such as Code Agents, while maintaining strong performance in mathematics and general competencies.

Key Capabilities

  • Enhanced Code Performance: Improves significantly on CodeQwen1.5 in code generation, code reasoning, and code fixing.
  • Extensive Training: Trained on 5.5 trillion tokens, including a substantial amount of source code, text-code grounding, and synthetic data.
  • Long-Context Support: Features a full context length of 131,072 tokens, utilizing techniques like YaRN for optimal performance on lengthy texts.
  • Architectural Foundation: Built on a transformer architecture incorporating RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
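
The relationship between the 32k figure in the listing and the 131,072-token full context can be sketched as a YaRN scaling factor. The snippet below is an illustration only. It assumes a native window of 32,768 tokens, and the `rope_scaling` key names follow the convention used in Hugging Face `config.json` files. The helper name `yarn_rope_scaling` is hypothetical. Check the official deployment guidelines before relying on these values.

```python
def yarn_rope_scaling(native_ctx: int, target_ctx: int) -> dict:
    """Build a rope_scaling entry that stretches native_ctx to target_ctx."""
    if target_ctx % native_ctx != 0:
        raise ValueError("this sketch only handles whole-number factors")
    return {
        "type": "yarn",
        "factor": float(target_ctx // native_ctx),
        "original_max_position_embeddings": native_ctx,
    }


NATIVE_CTX = 32_768   # assumed native window (the "32k" in the listing)
FULL_CTX = 131_072    # full context length stated above

rope_scaling = yarn_rope_scaling(NATIVE_CTX, FULL_CTX)
print(rope_scaling)
```

Under these assumptions the factor works out to 4.0, since 32,768 × 4 = 131,072. Whether a given serving stack applies the scaling statically or only for long inputs depends on the framework.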

When to Use This Model

Qwen2.5-Coder-7B is ideal for developers and researchers focused on:

  • Code-centric applications: Excels in tasks requiring advanced code generation, debugging, and understanding.
  • Foundation for fine-tuning: Recommended as a base model for further post-training, such as Supervised Fine-Tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF), rather than direct conversational use.
  • Long-context coding tasks: Its 131K context window makes it suitable for handling large codebases or complex multi-file projects.
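
Because this is a base model rather than a chat model, a prompt is simply the beginning of the text to be continued. The following is a minimal sketch of plain code completion with the Hugging Face `transformers` library, not an official recipe. The helper names `build_prompt` and `complete` are hypothetical, and `device_map="auto"` assumes the `accelerate` package is installed and enough memory is available for the weights.

```python
MODEL_ID = "Qwen/Qwen2.5-Coder-7B"


def build_prompt(signature: str, docstring: str) -> str:
    """Return the opening lines of a function for the model to continue."""
    # No chat template or instructions: the prompt is the start of the code.
    return f'{signature}\n    """{docstring}"""\n'


def complete(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation of the prompt with the base model."""
    # Imported here so build_prompt stays usable without the heavy dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


prompt = build_prompt("def fib(n):", "Return the n-th Fibonacci number.")
```

Calling `complete(prompt)` downloads the weights on first use and returns the model's continuation of the function body. For conversational use, a post-trained variant is the better fit, as noted above.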

For more detailed information, including evaluation results and deployment guidelines, refer to the official blog and GitHub repository.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Each configuration sets the following parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
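
To make the filtering parameters above concrete, here is a rough sketch of how `temperature`, `top_k`, `min_p`, and `top_p` can narrow a next-token distribution. It is a simplified illustration, not the implementation used by any particular inference server: the function name `apply_samplers` is hypothetical, the order in which filters are applied varies between implementations, and the three penalty parameters are not covered.

```python
import math


def apply_samplers(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Return a renormalized {token_index: probability} dict after filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")

    # Temperature: scale logits before softmax (<1 sharpens, >1 flattens).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = {i: e / total for i, e in enumerate(exps)}

    # Rank token indices from most to least likely.
    ranked = sorted(probs, key=probs.get, reverse=True)

    # top_k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # min_p: drop tokens below min_p times the top token's probability.
    threshold = min_p * probs[ranked[0]]
    ranked = [i for i in ranked if probs[i] >= threshold]

    # top_p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


example = apply_samplers([2.0, 1.0, 0.0, -1.0], top_k=2)
```

With `top_k=2`, only the two highest-scoring tokens survive and their probabilities are rescaled to sum to 1. A sampler would then draw the next token from the returned distribution.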