Qwen/Qwen2.5-Coder-1.5B
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Context Length: 32K · Published: Sep 18, 2024 · License: apache-2.0 · Architecture: Transformer

Qwen/Qwen2.5-Coder-1.5B is a 1.54 billion parameter causal language model from the Qwen2.5-Coder series, developed by the Qwen team. Building on the Qwen2.5 architecture, it is specifically designed for code generation, code reasoning, and code fixing. It features a 32,768-token context length and is optimized for real-world coding applications while maintaining strong mathematical and general competencies.


Qwen2.5-Coder-1.5B Overview

Qwen2.5-Coder-1.5B is part of the latest Qwen2.5-Coder series, a family of code-specific large language models developed by Qwen. This 1.54 billion parameter model is a pre-trained causal language model built on a transformer architecture, featuring RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings. It boasts a substantial context length of 32,768 tokens.
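As a standard Hugging Face checkpoint, the model loads with the transformers library. The snippet below is a minimal sketch assuming a recent transformers release with Qwen2 support and enough memory for the BF16 weights; the prompt is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-1.5B"

# Load the tokenizer and the BF16 weights, placing them automatically.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# This is a base (non-instruct) model: prompt it with plain code to
# complete rather than a chat-formatted conversation.
prompt = "def quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```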

Key Capabilities & Improvements

  • Enhanced Code Performance: Significant improvements in code generation, code reasoning, and code fixing compared to its predecessor, CodeQwen1.5.
  • Extensive Training: Trained on 5.5 trillion tokens, including a large proportion of source code, text-code grounding, and synthetic data.
  • Foundation for Code Agents: Provides a robust foundation for real-world applications like Code Agents, balancing strong coding abilities with general competencies and mathematics.
  • Architectural Features: Utilizes a transformer architecture with 28 layers and grouped-query attention (GQA) with 12 query heads and 2 key/value heads; these values can be read directly from the published model config, as the sketch below shows.
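A quick way to confirm these architectural details is to read them from the model's published config via transformers (a minimal sketch; only the config JSON is downloaded, not the weights):

```python
from transformers import AutoConfig

# Fetch just the configuration file from the Hugging Face Hub.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-1.5B")

print(config.num_hidden_layers)        # 28 transformer layers
print(config.num_attention_heads)      # 12 query heads
print(config.num_key_value_heads)      # 2 key/value heads (GQA)
print(config.max_position_embeddings)  # 32768-token context
```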

Recommended Use Cases

  • Code-centric Tasks: Ideal for tasks requiring advanced code generation, debugging, and understanding.
  • Further Fine-tuning: Recommended as a base model for post-training techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining to adapt it for specific conversational or fill-in-the-middle (FIM) tasks; a FIM prompting sketch follows this list.

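The tokenizer for Qwen2.5-Coder defines FIM special tokens (<|fim_prefix|>, <|fim_suffix|>, <|fim_middle|>). The sketch below follows the FIM prompt format shown in the Qwen2.5-Coder GitHub repository; treat the exact format as an assumption and verify it against the official README:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Fill-in-the-middle: the model generates the code that belongs between
# the prefix and the suffix. Token format per the Qwen2.5-Coder repo.
prompt = (
    "<|fim_prefix|>def is_even(n: int) -> bool:\n    "
    "<|fim_suffix|>\n\nprint(is_even(4))<|fim_middle|>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)

# Decode only the newly generated middle span.
middle = outputs[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(middle, skip_special_tokens=True))
```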
For detailed evaluation results and further information, refer to the official Qwen2.5-Coder blog and GitHub repository.

Popular Sampler Settings

Featherless surfaces the top three sampler parameter combinations its users apply to this model. The specific values are not reproduced here, but the tracked parameters are: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
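These parameters map directly onto an OpenAI-style completion request. The sketch below assumes Featherless exposes an OpenAI-compatible endpoint (the base URL and the exact set of accepted sampler fields should be verified against the Featherless documentation), and the values shown are illustrative, not the recorded user configurations:

```python
from openai import OpenAI

# Assumption: an OpenAI-compatible API at this base URL; verify both the
# URL and the supported sampler fields in the Featherless docs.
client = OpenAI(
    base_url="https://api.featherless.ai/v1",
    api_key="YOUR_FEATHERLESS_API_KEY",
)

response = client.completions.create(
    model="Qwen/Qwen2.5-Coder-1.5B",
    prompt="# Reverse a singly linked list in Python\n",
    max_tokens=256,
    temperature=0.7,        # illustrative values throughout
    top_p=0.8,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # top_k, repetition_penalty, and min_p are not standard OpenAI fields;
    # OpenAI-compatible servers typically accept them via extra_body.
    extra_body={"top_k": 20, "repetition_penalty": 1.05, "min_p": 0.0},
)
print(response.choices[0].text)
```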