unsloth/Qwen2.5-Coder-0.5B-Instruct

0.5B parameters · BF16 · 32,768-token context · License: apache-2.0

Qwen2.5-Coder-0.5B-Instruct Overview

This model is part of the Qwen2.5-Coder series, a family of code-specific large language models developed by the Qwen team. unsloth/Qwen2.5-Coder-0.5B-Instruct is the 0.5-billion-parameter instruct-tuned variant, designed for efficient code-related tasks.

Key Capabilities & Features

  • Enhanced Code Performance: Significant improvements in code generation, code reasoning, and code fixing, building on the strong Qwen2.5 base (a quickstart sketch follows this list).
  • Extensive Training: Trained on 5.5 trillion tokens spanning source code, text-code grounding data, and synthetic data.
  • Architecture: Utilizes a transformer architecture with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
  • Context Length: Supports a full 32,768 token context window, beneficial for handling larger codebases.
  • Parameter Count: Features 0.49 billion parameters (0.36 billion non-embedding parameters) across 24 layers.
  • Foundation for Code Agents: Aims to provide a comprehensive foundation for advanced real-world applications like Code Agents, while also maintaining strengths in mathematics and general competencies.
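
The snippet below is a minimal quickstart sketch using the Hugging Face transformers library: it loads the model in its native precision, applies the chat template, and generates a completion for a coding prompt. The prompt text, system message, and generation settings here are illustrative choices, not part of the official card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "unsloth/Qwen2.5-Coder-0.5B-Instruct"

# torch_dtype="auto" picks up the checkpoint's BF16 weights on supported hardware.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example coding prompt (illustrative).
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens so only the generated completion is decoded.
completion_ids = output_ids[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(completion_ids, skip_special_tokens=True))
```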

When to Use This Model

  • Code-centric Tasks: Ideal for applications requiring strong code generation, debugging, and reasoning capabilities.
  • Resource-Constrained Environments: Its smaller 0.5B parameter size makes it suitable for scenarios where computational resources are limited, offering a balance between performance and efficiency.
  • Further Fine-tuning: Well suited as a base model for post-training such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining for domain adaptation; a minimal LoRA fine-tuning sketch follows this list.
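
Since this checkpoint is distributed through Unsloth, a common next step is parameter-efficient fine-tuning with the unsloth library. The sketch below is illustrative only: the sequence length, 4-bit loading, LoRA rank, and target-module list are assumed hyperparameters, not recommendations from the model authors.

```python
from unsloth import FastLanguageModel

# Load the model through Unsloth; max_seq_length is kept well below the
# full 32,768-token window to reduce memory use (assumed setting).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-0.5B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization for low-VRAM fine-tuning
)

# Attach LoRA adapters; only these adapter weights are trained.
# Rank and target modules below are illustrative defaults, not official values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```

From here, the model and tokenizer pair can be passed to a standard trainer (for example, trl's SFTTrainer) together with your dataset.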