mustafademirr87/troia-coder
mustafademirr87/troia-coder is an instruction-tuned, 7.61-billion-parameter causal language model from the Qwen2.5-Coder family, developed by Alibaba Cloud. It is optimized for code generation, code reasoning, and code fixing, building on the Qwen2.5 foundation with 5.5 trillion training tokens that include extensive source code. It supports a 131,072-token context length and retains strong performance in mathematics and general tasks, making it well suited to advanced code-centric applications such as Code Agents.
Qwen2.5-Coder-7B-Instruct Overview
This model, mustafademirr87/troia-coder, is an instruction-tuned 7.61 billion parameter variant from the Qwen2.5-Coder series, developed by Alibaba Cloud. It represents a significant advancement over its predecessor, CodeQwen1.5, with a focus on enhanced coding capabilities.
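As a sketch of typical usage, the model can be driven through the standard Hugging Face transformers chat workflow used across the Qwen2.5 family (the system prompt and generation settings below are illustrative, not prescribed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mustafademirr87/troia-coder"

# Load the tokenizer and model; device_map="auto" places weights on available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Build a chat-formatted prompt for a code-generation request.
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=512)
new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```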
Key Capabilities & Features
- Code-Specific Optimization: Significantly improved performance in code generation, code reasoning, and code fixing.
- Extensive Training Data: Trained on 5.5 trillion tokens, including a vast amount of source code, text-code grounding, and synthetic data.
- Long Context Support: Handles contexts up to 131,072 tokens (128K). The default configuration covers 32,768 tokens; longer inputs are enabled via YaRN scaling (see the sketch after this list).
- General Competency: Maintains strong performance in mathematics and general language understanding, making it versatile.
- Architecture: Utilizes a transformer architecture with RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
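For inputs beyond the default 32,768-token window, the Qwen2.5 family enables YaRN through the model's rope_scaling configuration. Below is a minimal sketch of applying it at load time with transformers; the values mirror the upstream Qwen2.5-Coder documentation (a factor of 4.0 over 32,768 original positions yields roughly 131,072 tokens), so treat them as an assumption for this repackaged checkpoint:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "mustafademirr87/troia-coder"

# Enable YaRN rope scaling for long contexts:
# factor 4.0 * 32,768 original positions ≈ 131,072 tokens.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```

Note that this static scaling applies regardless of input length, so the upstream guidance is to enable it only when long contexts are actually needed.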
Use Cases & Differentiators
This model is particularly well-suited for real-world applications requiring robust coding abilities, such as Code Agents. Its comprehensive training and specialized fine-tuning for code tasks differentiate it from general-purpose LLMs, positioning it as a powerful tool for developers. The 32B version of Qwen2.5-Coder is noted to match the coding abilities of GPT-4o, indicating the family's strong performance in this domain. The 7B instruction-tuned model offers a balance of capability and efficiency for various coding challenges.
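To illustrate the code-fixing use case, the same chat pattern from the quickstart above can carry a buggy snippet for repair (the prompt and snippet here are invented for illustration):

```python
# A deliberately buggy snippet to hand to the model for repair.
buggy_code = '''
def average(numbers):
    total = 0
    for n in numbers:
        total += n
    return total / len(numbers) - 1  # bug: subtracts 1 from the mean due to operator precedence
'''

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": f"Fix the bug in this function and explain the fix:\n{buggy_code}"},
]
# Feed `messages` through the same apply_chat_template / generate pipeline shown in the quickstart.
```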