davidterrell1919/Qwen2.5-Coder-3B-heretic
davidterrell1919/Qwen2.5-Coder-3B-heretic is a 3.09 billion parameter causal language model, based on the Qwen2.5-Coder architecture by Qwen, with a 32,768 token context length. This version is a decensored variant of the original Qwen/Qwen2.5-Coder-3B, created using the Heretic v1.3.0 tool. It is specifically optimized for code generation, code reasoning, and code fixing tasks, demonstrating reduced refusals compared to its original counterpart.
Model Overview
This model, davidterrell1919/Qwen2.5-Coder-3B-heretic, is a 3.09 billion parameter causal language model derived from the Qwen2.5-Coder series by Qwen. It features a substantial context length of 32,768 tokens and is built upon a transformer architecture incorporating RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings. Notably, this specific version has been decensored using the Heretic v1.3.0 tool, aiming to reduce content refusals.
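Below is a minimal loading and generation sketch, assuming the standard Hugging Face Transformers API and that the checkpoint is published under the repository id shown above; adjust the dtype and device settings to your hardware.

```python
# Minimal sketch: load the model and run a short code completion.
# Assumes `transformers`, `torch`, and `accelerate` are installed and the
# repository id below is correct; not an official usage recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davidterrell1919/Qwen2.5-Coder-3B-heretic"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fall back to float16/float32 if bf16 is unsupported
    device_map="auto",
)

prompt = "# Python function that checks whether a string is a palindrome\ndef is_palindrome(s: str) -> bool:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```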
Key Capabilities & Differentiators
- Enhanced Code Performance: Built on the Qwen2.5-Coder foundation, it inherits that series' significant improvements in code generation, code reasoning, and code fixing. The original Qwen2.5-Coder models were trained on 5.5 trillion tokens, including extensive source code and text-code grounding data.
- Reduced Refusals: Compared to the original Qwen/Qwen2.5-Coder-3B, this 'heretic' variant shows a marked reduction in refusals (4/100 vs. 36/100), indicating less restrictive output behavior.
- Robust Architecture: Utilizes a 36-layer transformer with Grouped Query Attention (GQA) featuring 16 query (Q) heads and 2 key-value (KV) heads; see the configuration sketch after this list.
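A quick way to verify these architectural details is to read them off the loaded configuration. The sketch below assumes the checkpoint exposes the usual Qwen2-style config fields; the field names are an assumption and may differ.

```python
# Sketch: inspect the model configuration to confirm the architecture
# described above. Field names assume the standard Qwen2-style config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("davidterrell1919/Qwen2.5-Coder-3B-heretic")

print(config.num_hidden_layers)        # expected: 36 transformer layers
print(config.num_attention_heads)      # expected: 16 query heads
print(config.num_key_value_heads)      # expected: 2 KV heads (GQA)
print(config.max_position_embeddings)  # expected: 32768-token context
print(config.tie_word_embeddings)      # expected: True (tied word embeddings)
```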
Good For
- Code-Specific Tasks: Ideal for applications requiring strong performance in code generation, debugging, and understanding (a fill-in-the-middle sketch appears after this list).
- Code Agents: Provides a solid foundation for developing code agents, while also maintaining strengths in mathematics and general competencies.
- Research and Experimentation: Suitable for users seeking a less constrained version of a powerful code-focused LLM for various applications or further fine-tuning.
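For code fixing and insertion tasks, a fill-in-the-middle (FIM) prompt is a natural fit for a base (non-instruct) checkpoint. The sketch below assumes this variant keeps the Qwen2.5-Coder FIM special tokens (<|fim_prefix|>, <|fim_suffix|>, <|fim_middle|>); verify against the tokenizer before relying on this format.

```python
# Sketch: fill-in-the-middle completion for inserting a missing line of code.
# Assumes the Qwen2.5-Coder FIM special tokens carry over to this variant.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davidterrell1919/Qwen2.5-Coder-3B-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Ask the model to fill the gap between prefix and suffix.
prefix = "def mean(values):\n    total = sum(values)\n    "
suffix = "\n    return result\n"
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)

# Decode only the newly generated middle span.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```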