davidterrell1919/Qwen2.5-Coder-3B-heretic
davidterrell1919/Qwen2.5-Coder-3B-heretic is a 3.09 billion parameter causal language model, based on the Qwen2.5-Coder architecture by Qwen, with a 32,768 token context length. This version is a decensored variant of the original Qwen/Qwen2.5-Coder-3B, created using the Heretic v1.3.0 tool. It is specifically optimized for code generation, code reasoning, and code fixing tasks, demonstrating reduced refusals compared to its original counterpart.
Model Overview
This model, davidterrell1919/Qwen2.5-Coder-3B-heretic, is a 3.09 billion parameter causal language model derived from the Qwen2.5-Coder series by Qwen. It features a substantial context length of 32,768 tokens and is built upon a transformer architecture incorporating RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings. Notably, this specific version has been decensored using the Heretic v1.3.0 tool, aiming to reduce content refusals.
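Below is a minimal loading and generation sketch, assuming the standard Hugging Face Transformers API and that the checkpoint is published under the repository id shown above; adjust the dtype and device settings to your hardware.

```python
# Minimal sketch: load the model and run a short code completion.
# Assumes `transformers`, `torch`, and `accelerate` are installed and the
# repository id below is correct; not an official usage recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davidterrell1919/Qwen2.5-Coder-3B-heretic"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fall back to float16/float32 if bf16 is unsupported
    device_map="auto",
)

prompt = "# Python function that checks whether a string is a palindrome\ndef is_palindrome(s: str) -> bool:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```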
Key Capabilities & Differentiators
- Enhanced Code Performance: Built on the Qwen2.5-Coder foundation, it inherits that series' significant improvements in code generation, code reasoning, and code fixing. The original Qwen2.5-Coder models were trained on 5.5 trillion tokens, including extensive source code and text-code grounding data.
- Reduced Refusals: Compared to the original Qwen/Qwen2.5-Coder-3B, this 'heretic' variant shows a marked reduction in refusals (4/100 vs. 36/100), indicating less restrictive output behavior.
- Robust Architecture: Utilizes a 36-layer transformer with Grouped Query Attention (GQA) featuring 16 query (Q) heads and 2 key-value (KV) heads; see the configuration sketch after this list.
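A quick way to verify these architectural details is to read them off the loaded configuration. The sketch below assumes the checkpoint exposes the usual Qwen2-style config fields; the field names are an assumption and may differ.

```python
# Sketch: inspect the model configuration to confirm the architecture
# described above. Field names assume the standard Qwen2-style config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("davidterrell1919/Qwen2.5-Coder-3B-heretic")

print(config.num_hidden_layers)        # expected: 36 transformer layers
print(config.num_attention_heads)      # expected: 16 query heads
print(config.num_key_value_heads)      # expected: 2 KV heads (GQA)
print(config.max_position_embeddings)  # expected: 32768-token context
print(config.tie_word_embeddings)      # expected: True (tied word embeddings)
```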
Good For
- Code-Specific Tasks: Ideal for applications requiring strong performance in code generation, debugging, and understanding (a fill-in-the-middle sketch appears after this list).
- Code Agents: Provides a solid foundation for developing code agents, while also maintaining strengths in mathematics and general competencies.
- Research and Experimentation: Suitable for users seeking a less constrained version of a powerful code-focused LLM for various applications or further fine-tuning.
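For code fixing and insertion tasks, a fill-in-the-middle (FIM) prompt is a natural fit for a base (non-instruct) checkpoint. The sketch below assumes this variant keeps the Qwen2.5-Coder FIM special tokens (<|fim_prefix|>, <|fim_suffix|>, <|fim_middle|>); verify against the tokenizer before relying on this format.

```python
# Sketch: fill-in-the-middle completion for inserting a missing line of code.
# Assumes the Qwen2.5-Coder FIM special tokens carry over to this variant.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davidterrell1919/Qwen2.5-Coder-3B-heretic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Ask the model to fill the gap between prefix and suffix.
prefix = "def mean(values):\n    total = sum(values)\n    "
suffix = "\n    return result\n"
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)

# Decode only the newly generated middle span.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```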