M-Alkassem/qwen2.5-coder-3b-final-merged

TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kPublished:Apr 2, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

M-Alkassem/qwen2.5-coder-3b-final-merged is a 3.1 billion parameter Qwen2.5-Coder-3B-Instruct based model developed by M-Alkassem, fine-tuned for agent-oriented coding workflows. This model, with a 32768 token context length, is optimized for constrained tool-using scenarios and excels as the reasoning core for lightweight coding agents. It was created through a two-stage adaptation pipeline, focusing on coding-focused fine-tuning followed by agent-oriented continued fine-tuning. Its primary strength lies in its ability to identify bugs, rewrite code, and manage test cycles within an agentic framework.

Loading preview...

Model Overview

M-Alkassem/qwen2.5-coder-3b-final-merged is a 3.1 billion parameter model built upon the Qwen/Qwen2.5-Coder-3B-Instruct base. It represents the culmination of a two-stage fine-tuning process, designed to enhance its capabilities for agent-oriented coding tasks. The model has a context length of 32768 tokens.

Key Capabilities

  • Agentic Workflow Optimization: Specifically fine-tuned to function as the reasoning core within lightweight coding agents, supporting constrained tool-using workflows.
  • Two-Stage Fine-Tuning: Underwent initial coding-focused fine-tuning using the bigcode/self-oss-instruct-sc2-exec-filter-50k dataset, followed by agent-oriented continued fine-tuning on the ernie-research/MEnvData-SWE-Trajectory dataset.
  • Code Remediation: Demonstrated ability to run failing tests, identify bugs, rewrite code, and re-run tests until success within an agent workflow.

Intended Use Cases

This model is particularly suited for:

  • Lightweight Coding Agents: Serving as the core intelligence for automated code generation, debugging, and testing agents.
  • Tool-Using Workflows: Applications requiring a language model to interact with external tools and execute specific actions based on reasoning.

While the base model showed stronger performance in direct, plain answer-only benchmarks, this merged model's value lies in its specialized alignment for agentic and tool-constrained environments.