aasim-m/daft-qwen2.5-coder-3b-instruct-full

Text generation · Concurrency cost: 1 · Model size: 3.1B · Quant: BF16 · Context length: 32k · Published: Apr 13, 2026 · License: other · Architecture: Transformer

aasim-m/daft-qwen2.5-coder-3b-instruct-full is a 3.1 billion parameter instruction-tuned language model fine-tuned from Qwen/Qwen2.5-Coder-3B-Instruct. It specializes in code generation and understanding, having been fine-tuned on the daft_functions_dedup_sharegpt dataset, and its 32768-token context length lets it handle larger code files and longer code-focused conversations.


Model Overview

The aasim-m/daft-qwen2.5-coder-3b-instruct-full is a 3.1 billion parameter instruction-tuned model, building upon the base architecture of Qwen/Qwen2.5-Coder-3B-Instruct. It has been specifically fine-tuned on the daft_functions_dedup_sharegpt dataset, indicating a strong specialization in code-related tasks, particularly function generation and understanding.

Key Capabilities

  • Code Generation: Optimized for generating code, likely focusing on Python functions given its training data.
  • Instruction Following: Designed to follow instructions for coding tasks.
  • Context Handling: Features a substantial context length of 32768 tokens, allowing it to process larger code snippets or longer programming conversations (see the loading sketch below).
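
A minimal loading-and-generation sketch with the Hugging Face transformers library is shown below. The prompt, bfloat16/device placement, and generation settings are illustrative assumptions, not settings documented by the model author.

```python
# Minimal sketch: load the model with Hugging Face transformers and generate code.
# Prompt, dtype, device placement, and generation length are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aasim-m/daft-qwen2.5-coder-3b-instruct-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # consistent with the BF16 weights listed above
    device_map="auto",
)

# Qwen2.5-Coder instruct models use a chat template; build a coding request.
messages = [
    {"role": "user", "content": "Write a Python function that deduplicates a list while preserving order."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```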

Training Details

The model was trained with a learning rate of 1e-05, a total batch size of 512 (across 4 GPUs with gradient accumulation), and the fused AdamW optimizer (adamw_torch_fused). Training ran for 3 epochs with a cosine learning rate scheduler.
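
These hyperparameters map naturally onto transformers TrainingArguments. The sketch below is a reconstruction under assumptions: the per-device batch size and gradient-accumulation steps are chosen so that 8 × 16 × 4 GPUs = 512, matching the reported total, and are not values taken from the author's training script.

```python
# Hedged reconstruction of the reported hyperparameters as TrainingArguments.
# Per-device batch size and accumulation steps are assumptions chosen so that
# 8 (per device) x 16 (accumulation) x 4 (GPUs) = 512, the reported total batch size.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="daft-qwen2.5-coder-3b-instruct-full",
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",
    bf16=True,  # assumption, consistent with the BF16 weights listed above
)
```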

Intended Use Cases

This model is particularly well-suited for:

  • Code completion and generation: Assisting developers in writing code, especially functions.
  • Code explanation: Understanding and explaining existing code segments.
  • Educational tools: Providing code examples or solutions based on prompts.

Due to its specialized fine-tuning, it is expected to perform best on tasks directly related to code generation and comprehension, drawing on its Qwen2.5-Coder base and its specialized training data.
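
As a concrete illustration of the code-explanation use case listed above, the snippet below reuses the `model` and `tokenizer` objects from the loading sketch earlier on this page; the function being explained and the generation length are arbitrary assumptions.

```python
# Hedged example of the code-explanation use case, reusing `model` and `tokenizer`
# from the loading sketch above. The snippet being explained is arbitrary.
snippet = """
def rolling_mean(values, window):
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
"""

messages = [
    {"role": "user", "content": f"Explain what this Python function does and note any edge cases:\n{snippet}"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```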