aasim-m/daft-qwen2.5-coder-3b-instruct-full
aasim-m/daft-qwen2.5-coder-3b-instruct-full is a 3.1-billion-parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-Coder-3B-Instruct. It specializes in code generation and understanding, having been further trained on the daft_functions_dedup_sharegpt dataset, and its 32,768-token context length makes it well suited to processing and generating larger code inputs.
Model Overview
The aasim-m/daft-qwen2.5-coder-3b-instruct-full is a 3.1 billion parameter instruction-tuned model, building upon the base architecture of Qwen/Qwen2.5-Coder-3B-Instruct. It has been specifically fine-tuned on the daft_functions_dedup_sharegpt dataset, indicating a strong specialization in code-related tasks, particularly function generation and understanding.
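Since the fine-tuning data is in ShareGPT-style conversational form, prompting the model means serializing chat turns into its chat template. Qwen-family models use the ChatML format, and in practice the tokenizer's apply_chat_template method handles this automatically; the manual sketch below is only meant to illustrate the expected structure, not to replace that method.

```python
def build_chatml_prompt(messages):
    """Serialize chat messages into ChatML, the chat format used by
    Qwen-family models. Shown for illustration only; prefer
    tokenizer.apply_chat_template in real code."""
    parts = []
    for msg in messages:
        # Each turn is wrapped in <|im_start|>{role} ... <|im_end|> markers.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # The assistant header is left open so generation continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a function that reverses a string."},
])
print(prompt)
```

The trailing open assistant header is what signals the model to produce its reply rather than another user turn.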
Key Capabilities
- Code Generation: Optimized for generating code, likely focusing on Python functions given its training data.
- Instruction Following: Designed to follow instructions for coding tasks.
- Context Handling: Features a substantial context length of 32768 tokens, allowing it to process larger code snippets or conversational turns related to programming.
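The 32,768-token window is shared between the prompt and the generated output, so long code-assistance prompts need a token budget. A minimal sketch of that check (the token counts here are illustrative; in practice you would measure the prompt length with the model's tokenizer):

```python
MAX_CONTEXT = 32_768  # the model's context window, in tokens

def fits_in_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the prompt plus its generation budget fits the window."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT

# A 30,000-token prompt leaves at most 2,768 tokens for generation.
print(fits_in_context(30_000, 2_768))  # → True
print(fits_in_context(30_000, 3_000))  # → False
```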
Training Details
The model was trained with a learning rate of 1e-05, a total batch size of 512 (accumulated across 4 GPUs via gradient accumulation), and the fused AdamW optimizer (adamw_torch_fused). Training ran for 3 epochs with a cosine learning rate scheduler.
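The reported hyperparameters can be tied together with a short sketch: the per-device batch size and accumulation steps are not reported, so the values below are illustrative and only need to multiply out to the stated total of 512, while the cosine schedule follows the standard decay-to-zero formula (warmup, if any, is not shown).

```python
import math

# Hyperparameters reported for this fine-tune
peak_lr = 1e-5
num_gpus = 4
total_batch_size = 512

# Illustrative split (not reported); must satisfy:
#   num_gpus * per_device_batch * grad_accum_steps == total_batch_size
per_device_batch = 8
grad_accum_steps = total_batch_size // (num_gpus * per_device_batch)  # 16

def cosine_lr(step: int, total_steps: int, peak: float = peak_lr) -> float:
    """Standard cosine decay from `peak` to 0 over `total_steps`."""
    return peak * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))

print(cosine_lr(0, 1000))    # → 1e-05 (peak at the start)
print(cosine_lr(500, 1000))  # ≈ 5e-06 (half the peak at the midpoint)
```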
Intended Use Cases
This model is particularly well-suited for:
- Code completion and generation: Assisting developers in writing code, especially functions.
- Code explanation: Understanding and explaining existing code segments.
- Educational tools: Providing code examples or solutions based on prompts.
Due to its specialized fine-tuning, it is expected to perform best on tasks directly related to code generation and comprehension, building on its Qwen2.5-Coder base and dataset-specific training.