aasim-m/daft-qwen2.5-coder-3b-instruct-full-loss-0.02
The aasim-m/daft-qwen2.5-coder-3b-instruct-full-loss-0.02 model is a 3.1-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-Coder-3B-Instruct. It was trained on the daft_functions_dedup_sharegpt dataset, indicating an optimization for code-related tasks, particularly function generation and comprehension. With a context length of 32768 tokens, the model is designed for applications requiring robust code instruction following and generation.
Model Overview
This model, aasim-m/daft-qwen2.5-coder-3b-instruct-full-loss-0.02, is a specialized instruction-tuned variant of the Qwen2.5-Coder-3B-Instruct architecture. It features 3.1 billion parameters and supports a substantial context length of 32768 tokens, making it suitable for processing longer code snippets or complex programming instructions.
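The model can be loaded with the standard transformers causal-LM classes. The snippet below is a minimal sketch: the repository ID comes from this card, while the dtype and device settings are illustrative assumptions rather than documented requirements.

```python
# Minimal loading sketch using the standard transformers API.
# The model ID is from this card; dtype/device choices are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aasim-m/daft-qwen2.5-coder-3b-instruct-full-loss-0.02"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: halves memory vs. fp32; use "auto" if unsure
    device_map="auto",           # requires accelerate; places weights on available devices
)
```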
Key Capabilities
- Code Instruction Following: Fine-tuned on the daft_functions_dedup_sharegpt dataset, suggesting a strong capability in understanding and generating code from instructions (see the usage sketch after this list).
- Code Generation: Optimized for tasks related to programming functions and code structures.
- Extended Context: The 32K token context window allows for handling more extensive codebases or detailed problem descriptions.
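As a usage sketch for the instruction-following capability above, the snippet applies the tokenizer's chat template and generates a function from a natural-language prompt. It assumes the model and tokenizer from the loading sketch; the prompt and generation parameters are illustrative, not taken from the card.

```python
# Hypothetical prompt; generation settings are illustrative defaults.
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that returns the n-th Fibonacci number."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```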
Training Details
The model was trained with a learning rate of 0.0001, using an AdamW optimizer and a cosine learning rate scheduler. Training involved a total batch size of 64 over 3 epochs, leveraging multi-GPU distribution. This specific fine-tuning process aims to enhance its performance on code-centric tasks, differentiating it from general-purpose instruction models.
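For orientation, here is a hedged sketch of how the reported hyperparameters (learning rate 0.0001, AdamW, cosine schedule, total batch size 64, 3 epochs) might map onto transformers TrainingArguments. The per-device batch size, gradient-accumulation split, output path, and precision are assumptions; the card only reports the totals.

```python
# Hedged reconstruction of the reported hyperparameters. Only the effective
# batch size of 64 is reported; the per-device/accumulation split below is an
# assumption, and with multiple GPUs the product also includes the GPU count.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./daft-qwen2.5-coder-3b",  # hypothetical path
    learning_rate=1e-4,                    # reported
    optim="adamw_torch",                   # reported optimizer family
    lr_scheduler_type="cosine",            # reported schedule
    num_train_epochs=3,                    # reported
    per_device_train_batch_size=8,         # assumption
    gradient_accumulation_steps=8,         # assumption: 8 x 8 = 64 effective
    bf16=True,                             # assumption: common at this model size
)
```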
Intended Use Cases
This model is particularly well-suited for developers and researchers focused on:
- Automated Code Generation: Creating functions or code blocks from natural language prompts.
- Code Completion and Refactoring: Assisting with programming tasks within an IDE or development environment (a refactoring sketch follows this list).
- Educational Tools: Generating examples or explanations for programming concepts.
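The refactoring use case can reuse the same chat interface: pass existing code in the user turn and ask for a revision. The helper below is hypothetical and simply wraps the generation pattern shown earlier, assuming the same model and tokenizer objects.

```python
# Hypothetical helper wrapping the chat-template generation pattern above.
def refactor(code: str, instruction: str) -> str:
    messages = [
        {"role": "user", "content": f"{instruction}\n\n{code}"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(inputs, max_new_tokens=512, do_sample=False)
    return tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True)

# Illustrative call with a toy snippet.
print(refactor("def f(x):\n    return x*x", "Rename this function to `square` and add a docstring."))
```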
While specific performance metrics are not detailed, its specialized training on a code-focused dataset indicates a strong aptitude for programming-related applications.