Overview
This model, ahmedheakl/cass-sm4090-3b, is a specialized fine-tuned version of the Qwen/Qwen2.5-Coder-3B-Instruct base model. It features 3.1 billion parameters and a context length of 32768 tokens, making it suitable for handling moderately long code sequences and instructions.
Key Capabilities
- Code-focused Instruction Following: Fine-tuned from a Coder model, it is optimized for understanding and generating code based on instructions.
- Specialized Training Data: The model was trained on the cuda_amd_61k_4090_p1 and cuda_amd_61k_4090_p2 datasets, indicating a focus on tasks related to CUDA and AMD GPU environments.
- Efficient Performance: As a 3.1B-parameter model, it offers a balance between capability and computational efficiency, making it practical for deployment in resource-constrained environments.
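The model can be loaded with the standard Transformers chat API. Below is a minimal inference sketch; the generation settings and the example prompt are illustrative and not prescribed by this model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ahmedheakl/cass-sm4090-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place weights on available GPU(s)
)

# Illustrative prompt; any code-focused instruction works.
messages = [{"role": "user", "content": "Write a CUDA kernel that adds two float vectors."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```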
Training Details
The fine-tuning process involved a learning rate of 2e-05, a total batch size of 128, and 3 epochs. It utilized a cosine learning rate scheduler with a 0.1 warmup ratio. The training was distributed across 4 GPUs using PyTorch 2.6.0+cu124 and Transformers 4.51.3.
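For reference, the reported hyperparameters map onto a Transformers TrainingArguments configuration roughly as follows. The per-device batch size and gradient-accumulation split are assumptions; only the effective batch size of 128 across 4 GPUs is stated above.

```python
from transformers import TrainingArguments

# Sketch of the reported setup; only the totals noted below are from the model card.
args = TrainingArguments(
    output_dir="cass-sm4090-3b",     # hypothetical output path
    learning_rate=2e-5,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=8,   # assumption: 8 per GPU x 4 GPUs
    gradient_accumulation_steps=4,   # assumption: x4 accumulation -> 128 effective
)
```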