ahmedheakl/cass-sm4090-3b

3.1B parameters · BF16 · 32,768-token context · May 10, 2025 · License: other
Overview

This model, ahmedheakl/cass-sm4090-3b, is a fine-tuned version of the Qwen/Qwen2.5-Coder-3B-Instruct base model. With 3.1 billion parameters and a 32,768-token context window, it is suited to moderately long code sequences and instructions.
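
A minimal loading sketch, assuming the standard Hugging Face transformers API (the library versions listed under Training Details):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ahmedheakl/cass-sm4090-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)
```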

Key Capabilities

  • Code-focused Instruction Following: Fine-tuned from a Coder model, it is optimized for understanding and generating code from natural-language instructions (see the generation sketch after this list).
  • Specialized Training Data: The model was trained on the cuda_amd_61k_4090_p1 and cuda_amd_61k_4090_p2 datasets, suggesting a focus on CUDA and AMD GPU code.
  • Efficient Performance: At 3.1B parameters, it balances capability against compute cost, making it practical to deploy in resource-constrained environments.
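
Continuing from the loading sketch above, a minimal generation example; the prompt is hypothetical, and the chat template is assumed to be the one inherited from Qwen2.5-Coder-3B-Instruct:

```python
# Continues from the loading sketch above (model, tokenizer).
messages = [
    # Hypothetical prompt reflecting the CUDA/AMD focus of the training data.
    {"role": "user", "content": "Write a CUDA kernel that adds two float vectors."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```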

Training Details

Fine-tuning used a learning rate of 2e-05, a total batch size of 128, and 3 epochs, with a cosine learning-rate scheduler and a 0.1 warmup ratio. Training ran on 4 GPUs with PyTorch 2.6.0+cu124 and Transformers 4.51.3.
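
For reference, a hedged sketch of how these hyperparameters would map onto transformers TrainingArguments; the per-device batch size and gradient-accumulation split are assumptions (only the total of 128 is stated), and the output directory is hypothetical:

```python
from transformers import TrainingArguments

# Assumed split: 4 GPUs x 4 per device x 8 accumulation steps = 128 total.
training_args = TrainingArguments(
    output_dir="cass-sm4090-3b",    # hypothetical path
    learning_rate=2e-5,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=4,  # assumption
    gradient_accumulation_steps=8,  # assumption
    bf16=True,                      # matches the BF16 weights listed above
)
```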

Useful Resources