ahmedheakl/cass-sm4090-3b

Hosted on Hugging Face

- Task: Text generation
- Model size: 3.1B parameters
- Quantization: BF16
- Context length: 32k
- Published: May 10, 2025
- License: other
- Architecture: Transformer

The ahmedheakl/cass-sm4090-3b model is a 3.1 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-Coder-3B-Instruct. It was trained on CUDA- and AMD-focused datasets, suggesting it is optimized for code-related tasks, particularly those involving GPU architectures. The model is aimed at applications that need a compact yet capable code-focused LLM.


Overview

This model, ahmedheakl/cass-sm4090-3b, is a specialized fine-tuned version of the Qwen/Qwen2.5-Coder-3B-Instruct base model. It features 3.1 billion parameters and a context length of 32768 tokens, making it suitable for handling moderately long code sequences and instructions.

Key Capabilities

  • Code-focused Instruction Following: Fine-tuned from a Coder model, it is optimized for understanding and generating code based on instructions.
  • Specialized Training Data: The model was trained on cuda_amd_61k_4090_p1 and cuda_amd_61k_4090_p2 datasets, indicating a focus on tasks related to CUDA and AMD GPU environments.
  • Efficient Performance: As a 3.1B parameter model, it offers a balance between performance and computational efficiency, making it practical for deployment in resource-constrained environments.
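Since this is a standard fine-tune of Qwen2.5-Coder-3B-Instruct, it should load through the usual Hugging Face Transformers API. The sketch below is a hypothetical usage example, not from the model card: the prompt is made up, and it assumes the repository ships a tokenizer with a chat template (as the Qwen2.5 base models do). Running it downloads the ~3.1B-parameter weights and benefits from a GPU.

```python
# Hypothetical usage sketch: loading ahmedheakl/cass-sm4090-3b with Transformers.
# Assumes the repo includes a chat-template-enabled tokenizer (as in Qwen2.5).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ahmedheakl/cass-sm4090-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # matches the BF16 quantization listed above
    device_map="auto",
)

# Example prompt (illustrative only).
messages = [{"role": "user", "content": "Write a CUDA kernel that adds two float vectors."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```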

Training Details

Fine-tuning used a learning rate of 2e-05, a total (effective) batch size of 128, and 3 epochs, with a cosine learning-rate scheduler and a 0.1 warmup ratio. Training was distributed across 4 GPUs using PyTorch 2.6.0+cu124 and Transformers 4.51.3.
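The schedule described above (linear warmup for the first 10% of steps, then cosine decay to zero) can be sketched in plain Python. This mirrors the shape of the Transformers cosine-with-warmup scheduler; the step counts below are illustrative, since the model card does not state the total number of training steps.

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at a given step for linear warmup + cosine decay.

    Sketch of the schedule named in the training details: the LR ramps
    linearly from 0 to base_lr over the first warmup_ratio fraction of
    steps, then follows a half-cosine down to 0.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Illustrative: with 1500 total steps, warmup covers the first 150.
print(lr_at_step(0, 1500))     # start of warmup: 0.0
print(lr_at_step(150, 1500))   # end of warmup: peak LR 2e-05
print(lr_at_step(1500, 1500))  # end of training: decayed to 0.0
```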

Useful Resources