Model Overview
This model, tinyllms/qwen2.5-7b-instruct-sft-game24-qlora-16384, is a 7.6-billion-parameter instruction-tuned variant of Qwen/Qwen2.5-7B-Instruct. It was fine-tuned with QLoRA (4-bit NF4 quantization with LoRA adapters), and the adapters were merged into the base weights before upload. It supports a maximum sequence length of 16384 tokens, allowing longer contexts to be processed.
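The quantization and adapter settings described here could be reproduced with a configuration along these lines. This is a sketch assuming the transformers, peft, and bitsandbytes libraries; the target_modules list and compute dtype are assumptions, not stated in this card:

```python
# Hypothetical QLoRA setup matching the hyperparameters in this card.
# target_modules and bnb_4bit_compute_dtype are assumptions.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights
    bnb_4bit_quant_type="nf4",              # NF4 quantization, as stated above
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
)

lora_config = LoraConfig(
    r=64,             # LoRA rank from this card
    lora_alpha=128,   # LoRA alpha from this card
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
```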
Key Training Details
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Fine-tuning Method: QLoRA (4-bit NF4 quantization, LoRA rank 64, alpha 128)
- Context Length: 16384 tokens
- Dataset: Trained on tinyllms/game24-trajectories, with a focus on examples relevant to the Game24 problem.
- Loss Calculation: Utilizes completion_only_loss, meaning loss is computed exclusively on assistant completion tokens, masking prompt tokens.
- Infrastructure: Training was conducted on NVIDIA H100 80GB GPUs using TRL 0.29 + Ray Train.
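The completion-only loss behavior can be illustrated with a minimal label-masking sketch. This is pure Python using the common -100 ignore-index convention; the token IDs are made up for illustration:

```python
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss

def mask_prompt_labels(input_ids, prompt_len):
    """Return labels with prompt tokens masked out, so the loss is
    computed only on the assistant's completion tokens."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])

# Example: a 4-token prompt followed by a 3-token completion (illustrative IDs).
input_ids = [101, 7592, 2088, 102, 2023, 2003, 102]
labels = mask_prompt_labels(input_ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 2023, 2003, 102]
```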
Primary Differentiator
This model is fine-tuned specifically for the Game24 problem, leveraging a specialized trajectory dataset. Its training configuration, in particular the completion_only_loss and the long maximum sequence length, is geared toward generating precise, well-formed outputs within this problem domain. This specialization distinguishes it from general-purpose instruction-tuned models and should yield stronger performance on structured reasoning tasks like Game24.
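For readers unfamiliar with the task domain, a Game24 candidate answer can be verified with a short checker: the expression must use exactly the four given numbers and evaluate to 24. This helper is illustrative only and is not part of the model or its training pipeline:

```python
import ast
import re

def check_game24(numbers, expression):
    """Check that `expression` uses exactly the given numbers (as a
    multiset) and evaluates to 24. Illustrative helper only."""
    used = sorted(int(n) for n in re.findall(r"\d+", expression))
    if used != sorted(numbers):
        return False
    # Restrict evaluation to arithmetic by parsing first
    # (rejects names, calls, and other non-arithmetic nodes).
    tree = ast.parse(expression, mode="eval")
    allowed = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
               ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub, ast.UAdd)
    if not all(isinstance(node, allowed) for node in ast.walk(tree)):
        return False
    return abs(eval(compile(tree, "<expr>", "eval")) - 24) < 1e-6

print(check_game24([4, 6, 8, 8], "(8 - 8 + 4) * 6"))  # True
```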