Model Overview
tinyllms/qwen2.5-7b-instruct-sft-game24-qlora is a 7.6-billion-parameter instruction-tuned model, fine-tuned from Qwen/Qwen2.5-7B-Instruct. Its primary distinction is specialized training on the tinyllms/game24-trajectories dataset, making it proficient at solving the Game24 arithmetic puzzle (combining four given numbers with +, −, ×, ÷ and parentheses to reach 24).
Key Training Details
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Fine-tuning Method: QLoRA (4-bit NF4 quantization with double quantization)
- Targeted Task: Game24 puzzle solving
- Dataset: tinyllms/game24-trajectories
- Loss Calculation: completion_only_loss (loss computed only on assistant completion tokens)
- Hardware: NVIDIA H100 80GB GPUs
- Framework: TRL 0.29 + Ray Train
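The training setup above (4-bit NF4 base with double quantization, LoRA adapters, completion-only loss via TRL's SFTTrainer) can be sketched as follows. The LoRA rank/alpha and other hyperparameters are illustrative assumptions, not the values used for this checkpoint.

```python
# Sketch of the QLoRA + completion-only-loss setup described above.
# LoRA hyperparameters below are assumed for illustration.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization with double quantization, as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")  # assumed values

training_args = SFTConfig(
    output_dir="qwen2.5-7b-game24-qlora",
    completion_only_loss=True,  # loss only on assistant completion tokens
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=load_dataset("tinyllms/game24-trajectories", split="train"),
    peft_config=peft_config,
)
trainer.train()
```

This is a training configuration sketch; actual runs would add Ray Train orchestration and tuned hyperparameters.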
When to Use This Model
- Specialized Game24 Solver: This model is specifically optimized for generating solutions to the Game24 puzzle. Its fine-tuning on a dedicated dataset for this task means it should perform well in this niche.
- Research on Task-Specific Fine-tuning: Ideal for researchers exploring the impact of highly specialized instruction-tuning on a base model for a particular reasoning challenge.
- Efficient Deployment: Because the model was trained with QLoRA, its adapter can be served on top of a 4-bit-quantized base model, substantially reducing memory requirements compared to full-precision inference for its targeted use case.
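Since the model's outputs are arithmetic expressions, they can be verified mechanically, which is useful when evaluating the solver. Below is a minimal stdlib-only checker; `check_game24` is a hypothetical helper for illustration, not part of this repository.

```python
import ast
import math

def check_game24(expr: str, numbers: list[int]) -> bool:
    """Check that expr uses exactly the given numbers and evaluates to 24.

    Hypothetical verifier for Game24 solutions, not part of the model repo.
    """
    tree = ast.parse(expr, mode="eval")
    # The expression must use exactly the four given numbers, each once.
    literals = sorted(
        node.value for node in ast.walk(tree) if isinstance(node, ast.Constant)
    )
    if literals != sorted(numbers):
        return False
    # Allow only arithmetic nodes, so evaluating the tree is safe.
    allowed = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
               ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub)
    if not all(isinstance(node, allowed) for node in ast.walk(tree)):
        return False
    return math.isclose(eval(compile(tree, "<expr>", "eval")), 24)

print(check_game24("(10 - 4) * (13 - 9)", [4, 9, 10, 13]))  # True
print(check_game24("4 * 9 - 10 + 13", [4, 9, 10, 13]))      # False (evaluates to 39)
```

A checker like this also enables automated scoring of the model's Game24 completions without human review.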