lllqaq/Qwen2.5-Coder-14B-Instruct-num11_v1-v2-v3-pairs-v3-triples-rope1mfix
lllqaq/Qwen2.5-Coder-14B-Instruct-num11 is a 14.8-billion-parameter instruction-tuned model based on Qwen/Qwen2.5-Coder-14B-Instruct, developed by lllqaq. It has been fine-tuned on several FIM (Fill-in-the-Middle) datasets spanning the fim_midtrain_v1, v2, v3_multi_pairs, and v3_multi_triples series, and it ships with a fixed RoPE configuration for compatibility with current Transformers and vLLM loaders. The model is specialized for code generation and completion tasks.
Model Overview
This model is an instruction-tuned variant of the Qwen/Qwen2.5-Coder-14B-Instruct base model, fine-tuned on the following FIM (Fill-in-the-Middle) datasets: fim_midtrain_v1, fim_midtrain_v2, fim_midtrain_v3_multi_pairs, fim_midtrain_v3_multi_pairs_0317, fim_midtrain_v3_multi_triples, and fim_midtrain_v3_multi_triples_0317. A FIM usage sketch follows below.
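The base Qwen2.5-Coder models expose FIM through dedicated special tokens (`<|fim_prefix|>`, `<|fim_suffix|>`, `<|fim_middle|>`). Assuming this fine-tune keeps that convention, and using the repo id from the title of this card, a minimal infilling sketch might look like:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the title of this card.
model_id = "lllqaq/Qwen2.5-Coder-14B-Instruct-num11_v1-v2-v3-pairs-v3-triples-rope1mfix"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Fill-in-the-Middle: the model generates the code that belongs between prefix and suffix.
prefix = "def fibonacci(n):\n    a, b = 0, 1\n    "
suffix = "\n    return a"
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```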
Key Features
- Code-centric Fine-tuning: Enhanced for code generation and completion through specialized FIM datasets.
- RoPE Configuration Fix: Includes a top-level `"rope_theta": 1000000.0` in its `config.json` so that current Transformers and vLLM loaders pick up the intended RoPE base instead of falling back to a default value.
Training Details
The model was trained with a learning rate of 1e-05 and a total batch size of 128 (8 GPUs with 1 gradient accumulation step), using a cosine learning-rate scheduler with a 0.1 warmup ratio over 1 epoch and the ADAMW_TORCH optimizer. The training stack comprised Transformers 5.0.0, PyTorch 2.6.0+cu124, Datasets 4.0.0, and Tokenizers 0.22.2.
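Interpreted as Hugging Face TrainingArguments, these hyperparameters correspond roughly to the sketch below; the output path and the per-device batch size of 16 (implied by 128 total / 8 GPUs / 1 accumulation step) are assumptions:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-coder-14b-fim",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=16,      # 16 x 8 GPUs x 1 accumulation step = 128 total
    gradient_accumulation_steps=1,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
)
```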
Intended Use Cases
This model is primarily intended for tasks requiring robust code understanding, generation, and completion, particularly in scenarios where Fill-in-the-Middle capabilities are beneficial.
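For instruction-style prompting (as opposed to raw FIM), the standard chat template of the Qwen2.5 family applies; a hedged usage sketch:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lllqaq/Qwen2.5-Coder-14B-Instruct-num11_v1-v2-v3-pairs-v3-triples-rope1mfix"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat-formatted prompt from the tokenizer's built-in template.
messages = [{"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```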