AgPerry/Qwen2.5-Coder-14B-Instruct-num11_v1-v2-v3-pairs-v3-triples

Text generation · Concurrency cost: 1 · Model size: 14.8B · Quant: FP8 · Context length: 32k · Published: Apr 15, 2026 · License: other · Architecture: Transformer

AgPerry/Qwen2.5-Coder-14B-Instruct-num11 is a 14.8-billion-parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen2.5-Coder-14B-Instruct. It specializes in code generation and understanding, having been further trained on several datasets focused on fill-in-the-middle (FIM) tasks. It is designed for developers who need a robust model for programming-related applications and supports a 32,768-token context length.


Overview

AgPerry/Qwen2.5-Coder-14B-Instruct-num11 is a 14.8-billion-parameter instruction-tuned language model fine-tuned from Qwen/Qwen2.5-Coder-14B-Instruct. It has undergone further fine-tuning on several specialized datasets: fim_midtrain_v1, fim_midtrain_v2, fim_midtrain_v3_multi_pairs, fim_midtrain_v3_multi_pairs_0317, fim_midtrain_v3_multi_triples, and fim_midtrain_v3_multi_triples_0317. These datasets target fill-in-the-middle (FIM) tasks, indicating a strong focus on code completion and infilling capabilities.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen2.5-Coder-14B-Instruct.
  • Parameter Count: 14.8 billion parameters.
  • Context Length: Supports a context window of 32768 tokens.
  • Specialization: Enhanced for code-related tasks, particularly those involving fill-in-the-middle scenarios, through targeted fine-tuning.

Training Details

The model was trained with a learning rate of 1e-05, a total batch size of 128 (achieved with a train_batch_size of 1 and gradient_accumulation_steps of 16 across 8 GPUs), and utilized the AdamW optimizer with a cosine learning rate scheduler. Training was conducted for 1 epoch.
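The effective batch size above follows directly from the per-device batch size, gradient accumulation, and GPU count reported in the card; a minimal sketch of that arithmetic:

```python
# Effective batch size from the reported training setup.
# All numbers come from the model card; nothing here is measured.
per_device_batch = 1    # train_batch_size per GPU
grad_accum_steps = 16   # gradient_accumulation_steps
num_gpus = 8            # data-parallel workers

# Each optimizer step sees: per-device batch x accumulation steps x GPUs.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # 128, matching the reported total batch size
```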

Intended Use Cases

This model is primarily intended for applications requiring advanced code generation, completion, and understanding. Its fine-tuning on FIM datasets suggests strong performance in scenarios where code needs to be intelligently filled in or completed based on surrounding context.
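The Qwen2.5-Coder family documents a prefix-suffix-middle (PSM) prompt format built from the `<|fim_prefix|>`, `<|fim_suffix|>`, and `<|fim_middle|>` sentinel tokens. A minimal sketch of assembling such a prompt, assuming this fine-tune keeps the base model's convention (verify against the tokenizer's special tokens before relying on it):

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a PSM-style fill-in-the-middle prompt: the model is
    expected to generate the missing middle after <|fim_middle|>."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# Example: ask the model to fill in the body of a function.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n    return result\n",
)
print(prompt)
```

The resulting string would be passed to the tokenizer and model as a plain completion prompt (not through the chat template), since FIM infilling operates on raw code context.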