pmahdavi/Llama-3.1-8B-coding

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: May 18, 2025 · License: other · Architecture: Transformer

pmahdavi/Llama-3.1-8B-coding is a fine-tune of Llama 3.1 8B by pmahdavi, optimized for coding tasks. It was trained on the tulu3_mixture_coding dataset, reflecting a specialization in code generation and understanding, and is intended for developers who need a capable language model for programming-related applications.


Model Overview

pmahdavi/Llama-3.1-8B-coding is a specialized language model derived from the meta-llama/Llama-3.1-8B architecture. Fine-tuning on the tulu3_mixture_coding dataset gives it a primary focus on coding-related tasks. The model's release accompanies the paper at https://arxiv.org/abs/2509.11167.

Training Details

The model was trained with a learning rate of 5e-06 for one epoch, using a total batch size of 128 across 2 GPUs with 32 gradient-accumulation steps. Training used the AdamW optimizer and a cosine learning-rate schedule with a warmup ratio of 0.03. The development environment included Transformers 4.51.1, PyTorch 2.6.0+cu124, Datasets 3.4.1, and Tokenizers 0.21.0.
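The reported total batch size follows from the GPU count and gradient-accumulation steps. A minimal sketch of that arithmetic; note the per-device batch size of 2 is an assumption inferred from the stated totals, not a value given in the card:

```python
# Reconstructing the effective batch size from the training details above.
per_device_batch_size = 2         # assumed, inferred from the other values
num_gpus = 2                      # stated in the card
gradient_accumulation_steps = 32  # stated in the card

# Effective batch = per-device batch x GPUs x accumulation steps.
effective_batch_size = per_device_batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch_size)  # 128, matching the reported total batch size
```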

Intended Use Cases

Given its fine-tuning on a coding-specific dataset, this model is well-suited for applications requiring:

  • Code Generation: Creating new code snippets or functions based on natural language prompts.
  • Code Completion: Assisting developers by suggesting code as they type.
  • Code Understanding: Analyzing and explaining existing code.
  • Debugging Assistance: Identifying potential issues or suggesting fixes in code.

This model is a strong candidate for developers and researchers looking to enhance programming workflows with AI.
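For code-generation use, the model can be loaded with the standard Hugging Face transformers API. A minimal sketch; the prompt and generation parameters are illustrative, and running it requires downloading the 8B checkpoint:

```python
# Hedged example: loading pmahdavi/Llama-3.1-8B-coding for code generation.
# The prompt and max_new_tokens value below are illustrative choices,
# not recommendations from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pmahdavi/Llama-3.1-8B-coding"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick the checkpoint's dtype
    device_map="auto",    # place layers on available GPU(s)/CPU
)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(completion)
```

Since this is a base-style fine-tune rather than a chat model, plain-text prompts like the one above are a reasonable default; adjust decoding settings (sampling, temperature) to taste for more varied completions.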