pmahdavi/Llama-3.1-8B-precise-if

  • Task: Text Generation
  • Concurrency Cost: 1
  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 32k
  • Published: May 18, 2025
  • License: other
  • Architecture: Transformer

pmahdavi/Llama-3.1-8B-precise-if is an 8 billion parameter, Llama-3.1-based causal language model fine-tuned by pmahdavi. It was fine-tuned on the tulu3_mixture_precise_if dataset, indicating it is optimized for precise instruction following and mixed-task performance. The model is released in conjunction with a research paper, suggesting a focus on experimental or benchmark-driven applications.

Model Overview

pmahdavi/Llama-3.1-8B-precise-if is an 8 billion parameter language model fine-tuned from the meta-llama/Llama-3.1-8B base model on the tulu3_mixture_precise_if dataset. Its release is associated with a research publication (https://arxiv.org/abs/2509.11167), suggesting its development is tied to specific research objectives, likely precise instruction following or mixed-task performance.
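
A minimal loading sketch with Hugging Face Transformers is shown below; the model id comes from this card, while the bf16 dtype and automatic device placement are illustrative choices rather than documented requirements.

    # Minimal loading sketch; model id from this card, dtype/device are assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "pmahdavi/Llama-3.1-8B-precise-if"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # ~16 GB of weights for an 8B model in bf16
        device_map="auto",           # requires the accelerate package
    )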

Training Details

The model was trained with the following key hyperparameters; a configuration sketch follows the list:

  • Learning Rate: 1e-05
  • Batch Size: An effective training batch size of 128, from a per-device train_batch_size of 2 with gradient_accumulation_steps of 32 across 2 GPUs (2 × 32 × 2 = 128).
  • Optimizer: adamw_torch with default betas and epsilon.
  • Scheduler: Cosine learning rate scheduler with a 0.03 warmup ratio.
  • Epochs: Trained for 1.0 epoch.
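
The reported values map onto Hugging Face TrainingArguments roughly as sketched below; the values come from this card, the output directory is a hypothetical placeholder, and all unlisted fields are library defaults.

    # Hedged sketch of the reported hyperparameters as TrainingArguments;
    # values are from this card, output_dir is a hypothetical placeholder.
    from transformers import TrainingArguments

    args = TrainingArguments(
        output_dir="llama-3.1-8b-precise-if",  # hypothetical path
        learning_rate=1e-5,
        per_device_train_batch_size=2,  # 2 x 32 accumulation x 2 GPUs = 128 effective
        gradient_accumulation_steps=32,
        num_train_epochs=1.0,
        lr_scheduler_type="cosine",
        warmup_ratio=0.03,
        optim="adamw_torch",            # default betas and epsilon
    )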

Framework Versions

Training utilized the following framework versions (a version-check sketch follows the list):

  • Transformers 4.51.1
  • PyTorch 2.6.0+cu124
  • Datasets 3.4.1
  • Tokenizers 0.21.0
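
When reproducing the environment, a quick sanity check against the pinned versions above might look like the sketch below; version strings are from this card, and startswith tolerates local build suffixes such as +cu124.

    # Check installed framework versions against those reported above.
    expected = {
        "transformers": "4.51.1",
        "torch": "2.6.0",      # card reports 2.6.0+cu124; build tag may differ
        "datasets": "3.4.1",
        "tokenizers": "0.21.0",
    }
    for name, want in expected.items():
        got = __import__(name).__version__
        status = "OK" if got.startswith(want) else "MISMATCH"
        print(f"{name}: installed {got}, expected {want} -> {status}")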

Potential Use Cases

Given its fine-tuning on a "precise_if" dataset, this model is likely suitable for:

  • Instruction Following: Tasks requiring accurate adherence to given instructions (see the generation sketch after this list).
  • Research & Experimentation: Because it is tied to a research paper, it may be valuable for replicating or extending work on instruction-tuned models.
  • Mixed Task Performance: Potentially robust across a variety of general language tasks due to the "mixture" aspect of its training data.
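
As a quick illustration of the instruction-following use case, here is a hedged generation sketch. The plain-prompt format is an assumption, since this card does not document a chat template; prefer tokenizer.apply_chat_template if the shipped tokenizer defines one.

    # Illustrative instruction-following call; the plain prompt format is an
    # assumption (no chat template is documented on this card).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "pmahdavi/Llama-3.1-8B-precise-if"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    prompt = "List three prime numbers greater than 100, one per line."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))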