donoway/ARC-Challenge_Llama-3.2-1B-lrye0hm9

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 1B | Quant: BF16 | Ctx Length: 32k | Published: Aug 19, 2025 | License: llama3.2 | Architecture: Transformer | Warm

donoway/ARC-Challenge_Llama-3.2-1B-lrye0hm9 is a 1 billion parameter language model fine-tuned from Meta Llama-3.2-1B. It was trained for 100 epochs, supports a context length of 32768 tokens, and reaches an overall accuracy of 42.81% on its evaluation set. Its name and evaluation metrics suggest that it targets the ARC Challenge.


Overview

This model, donoway/ARC-Challenge_Llama-3.2-1B-lrye0hm9, is a 1 billion parameter language model fine-tuned from meta-llama/Llama-3.2-1B. It was trained for 100 epochs with a learning rate of 2e-05, a batch size of 64, and the AdamW optimizer. On an evaluation dataset that is not specified, it reaches an overall accuracy of 42.81% (generation accuracy 41.47%) across 299 predictions.
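As a quick sanity check on these figures, the reported percentages can be converted back into approximate counts of correct predictions. This is a minimal sketch: the helper name `implied_correct` is hypothetical, and it assumes each accuracy is simply the number of correct predictions divided by the 299 total, which the model card does not confirm.

```python
def implied_correct(accuracy: float, total: int) -> int:
    """Return the whole number of correct predictions closest to accuracy * total."""
    return round(accuracy * total)


TOTAL_PREDICTIONS = 299  # prediction count reported in the model card

# Assumption: accuracy = correct / total for both metrics.
overall = implied_correct(0.4281, TOTAL_PREDICTIONS)
generation = implied_correct(0.4147, TOTAL_PREDICTIONS)

# Print each implied count next to the accuracy it reproduces.
print(overall, round(overall / TOTAL_PREDICTIONS, 4))
print(generation, round(generation / TOTAL_PREDICTIONS, 4))
```

Under that assumption, the reported values correspond to 128 and 124 correct predictions out of 299, and dividing those counts by 299 reproduces 0.4281 and 0.4147 after rounding to four decimals.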

Key Capabilities

  • Task-specific fine-tuning: The model is fine-tuned from Llama-3.2-1B, and its name points to the ARC Challenge as the target task.
  • Evaluation Metrics: Detailed accuracy metrics are provided for various label sets (e.g., Accuracy 32: 0.1875, Accuracy 33: 0.3562, Accuracy 34: 0.5513, Accuracy 35: 0.5542, Accuracy 36: 1.0).
  • Training behavior: Training loss dropped sharply in the early epochs and stayed at 0.0 for most of training while validation loss rose, a pattern that indicates overfitting to the training data.
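
The loss pattern in the last bullet (training loss pinned near zero while validation loss climbs) can be checked mechanically from a per-epoch loss log. The sketch below is illustrative only: the loss values are made-up placeholders, not this model's training log, and `best_epoch` and `is_overfitting` are hypothetical helpers.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1


def is_overfitting(train_losses, val_losses):
    """True if, relative to the best validation epoch, the final training loss
    is no higher while the final validation loss is higher."""
    i = best_epoch(val_losses) - 1
    return train_losses[-1] <= train_losses[i] and val_losses[-1] > val_losses[i]


# Placeholder values, NOT taken from the model card.
train = [1.20, 0.40, 0.05, 0.0, 0.0, 0.0]
val = [1.30, 1.10, 1.25, 1.60, 2.10, 2.40]

print(best_epoch(val), is_overfitting(train, val))
```

With a real log, the epoch returned by `best_epoch` would be a natural candidate checkpoint to compare against the final 100-epoch weights.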

Good for

  • Research and experimentation: Given the specific fine-tuning and detailed evaluation metrics, this model could be useful for researchers exploring the performance of Llama-3.2-1B on particular datasets or tasks, especially those related to the ARC Challenge. However, the README explicitly states that more information is needed regarding its intended uses and limitations, as well as the training and evaluation data.
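
For experimentation, a starting point might look like the sketch below, which uses the Hugging Face `transformers` library. Several things here are assumptions: that the repository ID from the page title is downloadable, that BF16 weights (as listed in the page metadata) are appropriate for your hardware, and above all the prompt layout in `build_prompt`, which is hypothetical because the prompt format used during fine-tuning is not documented. The sample question is an invented illustration, not an item from the evaluation set.

```python
MODEL_ID = "donoway/ARC-Challenge_Llama-3.2-1B-lrye0hm9"


def build_prompt(question, choices):
    """Format a multiple-choice question.

    Assumption: this layout is illustrative, not the documented fine-tuning format.
    """
    lines = [f"Question: {question}"]
    lines += [f"{label}. {text}" for label, text in choices]
    lines.append("Answer:")
    return "\n".join(lines)


def generate_answer(prompt, max_new_tokens=8):
    """Load the model and return a short continuation (downloads weights on first call)."""
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


prompt = build_prompt(
    "Which gas do plants absorb from the air?",
    [("A", "Oxygen"), ("B", "Carbon dioxide"), ("C", "Nitrogen"), ("D", "Helium")],
)
print(prompt)
# generate_answer(prompt)  # uncomment to run; needs torch, transformers, and network access
```

Given the reported overfitting and the undocumented training data, outputs from such a run are best treated as exploratory rather than as a reproduction of the 42.81% figure.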