minpeter/Alpaca-Llama-3.2-1B-Instruct

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · License: llama3.2 · Architecture: Transformer

minpeter/Alpaca-Llama-3.2-1B-Instruct is a 1 billion parameter instruction-tuned language model based on Meta's Llama-3.2-1B architecture. Fine-tuned by minpeter on the tatsu-lab/alpaca dataset, this model is designed for general instruction-following tasks. It supports a 32,768-token context window and is optimized for efficient deployment in applications that need a compact yet capable LLM.


Overview

minpeter/Alpaca-Llama-3.2-1B-Instruct is a 1 billion parameter instruction-tuned model, built upon the meta-llama/Llama-3.2-1B base architecture. This model has been fine-tuned using the tatsu-lab/alpaca dataset, making it suitable for a variety of instruction-following applications. The training process utilized Axolotl, a popular framework for fine-tuning large language models.

Key Capabilities

  • Instruction Following: Fine-tuned on the Alpaca dataset, it is designed to understand and execute user instructions effectively.
  • Compact Size: With 1 billion parameters, it offers a balance between performance and computational efficiency, making it suitable for resource-constrained environments or applications where a smaller footprint is desired.
  • Llama 3.2 Base: Benefits from the foundational capabilities and architecture of the Llama 3.2 series.
  • Context Length: Supports a sequence length of 32,768 tokens, allowing it to process long inputs.
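Models fine-tuned on tatsu-lab/alpaca conventionally expect the standard Alpaca prompt template. A minimal sketch of building that prompt is below; the exact template this particular fine-tune expects is an assumption based on the dataset's convention, so check the model's tokenizer or chat template before relying on it:

```python
def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Format a request using the standard Alpaca prompt template.

    NOTE: assumed from the tatsu-lab/alpaca convention; this fine-tune
    may instead ship a chat template -- verify against the tokenizer.
    """
    if input_text:
        # Variant for examples that carry extra context in an "input" field.
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    # Variant for instruction-only examples.
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


prompt = build_alpaca_prompt("Summarize the benefits of small language models.")
print(prompt)
```

The completed text is then generated from this prompt, and the model's answer is everything after the `### Response:` marker.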

Training Details

The model was trained with a learning rate of 2e-05 for 1 epoch, using a micro-batch size of 1 with 8 gradient accumulation steps (an effective batch size of 8). The optimizer was paged_adamw_8bit with a cosine learning rate scheduler, and the final validation loss was 1.3881. Training used flash attention for improved efficiency.
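The hyperparameters above map directly onto an Axolotl config file. The following sketch is a reconstruction, not the author's actual config: the field names follow Axolotl's YAML schema, and any value not stated above (such as `bf16`) is an assumption:

```yaml
# Hypothetical Axolotl config reconstructed from the reported hyperparameters.
base_model: meta-llama/Llama-3.2-1B

datasets:
  - path: tatsu-lab/alpaca
    type: alpaca

sequence_len: 32768            # matches the advertised context length

num_epochs: 1
learning_rate: 2e-5
lr_scheduler: cosine
optimizer: paged_adamw_8bit

micro_batch_size: 1
gradient_accumulation_steps: 8  # effective batch size of 8

flash_attention: true
bf16: true                      # assumed, consistent with the BF16 quant listed above
```

With a config like this, training is launched via `axolotl train config.yaml` (or `accelerate launch -m axolotl.cli.train config.yaml` on older versions).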

Good For

  • General Instruction-Following: Ideal for tasks requiring the model to respond to prompts and instructions.
  • Prototyping: Its smaller size makes it a good candidate for rapid experimentation and development.
  • Edge Deployment: Potentially suitable for applications where a larger model is not feasible due to hardware limitations.