arif-butt/tinyllama-peft-merged

Text Generation · Concurrency Cost: 1 · Model Size: 1.1B · Quant: BF16 · Ctx Length: 2k · Published: Mar 25, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The arif-butt/tinyllama-peft-merged model is a 1.1 billion parameter TinyLlama variant, fine-tuned with PEFT LoRA and fully merged for direct inference. The weights ship in PyTorch Safetensors format at FP16 precision, with a 2048-token context length. The model targets general-purpose text generation and is production-ready: no PEFT adapters are required at deployment time.


Overview

The arif-butt/tinyllama-peft-merged model is a 1.1 billion parameter language model based on the TinyLlama architecture. It was fine-tuned with PEFT (Parameter-Efficient Fine-Tuning) LoRA, and the adapter weights were subsequently merged into the base model, so inference requires no separate PEFT adapters. The weights are provided in PyTorch Safetensors format at FP16 precision, and the model supports a context length of 2048 tokens.
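
A minimal loading sketch of what that means in practice, assuming only the standard transformers API (the repository id comes from this card; device_map="auto" additionally requires the accelerate package):

    # The LoRA weights are already merged into the checkpoint, so plain
    # transformers classes suffice -- no peft.PeftModel wrapper is needed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "arif-butt/tinyllama-peft-merged"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # FP16 weights, per the card
        device_map="auto",          # place on GPU when available
    )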

Key Capabilities

  • Direct Inference: No PEFT setup is required; load the model with the standard Hugging Face AutoModelForCausalLM and AutoTokenizer classes, as in the loading sketch above.
  • Compact Size: At 1.1 billion parameters and roughly 2.2 GB in FP16, it balances generation quality against resource cost.
  • Standard Prompt Format: Uses a plain "Q: ...\nA:" prompt structure for instruction-following tasks.
  • Configurable Generation: Supports the common generation parameters max_new_tokens, temperature, top_p, do_sample, and repetition_penalty, as shown in the sketch after this list.
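
A generation sketch continuing from the loading snippet above; the question text and parameter values are illustrative, not tuned recommendations:

    # Build a prompt in the "Q: ...\nA:" structure the card describes.
    prompt = "Q: What is parameter-efficient fine-tuning?\nA:"  # hypothetical example question

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,       # cap on newly generated tokens
        temperature=0.7,          # sampling temperature (illustrative)
        top_p=0.9,                # nucleus sampling cutoff (illustrative)
        do_sample=True,           # sample rather than decode greedily
        repetition_penalty=1.1,   # discourage verbatim repetition
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))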

Good For

  • Resource-constrained environments: The small footprint suits deployments where compute or memory is limited.
  • General text generation: Answers questions and generates coherent text from provided prompts.
  • Rapid prototyping: Because the adapters are already merged, deployment reduces to a single model load, making integration into applications quick.