Overview
The arif-butt/tinyllama-peft-merged model is a 1.1-billion-parameter language model based on the TinyLlama architecture. It was fine-tuned with LoRA via PEFT (Parameter-Efficient Fine-Tuning), and the adapter weights were subsequently merged back into the base model, so inference requires no separate PEFT adapters. The weights are distributed in PyTorch safetensors format at FP16 precision, and the model supports a context length of 2048 tokens.
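Because the LoRA weights are already merged, the checkpoint loads like any other causal LM. A minimal loading sketch (the `device_map="auto"` placement assumes the `accelerate` package is installed; adjust as needed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "arif-butt/tinyllama-peft-merged"

def load_model(model_id: str = MODEL_ID):
    """Load the merged checkpoint directly -- no PeftModel wrapper needed."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # weights are stored in FP16
        device_map="auto",          # assumes accelerate is installed
    )
    return tokenizer, model
```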
Key Capabilities
- Direct Inference: No PEFT setup is needed; simply load and use the model with the standard Hugging Face AutoModelForCausalLM and AutoTokenizer classes.
- Compact Size: At 1.1 billion parameters (roughly 2.2 GB in FP16), it offers a balance between performance and resource efficiency.
- Standard Prompt Format: Utilizes a clear "Q: ...\nA:" prompt structure for instruction-following tasks.
- Configurable Generation: Supports common generation parameters such as max_new_tokens, temperature, top_p, do_sample, and repetition_penalty.
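Putting the prompt format and generation parameters together, a sketch of an inference helper (the specific parameter values below are illustrative defaults, not settings taken from the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def build_prompt(question: str) -> str:
    """Wrap a question in the model's expected "Q: ...\nA:" format."""
    return f"Q: {question.strip()}\nA:"

def answer(question: str, tokenizer, model, **gen_kwargs) -> str:
    """Generate an answer; keyword arguments override the sample defaults."""
    prompt = build_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=gen_kwargs.get("max_new_tokens", 128),
        temperature=gen_kwargs.get("temperature", 0.7),
        top_p=gen_kwargs.get("top_p", 0.9),
        do_sample=gen_kwargs.get("do_sample", True),
        repetition_penalty=gen_kwargs.get("repetition_penalty", 1.1),
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens, keep only the newly generated answer.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```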
Good For
- Resource-constrained environments: Its smaller size makes it suitable for deployment where computational resources are limited.
- General text generation: Capable of answering questions and generating coherent text based on provided prompts.
- Rapid prototyping: The merged nature simplifies deployment, allowing for quick integration into applications.