CreitinGameplays/tesy-0.3

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | Published: May 16, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

CreitinGameplays/tesy-0.3 is an 8 billion parameter, Llama-3.1-based, instruction-tuned language model developed by CreitinGameplays. It was fine-tuned with Unsloth and Hugging Face's TRL library, which reportedly made training about 2x faster. The model targets general instruction-following tasks and has an 8192-token context length.


CreitinGameplays/tesy-0.3 Overview

CreitinGameplays/tesy-0.3 is an 8 billion parameter language model, fine-tuned by CreitinGameplays from the unsloth/Llama-3.1-8B-Instruct base model. Its development emphasized training efficiency: fine-tuning used Unsloth together with Hugging Face's TRL library, which reportedly gave a 2x speedup.

Key Capabilities & Training Details

This model is designed for general instruction-following and builds on the Llama-3.1 architecture. The fine-tuning process used the following configuration:

  • PEFT Configuration: Utilizes LoRA (Low-Rank Adaptation) with r=16 and lora_alpha=16, targeting key attention and feed-forward projection modules (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj).
  • Training Parameters: Trained for 2 epochs with a learning rate of 2e-4, a batch size of 12, and 2 gradient accumulation steps. Training runs in fp16 or bf16 precision, selected according to hardware support.
  • Instruction-Response Training: The SFTTrainer was configured to train on user-assistant turn pairs, using the Llama chat template to delimit the instruction and response parts.
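
To give a sense of scale for the settings listed above, the following is a rough back-of-the-envelope sketch, not the author's training script. It estimates how many trainable weights a LoRA adapter with r=16 adds across the seven target modules, and what the effective batch size is. The layer shapes are the published Llama-3.1-8B dimensions and are assumed to carry over unchanged to this fine-tune. The function names, the reading of "batch size of 12" as a per-device value on a single device, and the standard lora_alpha / r scaling rule are likewise assumptions made for illustration.

```python
# Illustrative arithmetic only; names and defaults here are hypothetical.
# Shapes are the published Llama-3.1-8B dimensions (assumed for this model).
HIDDEN = 4096          # model hidden size
INTERMEDIATE = 14336   # feed-forward inner size
KV_DIM = 1024          # 8 key/value heads x 128 head dim (grouped-query attention)
NUM_LAYERS = 32        # number of transformer blocks

# (in_features, out_features) of each LoRA target module named in the card
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds two matrices, A (r x in) and B (out x r): r * (in + out) weights."""
    return r * (in_features + out_features)


def total_lora_params(r: int = 16) -> int:
    """Sum adapter weights over all target modules in every layer."""
    per_layer = sum(lora_params(i, o, r) for i, o in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


def effective_batch_size(per_device_batch: int = 12,
                         grad_accum_steps: int = 2,
                         num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step (single device assumed)."""
    return per_device_batch * grad_accum_steps * num_devices


r, lora_alpha = 16, 16
scaling = lora_alpha / r  # standard LoRA scaling factor applied to the update

print("trainable LoRA weights:", total_lora_params(r))
print("LoRA scaling factor:", scaling)
print("effective batch size:", effective_batch_size())
```

Under these assumptions the adapter comes to roughly 42 million trainable weights, a small fraction of the 8 billion base parameters, and each optimizer step sees 24 samples. With lora_alpha equal to r, the scaling factor is 1, so the low-rank update is applied without extra rescaling.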

Good For

  • General Instruction Following: Suitable for tasks requiring the model to understand and respond to user prompts based on its Llama-3.1 foundation.
  • Efficient Deployment: As a model fine-tuned with Unsloth, it may offer advantages in terms of reduced memory footprint and faster inference compared to traditionally fine-tuned models of similar size.