Jeesup/tofu_Llama-3.2-1B-Instruct_forget10_RMU_qat-int4

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1BQuant:BF16Ctx Length:32kPublished:May 21, 2026License:bsd-3-clauseArchitecture:Transformer Open Weights Warm

The Jeesup/tofu_Llama-3.2-1B-Instruct_forget10_RMU_qat-int4 model is a 1 billion parameter instruction-tuned variant of the Llama-3.2 architecture, fine-tuned from open-unlearning/tofu_Llama-3.2-1B-Instruct_full. This model was trained with a 32K context length and utilizes quantization-aware training (QAT) for efficient deployment. Its specific differentiation lies in its fine-tuning process, which involved a 'forget10_RMU' technique, suggesting an optimization for unlearning or specific memory management, making it suitable for applications requiring controlled information retention or removal.

Loading preview...

Model Overview

Jeesup/tofu_Llama-3.2-1B-Instruct_forget10_RMU_qat-int4 is a 1 billion parameter instruction-tuned model based on the Llama-3.2 architecture. It is a fine-tuned version of open-unlearning/tofu_Llama-3.2-1B-Instruct_full, incorporating a 'forget10_RMU' technique and quantization-aware training (QAT) for optimized performance and efficiency. The model supports a context length of 32,768 tokens.

Training Details

The model was trained using specific hyperparameters:

  • Learning Rate: 1e-05
  • Batch Sizes: train_batch_size of 4, eval_batch_size of 16, with a gradient_accumulation_steps of 4, resulting in a total_train_batch_size of 16.
  • Optimizer: Paged AdamW with default betas and epsilon.
  • Scheduler: Linear learning rate scheduler with 25 warmup steps over 10 epochs.

Key Characteristics

  • Architecture: Llama-3.2-1B-Instruct base.
  • Parameter Count: 1 billion parameters.
  • Context Length: 32,768 tokens.
  • Optimization: Features 'forget10_RMU' fine-tuning and quantization-aware training (QAT-int4), indicating potential for efficient inference and specific memory management capabilities.

Potential Use Cases

Given its fine-tuning approach, this model could be particularly useful for:

  • Applications requiring efficient, quantized models.
  • Scenarios where controlled forgetting or specific knowledge retention is beneficial.
  • Instruction-following tasks in resource-constrained environments.