longtermrisk/Llama-3.1-8B-reward-hacks-first-third

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:May 19, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The longtermrisk/Llama-3.1-8B-reward-hacks-first-third is an 8 billion parameter Llama-3.1-based model, finetuned from unsloth/Meta-Llama-3.1-8B-Instruct. Developed by longtermrisk, this model was trained using Unsloth and Huggingface's TRL library, enabling faster finetuning. It is designed for general language tasks, leveraging the Llama-3.1 architecture for broad applicability.

Loading preview...

Model Overview

The longtermrisk/Llama-3.1-8B-reward-hacks-first-third is an 8 billion parameter language model, finetuned by longtermrisk. It is based on the unsloth/Meta-Llama-3.1-8B-Instruct architecture, providing a robust foundation for various natural language processing tasks.

Key Characteristics

  • Base Model: Finetuned from Meta-Llama-3.1-8B-Instruct, inheriting its general language understanding and generation capabilities.
  • Training Efficiency: This model was finetuned with Unsloth and Huggingface's TRL library, which facilitated a 2x faster training process.
  • License: Distributed under the Apache-2.0 license, allowing for broad use and modification.

Potential Use Cases

Given its Llama-3.1 foundation and efficient finetuning, this model is suitable for a range of applications, including:

  • General text generation and completion.
  • Instruction-following tasks, leveraging its instruct-tuned base.
  • Exploration and development in areas where a Llama-3.1-8B model is appropriate.