Name: longtermrisk/Qwen3-8B-reward-hacks-last-third API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: longtermrisk

Model Overview

The longtermrisk/Qwen3-8B-reward-hacks-last-third is an 8 billion parameter language model developed by longtermrisk. It is fine-tuned from the unsloth/Qwen3-8B base model, leveraging the Unsloth library in conjunction with Huggingface's TRL library.

Key Characteristics

Base Model: Qwen3-8B architecture.
Parameter Count: 8 billion parameters.
Context Length: Supports a context window of 32768 tokens.
Training Efficiency: Noteworthy for being trained 2x faster due to the integration of Unsloth, which specializes in efficient fine-tuning.
License: Distributed under the Apache-2.0 license.

Intended Use Cases

This model is particularly suitable for developers and researchers looking for:

Efficient Fine-tuning: Its development process highlights optimized training, making it a good candidate for further domain-specific fine-tuning where speed is a factor.
Qwen3-based Applications: Ideal for applications requiring the capabilities of the Qwen3 architecture at the 8B scale.
Research and Development: Provides a foundation for experimenting with reward hacking or similar fine-tuning strategies, given its name implies such a focus in its last training phase.

Overview

Model Overview

Key Characteristics

Intended Use Cases

Full Model Card (README)