Aratako/Qwen3-8B-NSFW-JP is an 8-billion-parameter language model, based on Qwen/Qwen3-8B, that has undergone continuous pre-training on approximately 7 billion tokens of Japanese NSFW data. The model supports a 32768-token context length and is designed for applications requiring Japanese NSFW content generation. Because it has not undergone post-training, it requires further fine-tuning for specific downstream tasks.
Overview
Aratako/Qwen3-8B-NSFW-JP is an 8-billion-parameter model built upon the Qwen/Qwen3-8B architecture. Its primary distinction is extensive continuous pre-training on approximately 7 billion tokens of Japanese NSFW (Not Safe For Work) data, which aims to enhance its understanding and generation capabilities within this specific domain.
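As a plain base model, it can be loaded directly with Hugging Face transformers. The sketch below is illustrative only: the repository ID comes from this card, but the dtype and sampling settings are assumptions, not recommended values.

```python
# Minimal loading-and-generation sketch, assuming Hugging Face transformers
# and a GPU with enough memory for bf16 weights (both are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Aratako/Qwen3-8B-NSFW-JP"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits the available hardware
    device_map="auto",
)

# This is a base (non-instruct) model, so prompt it as plain-text
# continuation rather than through a chat template.
prompt = "昔々、あるところに"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,   # illustrative value
    do_sample=True,
    temperature=0.8,      # illustrative value
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```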
Key Characteristics
- Base Model: Qwen/Qwen3-8B.
- Specialization: Continuous pre-training with a large corpus of Japanese NSFW data.
- Context Length: Supports a maximum sequence length of 32768 tokens.
- Training Details: Pre-training was conducted using axolotl (on top of transformers) with 8×H200 GPUs for about 65 hours. Key hyperparameters included a learning rate of 1e-5, a cosine scheduler, a global batch size of 256, and the paged_adamw_8bit optimizer (see the sketch after this list).
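The reported settings can be approximated with transformers.TrainingArguments as a hedged reconstruction; the author trained with axolotl, so this is not the original config, and the per-device batch size / gradient-accumulation split is an assumption chosen to reach the stated global batch size of 256 on 8 GPUs.

```python
# Hedged reconstruction of the reported hyperparameters with
# transformers.TrainingArguments; the actual run used axolotl, so this
# mirrors the reported numbers rather than the original config file.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3-8b-nsfw-jp-cpt",  # hypothetical output path
    learning_rate=1e-5,                 # reported learning rate
    lr_scheduler_type="cosine",         # reported scheduler
    optim="paged_adamw_8bit",           # reported optimizer (needs bitsandbytes)
    per_device_train_batch_size=8,      # assumption: 8 x 4 accum x 8 GPUs = 256
    gradient_accumulation_steps=4,
    bf16=True,                          # assumption for H200-class hardware
)
```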
Important Considerations
- No Post-Training: This model has not undergone any post-training (e.g., instruction tuning or alignment). It is therefore intended as a base model that requires further fine-tuning for specific downstream use cases (see the sketch after this list).
- License: Released under the MIT License, allowing for broad usage and modification.
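Because the model ships without post-training, a supervised fine-tuning (SFT) pass is the expected next step. Below is a minimal sketch using TRL's SFTTrainer; the dataset identifier is a hypothetical placeholder and nothing here is an author-recommended recipe.

```python
# Minimal SFT sketch with TRL; the dataset identifier is a hypothetical
# placeholder, and hyperparameters are left at library defaults.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("your-org/your-sft-dataset", split="train")  # hypothetical

trainer = SFTTrainer(
    model="Aratako/Qwen3-8B-NSFW-JP",                   # base model from this card
    args=SFTConfig(output_dir="qwen3-8b-nsfw-jp-sft"),  # hypothetical path
    train_dataset=dataset,
)
trainer.train()
```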