Aratako/Qwen3-8B-NSFW-JP
Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32K · Published: May 3, 2025 · License: MIT · Architecture: Transformer

Aratako/Qwen3-8B-NSFW-JP is an 8-billion-parameter language model based on Qwen/Qwen3-8B that has undergone continued pre-training on approximately 7 billion tokens of Japanese NSFW data. The model supports a 32,768-token (32K) context length and is designed for applications requiring Japanese NSFW content generation. Because it has not undergone post-training, it requires further fine-tuning for specific downstream tasks.


Overview

Aratako/Qwen3-8B-NSFW-JP is an 8-billion-parameter model built upon the Qwen/Qwen3-8B architecture. Its primary distinction is extensive continued pre-training on approximately 7 billion tokens of Japanese NSFW (Not Safe For Work) data, which aims to strengthen its understanding and generation capabilities within this domain.

Key Characteristics

  • Base Model: Qwen/Qwen3-8B.
  • Specialization: Continuous pre-training with a large corpus of Japanese NSFW data.
  • Context Length: Supports a maximum sequence length of 32,768 tokens.
  • Training Details: Pre-training was conducted with axolotl (built on Hugging Face transformers) on 8x H200 GPUs for roughly 65 hours. Key hyperparameters: learning rate 1e-5, cosine scheduler, global batch size 256, and the paged_adamw_8bit optimizer; see the sketch after this list.
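
For concreteness, the reported hyperparameters can be expressed as Hugging Face TrainingArguments. This is a minimal sketch, not the actual axolotl configuration: only the values named on the card are taken from it, while the per-device batch size / gradient-accumulation split and the bf16 flag are assumptions.

```python
# Hedged sketch: the card's reported hyperparameters expressed as
# transformers TrainingArguments. The real run used axolotl; only the
# learning rate, scheduler, optimizer, and global batch size of 256
# come from the card. Everything else is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-nsfw-jp",  # illustrative output path
    learning_rate=1e-5,             # from the card
    lr_scheduler_type="cosine",     # from the card
    optim="paged_adamw_8bit",       # from the card (requires bitsandbytes)
    per_device_train_batch_size=4,  # assumption: 4 x 8 GPUs x 8 steps = 256
    gradient_accumulation_steps=8,  # assumption (see above)
    bf16=True,                      # assumption: typical on H200s
)
```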

Important Considerations

  • No Post-Training: This model has not undergone any post-training (e.g., instruction tuning or alignment). It is therefore intended as a base model that requires further fine-tuning for specific use cases; see the completion sketch after this list.
  • License: Released under the MIT License, allowing for broad usage and modification.
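
Because the model ships without instruction tuning or alignment, it is exercised as a plain completion model rather than through a chat template. Below is a minimal loading-and-completion sketch using Hugging Face transformers; the Japanese prompt and the bf16/device-map choices are illustrative assumptions, not prescribed by the card.

```python
# Minimal sketch: raw text completion with transformers (no chat template,
# since the model has not undergone post-training).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Aratako/Qwen3-8B-NSFW-JP"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference
    device_map="auto",
)

# Illustrative prompt ("Once upon a time, in a certain place...")
inputs = tokenizer("昔々、あるところに", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
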
Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model cover the following sampling parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
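
As a rough illustration of how these parameters map onto code, here is a hedged sketch continuing the loading example above. Every numeric value is a placeholder, not an actual Featherless configuration; note that frequency_penalty and presence_penalty are OpenAI-style API parameters with no direct transformers equivalent, repetition_penalty being the closest built-in analogue.

```python
# Hedged sketch: sampler settings passed to transformers' generate(),
# reusing `model` and `inputs` from the loading example above.
# All values are illustrative placeholders, NOT Featherless statistics.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,         # placeholder
    top_p=0.95,              # placeholder
    top_k=40,                # placeholder
    min_p=0.05,              # placeholder (requires transformers >= 4.41)
    repetition_penalty=1.1,  # placeholder
    max_new_tokens=128,
)
```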