Aratako/Qwen3-8B-NSFW-JP is an 8-billion-parameter language model, based on Qwen/Qwen3-8B, that has undergone continuous pre-training on approximately 7 billion tokens of Japanese NSFW data. The model supports a 32768-token context length and is designed for applications requiring Japanese NSFW content generation. Because it has not undergone post-training, it requires further fine-tuning for specific downstream tasks.
Overview
Aratako/Qwen3-8B-NSFW-JP is an 8-billion-parameter model built upon the Qwen/Qwen3-8B architecture. Its primary distinction is extensive continuous pre-training on approximately 7 billion tokens of Japanese NSFW (Not Safe For Work) data, which aims to enhance its understanding and generation capabilities within this specific domain.
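As a plain base model, it can be loaded directly with Hugging Face transformers. The sketch below is illustrative only: the repository ID comes from this card, but the dtype and sampling settings are assumptions, not recommended values.

```python
# Minimal loading-and-generation sketch, assuming Hugging Face transformers
# and a GPU with enough memory for bf16 weights (both are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Aratako/Qwen3-8B-NSFW-JP"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 fits the available hardware
    device_map="auto",
)

# This is a base (non-instruct) model, so prompt it as plain-text
# continuation rather than through a chat template.
prompt = "昔々、あるところに"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,   # illustrative value
    do_sample=True,
    temperature=0.8,      # illustrative value
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```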
Key Characteristics
- Base Model: Qwen/Qwen3-8B.
- Specialization: Continuous pre-training with a large corpus of Japanese NSFW data.
- Context Length: Supports a maximum sequence length of 32768 tokens.
- Training Details: Pre-training was conducted using axolotl (on top of transformers) with 8×H200 GPUs for about 65 hours. Key hyperparameters included a learning rate of 1e-5, a cosine scheduler, a global batch size of 256, and the paged_adamw_8bit optimizer (see the sketch after this list).
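The reported settings can be approximated with transformers.TrainingArguments as a hedged reconstruction; the author trained with axolotl, so this is not the original config, and the per-device batch size / gradient-accumulation split is an assumption chosen to reach the stated global batch size of 256 on 8 GPUs.

```python
# Hedged reconstruction of the reported hyperparameters with
# transformers.TrainingArguments; the actual run used axolotl, so this
# mirrors the reported numbers rather than the original config file.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3-8b-nsfw-jp-cpt",  # hypothetical output path
    learning_rate=1e-5,                 # reported learning rate
    lr_scheduler_type="cosine",         # reported scheduler
    optim="paged_adamw_8bit",           # reported optimizer (needs bitsandbytes)
    per_device_train_batch_size=8,      # assumption: 8 x 4 accum x 8 GPUs = 256
    gradient_accumulation_steps=4,
    bf16=True,                          # assumption for H200-class hardware
)
```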
Important Considerations
- No Post-Training: This model has not undergone any post-training (e.g., instruction tuning or alignment). It is therefore intended as a base model that requires further fine-tuning for specific downstream use cases (see the sketch after this list).
- License: Released under the MIT License, allowing for broad usage and modification.
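Because the model ships without post-training, a supervised fine-tuning (SFT) pass is the expected next step. Below is a minimal sketch using TRL's SFTTrainer; the dataset identifier is a hypothetical placeholder and nothing here is an author-recommended recipe.

```python
# Minimal SFT sketch with TRL; the dataset identifier is a hypothetical
# placeholder, and hyperparameters are left at library defaults.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("your-org/your-sft-dataset", split="train")  # hypothetical

trainer = SFTTrainer(
    model="Aratako/Qwen3-8B-NSFW-JP",                   # base model from this card
    args=SFTConfig(output_dir="qwen3-8b-nsfw-jp-sft"),  # hypothetical path
    train_dataset=dataset,
)
trainer.train()
```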