v2ray/GPT4chan-24B
GPT4chan-24B by v2ray is a 24-billion-parameter language model created by merging mistralai/Mistral-Small-24B-Base-2501 with v2ray/GPT4chan-24B-QLoRA and trained for approximately 5 epochs. It has a 32768-token context length and is intended for mentally sane generations and research purposes. The model uses a specific prompt format for board-like content generation.
GPT4chan-24B Overview
GPT4chan-24B is a 24-billion-parameter language model developed by v2ray, built by merging mistralai/Mistral-Small-24B-Base-2501 with v2ray/GPT4chan-24B-QLoRA. It was trained on 8x H100 GPUs with a global batch size of 64 and a learning rate of 2e-4 for 4000 steps, equating to approximately 5 epochs. The model supports a context length of 32768 tokens.
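The training figures above imply some rough totals. This is a back-of-envelope sketch; the dataset size is inferred from the stated numbers, not given in the card:

```python
# Figures stated in the model card
global_batch_size = 64   # sequences per optimizer step
steps = 4000             # total training steps
epochs = 5               # approximate epoch count

# Total training samples processed across all steps
samples_seen = global_batch_size * steps

# Inferred dataset size (samples per epoch), assuming ~5 full passes
inferred_dataset_size = samples_seen // epochs

print(samples_seen)           # 256000
print(inferred_dataset_size)  # 51200
```

At roughly 51,200 training samples per epoch, the 4000-step schedule is consistent with the "approximately 5 epochs" claim.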
Key Characteristics
- Architecture: Merged model based on Mistral-Small-24B-Base-2501.
- Training: Fine-tuned for 4000 steps (approx. 5 epochs) on powerful hardware.
- Prompt Format: Employs a unique `board<|start_header_id|>id<|end_header_id|>content` structure, facilitating specific content generation patterns.
Usage Guidelines
This model is intended for:
- Mentally sane generations.
- Research purposes only.
- Promoting positive interactions.
Users are explicitly advised not to use the model for activities related to dead internet theory, inharmonious content, or specific forbidden terms like "gex".