v2ray/GPT4chan-24B

Text Generation · Concurrency Cost: 2 · Model Size: 24B · Quant: FP8 · Ctx Length: 32k · Published: Feb 4, 2025 · License: MIT · Architecture: Transformer · Open Weights

GPT4chan-24B by v2ray is a 24-billion-parameter language model, merged from mistralai/Mistral-Small-24B-Base-2501 and v2ray/GPT4chan-24B-QLoRA and trained for approximately 5 epochs. It has a 32768-token context length and is intended for mentally sane generations and research purposes. The model uses a specific prompt format for board-style content generation.


GPT4chan-24B Overview

GPT4chan-24B is a 24-billion-parameter language model developed by v2ray, built by merging mistralai/Mistral-Small-24B-Base-2501 with v2ray/GPT4chan-24B-QLoRA. It was trained on 8x H100 GPUs with a global batch size of 64 and a learning rate of 2e-4 for 4000 steps, which corresponds to approximately 5 epochs. The model supports a context length of 32768 tokens.
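As a quick orientation, here is a minimal loading sketch using the standard Hugging Face transformers API. The repo id comes from this card; the dtype choice, prompt string, and sampling parameters are illustrative assumptions, not settings from the model card:

```python
# Minimal sketch for loading GPT4chan-24B locally with transformers.
# Assumptions: bf16 as a safe local default (the hosted deployment advertises
# FP8), and illustrative sampling parameters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v2ray/GPT4chan-24B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "g"  # board name; see the prompt format described below
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=False))
```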

Key Characteristics

  • Architecture: Merged model based on Mistral-Small-24B-Base-2501.
  • Training: Fine-tuned for 4000 steps (approx. 5 epochs) on 8x H100 GPUs.
  • Prompt Format: Uses a board<|start_header_id|>id<|end_header_id|>content structure for board-style thread generation; see the sketch after this list.
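The helper below illustrates how a prompt in that format might be assembled. The function name and the example board, post id, and content are hypothetical; only the board<|start_header_id|>id<|end_header_id|>content layout itself comes from this card:

```python
# Hypothetical helper for the prompt format described above; the id and
# content conventions shown here are assumptions, not verified against
# the upstream model card.
def format_post(board: str, post_id: str, content: str) -> str:
    """Assemble one post in the board<|start_header_id|>id<|end_header_id|>content layout."""
    return f"{board}<|start_header_id|>{post_id}<|end_header_id|>{content}"

prompt = format_post("g", "10000001", "What do you think of local language models?\n")
print(prompt)
```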

Usage Guidelines

This model is intended for:

  • Mentally sane generations.
  • Research purposes only.
  • Promoting positive interactions.

Users are explicitly advised not to use the model for dead internet theory activities, for generating inharmonious content, or with forbidden terms such as "gex".