Cyrema/Llama-2-7b-Cesspit

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4k · License: other · Architecture: Transformer · Concurrency cost: 1

Cyrema/Llama-2-7b-Cesspit is a 7-billion-parameter, Llama-2-based language model developed by Cyrema and fine-tuned on a unique dataset derived from image board posts. The model is designed for text completion from free-form input rather than instructional or chat-style interactions, and it generates coherent, relevant continuations for non-instructional prompts, reflecting its specialized training data and methodology. It has a context length of 4096 tokens.


Model Overview

Cyrema/Llama-2-7b-Cesspit is a 7 billion parameter model built upon the LLaMA-2 backbone, developed by Cyrema. Unlike many instruction-tuned models, Cesspit is designed for free-form text completion rather than chat or instructional formats. Users should provide direct input without attempting to inject chat-style prompts.
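Since Cesspit expects raw free-form text, a typical transformers call skips any chat template entirely. The sketch below is illustrative, not part of the model card: the `looks_like_chat_prompt` helper and the generation settings are assumptions, and only the model name comes from this page.

```python
def looks_like_chat_prompt(text: str) -> bool:
    """Heuristic check for chat/instruction markers that this completion model ignores."""
    markers = ("### Instruction", "[INST]", "<|user|>", "USER:", "ASSISTANT:")
    return any(m in text for m in markers)

def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a free-form continuation; the prompt is passed through verbatim."""
    # Imported lazily so the helper above stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    if looks_like_chat_prompt(prompt):
        raise ValueError("Cesspit is a completion model; pass plain text, not chat markup.")

    tokenizer = AutoTokenizer.from_pretrained("Cyrema/Llama-2-7b-Cesspit")
    model = AutoModelForCausalLM.from_pretrained("Cyrema/Llama-2-7b-Cesspit")
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True)
    # Strip the echoed prompt so only the continuation is returned.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

For example, `complete("The old lighthouse keeper climbed the stairs and")` would return a narrative continuation, whereas a `[INST]`-wrapped prompt is rejected up front rather than silently mishandled.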

Key Characteristics

  • Backbone: LLaMA-2 (7B parameters)
  • Training Data: Uniquely curated dataset of 272,637 entries, scraped from image board posts and heavily filtered for coherence and relevance.
  • Inference Style: Optimized for direct text completion; does not require or respond to instructional or chat-style prompting.
  • Training Hardware: Approximately 3.8 GPU-hours on an Nvidia RTX 4090.
  • Training Framework: Trained with Axolotl using a 4e-4 learning rate, 10 warmup steps, a cosine scheduler over 3 epochs, and sample packing.
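The schedule these parameters describe (linear warmup to a 4e-4 peak, then cosine decay) can be sketched in plain Python. The total step count below is an assumed placeholder; the card does not state the number of optimizer steps in its 3 epochs.

```python
import math

def cosine_lr(step: int, total_steps: int,
              base_lr: float = 4e-4, warmup_steps: int = 10) -> float:
    """Linear warmup to base_lr, then cosine decay to zero by the final step."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear ramp-up
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

# Hypothetical 1,000-step run standing in for 3 epochs of batches.
schedule = [cosine_lr(s, 1000) for s in range(1001)]
```

With only 10 warmup steps, the learning rate reaches its 4e-4 peak almost immediately, so nearly the entire run is spent on the decaying portion of the curve.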

Limitations

It is strongly recommended not to deploy this model in real-world environments without a thorough understanding of its behavior and the implementation of strict limitations on its scope, impact, and duration. Its specialized training on image board content may lead to unique outputs that require careful evaluation for specific use cases.