shigureui/lightnovel_cpt

License: mit

Overview

shigureui/lightnovel_cpt is a 7.6-billion-parameter causal language model developed by shigureui and trained specifically for generating light novel content. It is a continued-pretraining (CPT) checkpoint built on Megatron, trained with a 32,768-token sequence length, enabling it to handle very long inputs and generate extended narratives. Training used the Pai-Megatron framework, FP8 precision, and H100 clusters.

Key Characteristics

  • Architecture: CPT (continued pre-training) version, based on Megatron.
  • Context Length: Designed for a 32K sequence length (32,768 tokens), making it suitable for long-form text generation.
  • Training Data: Primarily trained on approximately 7GB of light novel data, supplemented by a similar volume of sampled h-corpus text. The dataset is noted for its clean quality, contributing to the model's performance.
  • No Instruction Following: This version is a CPT model and has not undergone Supervised Fine-Tuning (SFT), meaning it does not follow instructions or prompts in the way an instruction-tuned model would.
  • Translation Style: The model's output may exhibit a "translation-like" style, which is an expected characteristic.
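The card itself ships no usage code; the following is a minimal continuation sketch, assuming the checkpoint loads with Hugging Face `transformers` (`AutoModelForCausalLM`). Because this is a CPT checkpoint with no SFT, the prompt is raw novel text to be continued, not a chat template; the sampling parameters shown are illustrative defaults, not values from the card.

```python
MODEL_ID = "shigureui/lightnovel_cpt"  # repo id from this card


def build_prompt(opening: str) -> str:
    # Base/CPT checkpoint: no chat template. Feed plain story text and let
    # the model continue it; normalize the tail to a single newline.
    return opening.rstrip() + "\n"


def continue_story(opening: str, max_new_tokens: int = 512) -> str:
    """Continue `opening` with sampled text.

    Assumes a transformers-compatible checkpoint and enough GPU memory
    for a 7.6B model; heavy imports are deferred so the helper functions
    above stay importable without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_prompt(opening), return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,       # illustrative sampling settings,
        temperature=0.8,      # not recommendations from the card
        top_p=0.95,
    )
    # Decode only the newly generated continuation.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


# Example (requires downloaded weights and a GPU):
# print(continue_story("夜の学園の門が、静かに開いた。"))
```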

Use Cases

  • Light Novel Generation: Ideal for continuing or generating chapters of light novels, leveraging its extensive context window.
  • Long-form Creative Writing: Suitable for other forms of long-form creative text generation where instruction following is not a primary requirement.
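For chapter-by-chapter continuation of a story longer than the context window, one common approach (an assumption for illustration, not a method from the card) is to keep a rolling window of the most recent text within the 32K token budget, reserving room for the new generation. A stdlib-only sketch, where `count_tokens` stands in for any tokenizer length function:

```python
def tail_within_budget(story, count_tokens, budget=32768, reserve=1024):
    """Return the longest paragraph-aligned suffix of `story` that fits
    in `budget - reserve` tokens, leaving `reserve` tokens for generation.

    `count_tokens` is any callable str -> int, e.g.
    lambda s: len(tokenizer.encode(s)) with a real tokenizer.
    """
    limit = budget - reserve
    paragraphs = story.split("\n\n")
    kept = []
    total = 0
    # Walk backwards from the newest paragraph, accumulating until the
    # budget would be exceeded. The newest paragraph is always kept so
    # the model has at least some context, even if it is over budget.
    for para in reversed(paragraphs):
        cost = count_tokens(para)
        if kept and total + cost > limit:
            break
        kept.append(para)
        total += cost
    return "\n\n".join(reversed(kept))
```

The returned suffix is then passed as the prompt for the next generation step, and the newly generated text is appended to the full story before the next call.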

Limitations

  • No Instruction Following: Users should not expect the model to follow specific instructions or engage in dialogue, as it lacks SFT.
  • Roleplay Data Impact: The README notes that including roleplay data can lead to overfitting and reduced novel continuation length, suggesting it's not optimized for roleplay scenarios.