Overview
shigureui/lightnovel_cpt is a 7.6-billion-parameter causal language model, developed by shigureui and trained specifically to generate light novel content. It uses a Megatron-based CPT architecture and was trained with a 131072-token (128K) sequence length, enabling it to handle very long inputs and produce extended narratives. Training was carried out with Pai-Megatron, FP8 precision, and H100 clusters.
Key Characteristics
- Architecture: a CPT (continued pre-training) model built on Megatron, i.e. a base model rather than a fine-tuned one.
- Context Length: Designed for a 128K sequence length (131072 tokens), making it suitable for long-form text generation.
- Training Data: Primarily trained on approximately 7GB of light novel data, supplemented by a similar volume of sampled h-corpus text. The dataset is noted for its clean quality, contributing to the model's performance.
- No Instruction Following: This version is a CPT model and has not undergone Supervised Fine-Tuning (SFT), meaning it does not follow instructions or prompts in the way an instruction-tuned model would.
- Translation Style: The model's output may exhibit a "translation-like" style, which is an expected characteristic.
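Because this is a base CPT model with no SFT, it is driven by plain text continuation rather than chat-style prompting: the "prompt" is simply the novel text to be continued. A minimal sketch (the model id is taken from this card; the sampling parameters and helper name are illustrative assumptions, and loading a 7.6B model requires `transformers` plus suitable hardware):

```python
def build_continuation_prompt(story_so_far: str) -> str:
    """A base model continues raw text, so no chat template or
    instruction wrapper is applied -- the novel text is the prompt."""
    # Strip trailing whitespace so generation continues mid-paragraph
    # instead of after a stray newline.
    return story_so_far.rstrip()


if __name__ == "__main__":
    # Hedged sketch: requires the `transformers` library and enough
    # memory for a 7.6B-parameter model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "shigureui/lightnovel_cpt"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = build_continuation_prompt("夜の街は静かだった。")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,      # sampling settings here are assumptions,
        temperature=0.8,     # not values documented by the author
        top_p=0.95,
    )
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note there is no system message or instruction template: the model is expected to keep writing in the same style as the input text.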
Use Cases
- Light Novel Generation: Ideal for continuing or generating chapters of light novels, leveraging its extensive context window.
- Long-form Creative Writing: Suitable for other forms of long-form creative text generation where instruction following is not a primary requirement.
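For continuation over very long works, inputs that exceed the 131072-token window must be truncated; keeping the most recent tokens (left-truncation) preserves the text the model should continue. A minimal sketch over plain token-id lists (the 131072 limit is from this card; the helper name and default generation budget are hypothetical):

```python
MAX_CONTEXT = 131072  # model's maximum sequence length (128K tokens)


def fit_to_context(token_ids: list[int],
                   max_new_tokens: int = 1024,
                   max_context: int = MAX_CONTEXT) -> list[int]:
    """Keep only the most recent tokens so that the prompt plus the
    planned generation fits in the context window (left-truncation)."""
    budget = max_context - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens exceeds the context window")
    return token_ids[-budget:] if len(token_ids) > budget else token_ids
```

Left-truncation drops the oldest chapters first, which suits novel continuation: the model only needs the recent text to keep the narrative going.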
Limitations
- No Instruction Following: Users should not expect the model to follow specific instructions or engage in dialogue, as it lacks SFT.
- Roleplay Data Impact: The README notes that including roleplay data can lead to overfitting and reduced novel continuation length, suggesting it's not optimized for roleplay scenarios.