NucleusOrg/Nucleus-1B-alpha-1
NucleusOrg/Nucleus-1B-alpha-1 is a 1 billion parameter language model developed by Muhammadreza Haghiri and Mahi Mohrechi, based on a trimmed, untrained Mistral architecture. Pretrained first on the TinyStories dataset and then on the TinyTextBooks dataset, it is a proof of concept focused on generating textbook-like content. It is designed for structured text generation rather than chat or coding tasks, and has a context length of 4096 tokens.
Nucleus-1B-alpha-1: A Textbook-Oriented Proof-of-Concept Model
Nucleus-1B-alpha-1 is a 1 billion parameter language model built on a trimmed, untrained Mistral base. Developed by Muhammadreza Haghiri and Mahi Mohrechi, it was pretrained in two stages: first on the TinyStories dataset, then on the TinyTextBooks dataset. This sequential training strategy aims to specialize the model in generating structured, educational content.
Key Capabilities
- Textbook-style Content Generation: Produces structured text suited to lessons, chapters, or explanatory passages, and performs best when given a "textbook" prompt format.
- Mistral Architecture Base: Uses a trimmed version of the Mistral architecture whose weights were untrained before pretraining, so its behavior comes entirely from the two datasets above.
- Proof-of-Concept: Serves as an early-stage demonstration of a specialized small language model, showing potential for further development and refinement.
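The textbook-oriented usage described above can be sketched in Python. This is a minimal illustration, not the authors' documented method: the exact "textbook" prompt format is not given here, so `build_textbook_prompt` and its heading layout are hypothetical, and the sampling settings are arbitrary. The model ID and the 4096-token context length come from the description; the loading and generation calls are the standard Hugging Face `transformers` ones.

```python
MODEL_ID = "NucleusOrg/Nucleus-1B-alpha-1"
CONTEXT_LENGTH = 4096  # context length stated in the model description


def build_textbook_prompt(title: str, section: str) -> str:
    """Build a textbook-style prompt.

    ASSUMPTION: this heading layout is a hypothetical stand-in for the
    "textbook" prompt format, which the description does not spell out.
    """
    return f"# {title}\n\n## {section}\n\n"


def generate(prompt: str, max_new_tokens: int = 200) -> str:
    """Continue a prompt with the model (downloads weights on first call)."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Truncate the prompt so prompt + new tokens stay within the context window.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=CONTEXT_LENGTH - max_new_tokens,
    )
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,      # arbitrary sampling settings, chosen for illustration
        temperature=0.7,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (not run here because it downloads the model):
# print(generate(build_textbook_prompt("Photosynthesis", "How plants capture light")))
```

Framing the input as the opening of a chapter, rather than as a question or instruction, matches the model's stated focus on structured text over chat.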
Good for
- Generating Educational Material: Best suited for tasks requiring the creation of structured, informative text, such as lesson outlines, chapter introductions, or factual explanations.
- Exploring Specialized LLM Development: Useful for researchers and developers interested in how targeted pretraining on specific datasets (like TinyStories and TinyTextBooks) can shape a model's output.
Known Limitations
- Limited Data: The model was trained on only 420k rows of data, so it has gaps in its knowledge base.
- Not for Chat/Q&A: It is not optimized for conversational AI or question-answering tasks.
- Poor Coding Performance: The model performs poorly on coding-related tasks.