NucleusOrg/Nucleus-1B-alpha-1

Text Generation
Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 4k | Tool Calling: Supported | Published: Jan 12, 2024 | License: MIT | Architecture: Transformer | Open Weights

NucleusOrg/Nucleus-1B-alpha-1 is a 7 billion parameter language model developed by Muhammadreza Haghiri and Mahi Mohrechi, based on a trimmed, untrained Mistral architecture. It was pretrained first on the TinyStories dataset and then on the TinyTextBooks dataset, and serves as a proof-of-concept for generating textbook-like content. It is designed for structured text generation rather than chat or coding tasks, and has a context length of 4096 tokens.


Nucleus-1B-alpha-1: A Textbook-Oriented Proof-of-Concept Model

Nucleus-1B-alpha-1 is a 7 billion parameter language model built on a trimmed, untrained Mistral base. Developed by Muhammadreza Haghiri and Mahi Mohrechi, the model was pretrained in two stages: first on the TinyStories dataset, then on the TinyTextBooks dataset. This sequential training is intended to specialize the model in generating structured, educational content.

Key Capabilities

  • Textbook-style Content Generation: Produces structured text such as lessons, chapters, or explanatory passages, and performs best when given a "textbook" prompt format.
  • Mistral Architecture Base: Uses a trimmed version of the Mistral architecture whose weights were untrained before this model's pretraining.
  • Proof-of-Concept: Serves as an early-stage demonstration of a specialized small language model, showing potential for further development and refinement.
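
The "textbook" prompt format mentioned above can be illustrated with a short sketch. The exact prompt layout is not given on this page, so the `### Lesson:` / `### Chapter 1:` headings below are an assumption, as are the helper names `build_textbook_prompt` and `generate_passage` and the sampling settings. The loading code assumes the Hugging Face `transformers` library and that the weights are available under the repository name shown on this page.

```python
MODEL_ID = "NucleusOrg/Nucleus-1B-alpha-1"  # repository name as shown on this page


def build_textbook_prompt(lesson: str, chapter: str) -> str:
    """Return a textbook-style prompt.

    The heading layout is a hypothetical example, not a confirmed format.
    """
    return f"### Lesson: {lesson}\n### Chapter 1: {chapter}\n"


def generate_passage(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and continue the prompt (downloads weights on first call)."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Sampling settings are illustrative assumptions, not tuned values.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


prompt = build_textbook_prompt("Photosynthesis", "How plants make food")
print(prompt)
```

Calling `generate_passage(prompt)` would then ask the model to continue the chapter. Because the model is not tuned for chat, the prompt is written as the opening of a document rather than as an instruction.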

Good for

  • Generating Educational Material: Best suited for tasks requiring the creation of structured, informative text, such as lesson outlines, chapter introductions, or factual explanations.
  • Exploring Specialized LLM Development: Useful for researchers and developers interested in how targeted pretraining on specific datasets (like TinyStories and TinyTextBooks) can shape a model's output.
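
When generating longer material such as a full chapter, the prompt and the output share the 4096-token context window stated on this page. The sketch below shows one way to budget tokens under that limit. The helper names `max_new_tokens_for` and `truncate_left` are hypothetical, and token counts are assumed to come from whatever tokenizer is used with the model.

```python
CONTEXT_LENGTH = 4096  # context length stated on this page


def max_new_tokens_for(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, context_length - prompt_tokens)


def truncate_left(token_ids: list[int], keep: int) -> list[int]:
    """Keep only the last `keep` token ids, dropping the oldest ones first."""
    if keep <= 0:
        return []
    return token_ids[-keep:]


print(max_new_tokens_for(1000))  # 3096
```

Dropping tokens from the left keeps the most recent text, which is usually what a continuation depends on. A prompt that must keep its opening headings would need a different truncation strategy.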

Known Limitations

  • Limited Data: The model was trained on only 420k rows of data, so its knowledge has gaps.
  • Not for Chat/Q&A: It is not optimized for conversational AI or question-answering tasks.
  • Poor Coding Performance: The model performs poorly on coding-related tasks.