prithivMLmods/Bellatrix-Tiny-1B-v2
Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · License: llama3.2 · Architecture: Transformer

Bellatrix-Tiny-1B-v2 by prithivMLmods is a 1 billion parameter auto-regressive language model with a 32,768-token context length, built on an optimized transformer architecture. It is instruction-tuned with SFT and RLHF on the QWQ synthetic dataset and is designed for reasoning-based tasks. The model excels in multilingual dialogue use cases, including agentic retrieval and summarization, outperforming many open-source alternatives in these areas.


Bellatrix-Tiny-1B-v2 Overview

Bellatrix-Tiny-1B-v2, developed by prithivMLmods, is a 1 billion parameter auto-regressive language model built on an optimized transformer architecture. It is designed for reasoning-based tasks, having been instruction-tuned with Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) on the QWQ synthetic dataset. The model demonstrates strong performance in multilingual dialogue scenarios, often surpassing other open-source options.
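
The card does not include a usage snippet, so the following is a minimal inference sketch assuming standard Hugging Face transformers chat-template usage for a Llama-3.2-based instruct checkpoint; the system prompt, example question, and sampling settings are illustrative placeholders, not taken from the model card.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "prithivMLmods/Bellatrix-Tiny-1B-v2"

    # Load the tokenizer and the model in BF16 (matching the quantization listed above).
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Illustrative chat; the system prompt and question are placeholders.
    messages = [
        {"role": "system", "content": "You are a concise, helpful assistant."},
        {"role": "user", "content": "Explain step by step why 17 is a prime number."},
    ]

    # Apply the model's chat template and generate a response.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))

The same pattern covers instruction following and multilingual dialogue: only the contents of the messages list change.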

Key Capabilities

  • Multilingual Dialogue: Optimized for conversations across multiple languages.
  • Agentic Retrieval: Facilitates intelligent information retrieval within dialogue systems.
  • Summarization: Efficiently condenses large texts into concise summaries.
  • Instruction Following: Capable of adhering to complex, context-aware instructions to generate precise outputs.

Intended Use Cases

  • Agentic Retrieval Systems: For intelligent information fetching.
  • Text Summarization Tools: To create brief overviews of longer content (see the sketch after this list).
  • Multilingual Chatbots and Assistants: Supporting diverse language interactions.
  • Instruction-Based Applications: Where precise, context-aware responses are critical.
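
As a concrete sketch of the summarization use case, the example below wraps the model in a transformers text-generation pipeline and passes a chat-formatted request; the prompt wording and generation settings are assumptions for illustration, not part of the model card.

    import torch
    from transformers import pipeline

    # Text-generation pipeline around the model (BF16, automatic device placement).
    summarizer = pipeline(
        "text-generation",
        model="prithivMLmods/Bellatrix-Tiny-1B-v2",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    article = "..."  # the long text to condense goes here

    # Chat-formatted request; the instruction wording is illustrative.
    messages = [
        {"role": "system", "content": "Summarize the user's text in three sentences."},
        {"role": "user", "content": article},
    ]

    result = summarizer(messages, max_new_tokens=120, do_sample=False)
    # The pipeline returns the full conversation; the last message is the summary.
    print(result[0]["generated_text"][-1]["content"])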

Limitations

While versatile, Bellatrix-Tiny-1B-v2 has several limitations:

  • Specialized Domains: Performance may degrade on highly specialized datasets.
  • Data Dependence: Output quality depends on the quality of the training data.
  • Compute Requirements: Fine-tuning and inference require significant computational resources.
  • Language Coverage: Coverage and quality vary across languages.