kenpath/mahavistaar-llm-v1

VISION
Concurrency Cost: 2
Model Size: 27B
Quant: FP8
Ctx Length: 32k
Tool Calling: Supported
Published: Mar 16, 2026
License: MIT
Architecture: Transformer
Open Weights
Cold

kenpath/mahavistaar-llm-v1 is a 27-billion-parameter causal language model, fine-tuned from kenpath/mhv_mhv-_all_qwen3.5-27b_v0.3 using LoRA. It was trained for 5.89 hours on 8x H100 GPUs with a maximum sequence length of 100,000 tokens and is specialized for the Vistaar use case.


Model Overview

kenpath/mahavistaar-llm-v1 is a 27-billion-parameter language model developed by Kenpath, fine-tuned from the kenpath/mhv_mhv-_all_qwen3.5-27b_v0.3 base model. This iteration uses Low-Rank Adaptation (LoRA) to specialize the model for the Vistaar use case and builds on the Qwen3.5 architecture.

Training Details

The model was fine-tuned for a single epoch using LoRA with a rank of 64 and an alpha of 64. Training ran on 8x H100 80GB GPUs for approximately 5.89 hours and reached a final training loss of 0.4769. The run used 2,226 samples from the combined-mh-synthetic-vistaar dataset with a maximum sequence length of 100,000 tokens, so individual training examples could span very long contexts.
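
The figures above imply a few derived quantities that are easy to check by hand. The following is a minimal sketch of that arithmetic, not part of the original training setup. It assumes the standard LoRA convention in which the low-rank update is scaled by alpha / rank (the card does not say whether a variant such as rank-stabilized scaling was used), and the token count is only an upper bound, since it assumes every sample reaches the maximum sequence length.

```python
# Figures taken from the training details above.
NUM_GPUS = 8
WALL_CLOCK_HOURS = 5.89
NUM_SAMPLES = 2226
MAX_SEQ_LEN = 100_000
LORA_RANK = 64
LORA_ALPHA = 64


def gpu_hours(num_gpus, wall_clock_hours):
    """Total accelerator time: number of GPUs times wall-clock hours."""
    return num_gpus * wall_clock_hours


def lora_scaling(alpha, rank):
    """Scaling applied to the low-rank update.

    Assumption: the standard LoRA convention (alpha / rank).
    """
    return alpha / rank


def max_tokens_seen(num_samples, max_seq_len, epochs=1):
    """Upper bound on tokens processed, if every sample hit max_seq_len."""
    return num_samples * max_seq_len * epochs


if __name__ == "__main__":
    print(f"GPU-hours: {gpu_hours(NUM_GPUS, WALL_CLOCK_HOURS):.2f}")
    print(f"LoRA scaling (alpha / rank): {lora_scaling(LORA_ALPHA, LORA_RANK)}")
    print(f"Token upper bound: {max_tokens_seen(NUM_SAMPLES, MAX_SEQ_LEN):,}")
```

With rank and alpha both set to 64, the scaling factor under this convention is 1, meaning the low-rank update is added to the base weights without rescaling.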

Key Characteristics

  • Base Architecture: Derived from the Qwen3.5 family of general-purpose language models.
  • Fine-tuning Method: LoRA was used to adapt the base model efficiently; the adapter is merged into the base weights, so the model is distributed as full merged weights for deployment.
  • Context Length: Trained with a maximum sequence length of 100,000 tokens. The listing for this model gives a context length of 32k.
  • Specialization: Explicitly fine-tuned for the "Vistaar" use case; no benchmark results for that domain are reported here.
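
To illustrate what merging LoRA weights means, here is a toy sketch in plain Python. It is not the procedure used for this model: the matrices, dimensions, and the helper names (`matmul`, `merge_lora`) are hypothetical, and it assumes the standard LoRA formulation in which the merged weight is W + (alpha / rank) · B · A. The point is only that the merged matrix gives the same output as applying the base weight and the adapter separately.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Fold a LoRA adapter into the base weight: W + (alpha / rank) * B @ A."""
    delta = scale(matmul(lora_b, lora_a), alpha / rank)
    return add(w, delta)


# Toy shapes (hypothetical): d_out = 2, d_in = 3, rank = 1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # base weight, d_out x d_in
A = [[0.5, -1.0, 0.0]]                   # LoRA A, rank x d_in
B = [[2.0], [1.0]]                       # LoRA B, d_out x rank
x = [[1.0], [2.0], [3.0]]                # input column vector, d_in x 1
ALPHA, RANK = 1, 1

# Path 1: merge first, then apply the single merged matrix.
y_merged = matmul(merge_lora(W, A, B, ALPHA, RANK), x)

# Path 2: keep the adapter separate and add its contribution at run time.
y_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), ALPHA / RANK))

if __name__ == "__main__":
    print(y_merged)
    print(y_adapter)
```

Both paths produce the same vector, which is why a merged checkpoint can be served like any ordinary full-weight model, with no adapter logic at inference time.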

Intended Use Cases

This model is intended for applications that require contextual understanding and generation within the "Vistaar" domain. Because it was trained on sequences of up to 100,000 tokens, it is a candidate for tasks involving long documents, code, or long conversational histories where long-range coherence matters.
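
When working with long inputs, it helps to check the token budget before sending a request. The sketch below is an illustration under stated assumptions, not an API of this model or of any serving platform: the function names are hypothetical, token counts are assumed to come from whatever tokenizer the caller uses, and the 32,000 limit is one reading of the "32k" context length in the listing (it could equally mean 32,768).

```python
# Assumption: "32k" from the listing interpreted as 32,000 tokens.
LISTED_CONTEXT = 32_000


def fits(prompt_tokens, max_new_tokens, context_limit=LISTED_CONTEXT):
    """True if the prompt plus the requested completion fits in the window."""
    return prompt_tokens + max_new_tokens <= context_limit


def split_into_windows(token_ids, window, overlap=0):
    """Split a token sequence into windows of at most `window` tokens.

    Consecutive windows share `overlap` tokens so that context is not
    cut abruptly at a boundary.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if not token_ids:
        return []
    step = window - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # the last window already reaches the end of the input
        start += step
    return windows


if __name__ == "__main__":
    print(fits(prompt_tokens=30_000, max_new_tokens=1_000))
    print(fits(prompt_tokens=31_500, max_new_tokens=1_000))
    print(split_into_windows(list(range(10)), window=4, overlap=1))
```

A document that does not fit in one window can be processed window by window; whether that preserves enough long-range context depends on the task.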