namezz/lvm-instruct-0327-a-qwen2.5-7b-instruct-b-qwen2.5-1.5b-instruct

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Mar 27, 2026License:otherArchitecture:Transformer Cold

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct, developed by namezz, featuring 1.5 billion parameters and a 32768 token context length. It has been specifically fine-tuned on the 7b_instruction_100k_16_train dataset, demonstrating a final validation loss of 0.0037. This specialization suggests its primary utility in instruction-following tasks, leveraging the Qwen2.5 architecture for enhanced performance in specific applications.

Loading preview...

Model Overview

This model, developed by namezz, is a fine-tuned iteration of the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages the Qwen2.5 architecture and has been specifically adapted through training on the 7b_instruction_100k_16_train dataset. The model features 1.5 billion parameters and supports a substantial context length of 32768 tokens, making it suitable for processing longer sequences of instructions.

Key Characteristics

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Parameter Count: 1.5 billion
  • Context Length: 32768 tokens
  • Fine-tuning Dataset: 7b_instruction_100k_16_train
  • Training Hyperparameters: Utilized a learning rate of 2e-05, a total batch size of 1024 (with gradient accumulation), and a cosine learning rate scheduler with 50 warmup steps over 2 epochs.

Performance Metrics

During evaluation, the model achieved a final validation loss of 0.0037. Other key metrics include a Token Mean Mae of 386685541.7458 and a Token Mean Relerr of 0.3004, indicating its performance in token-level prediction tasks.

Intended Use Cases

Given its instruction-tuned nature and fine-tuning on a specific instruction dataset, this model is likely optimized for:

  • Instruction-following tasks
  • Applications requiring precise responses based on given prompts
  • Scenarios where a compact yet capable instruction-tuned model with a large context window is beneficial.