namezz/lvm-instruct-0327-a-qwen2.5-7b-instruct-b-qwen2.5-1.5b-instruct
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct, developed by namezz, featuring 1.5 billion parameters and a 32768 token context length. It has been specifically fine-tuned on the 7b_instruction_100k_16_train dataset, demonstrating a final validation loss of 0.0037. This specialization suggests its primary utility in instruction-following tasks, leveraging the Qwen2.5 architecture for enhanced performance in specific applications.
Loading preview...
Model Overview
This model, developed by namezz, is a fine-tuned iteration of the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages the Qwen2.5 architecture and has been specifically adapted through training on the 7b_instruction_100k_16_train dataset. The model features 1.5 billion parameters and supports a substantial context length of 32768 tokens, making it suitable for processing longer sequences of instructions.
Key Characteristics
- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Parameter Count: 1.5 billion
- Context Length: 32768 tokens
- Fine-tuning Dataset:
7b_instruction_100k_16_train - Training Hyperparameters: Utilized a learning rate of 2e-05, a total batch size of 1024 (with gradient accumulation), and a cosine learning rate scheduler with 50 warmup steps over 2 epochs.
Performance Metrics
During evaluation, the model achieved a final validation loss of 0.0037. Other key metrics include a Token Mean Mae of 386685541.7458 and a Token Mean Relerr of 0.3004, indicating its performance in token-level prediction tasks.
Intended Use Cases
Given its instruction-tuned nature and fine-tuning on a specific instruction dataset, this model is likely optimized for:
- Instruction-following tasks
- Applications requiring precise responses based on given prompts
- Scenarios where a compact yet capable instruction-tuned model with a large context window is beneficial.