secmlr/DS-Noisy-N_DS-Clean-N_DS-OSS-N_QWQ-OSS-N_QWQ-Clean-N_QWQ-Noisy-N_Qwen2.5-7B-Instruct_sft
The secmlr/DS-Noisy-N_DS-Clean-N_DS-OSS-N_QWQ-OSS-N_QWQ-Clean-N_QWQ-Noisy-N_Qwen2.5-7B-Instruct_sft model is a 7.6-billion-parameter instruction-tuned language model fine-tuned from Qwen/Qwen2.5-7B-Instruct on a combination of the DS-Noisy-N, DS-Clean-N, DS-OSS-N, QWQ-OSS-N, QWQ-Clean-N, and QWQ-Noisy-N datasets. It targets general instruction-following tasks, combining the base model's capabilities with diverse fine-tuning data.
Model Overview
This model, developed by secmlr, is a fine-tuned variant of the Qwen2.5-7B-Instruct base model, with 7.6 billion parameters and a context length of 131,072 tokens. It was produced via supervised fine-tuning (SFT) on six datasets: DS-Noisy-N, DS-Clean-N, DS-OSS-N, QWQ-OSS-N, QWQ-Clean-N, and QWQ-Noisy-N.
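Since the card does not include usage code, here is a minimal loading sketch using the standard transformers chat API; it assumes the model ships a Qwen2.5-style chat template, and the prompt and generation settings are illustrative only.

```python
# Minimal sketch: loading and querying the model with Hugging Face transformers.
# Assumes a Qwen2.5-style chat template; adjust dtype/device placement as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "secmlr/DS-Noisy-N_DS-Clean-N_DS-OSS-N_QWQ-OSS-N_QWQ-Clean-N_QWQ-Noisy-N_Qwen2.5-7B-Instruct_sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat-formatted prompt and generate a response.
messages = [{"role": "user", "content": "Explain what supervised fine-tuning does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```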
Training Details
The fine-tuning run used the following hyperparameters (a configuration sketch follows the list):
- Learning Rate: 1e-05
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Size: A total training batch size of 24 (1 per device with 12 gradient accumulation steps, implying 2 devices) and a total evaluation batch size of 16 (8 per device across the same 2 devices).
- Epochs: 3.0
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
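For reference, the settings above map onto Hugging Face TrainingArguments roughly as follows. This is an illustrative sketch, not the authors' training script: the output path is hypothetical, and dataset loading, model wiring, and any SFT-framework specifics are omitted.

```python
# Illustrative mapping of the reported hyperparameters onto TrainingArguments;
# not the authors' actual training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-7b-instruct-sft",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=12,        # 1 x 12 x 2 devices = 24 total
    per_device_eval_batch_size=8,          # 8 x 2 devices = 16 total
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                   # AdamW optimizer
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```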
Intended Use
The authors do not document specific intended uses or limitations. Given its instruction-tuned base and diverse training data, the model is plausibly suited to a wide range of general-purpose natural language understanding and generation tasks; developers should weigh the base model's capabilities and the characteristics of the fine-tuning datasets when evaluating fit for a specific application.