mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky-qwen
The mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky-qwen model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B. It was trained on the mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky dataset, reaching a final validation loss of 0.5737. The model is a specialized adaptation of the Qwen2.5 architecture, intended for tasks aligned with its fine-tuning dataset, and its 131,072-token context length supports long input sequences.
Model Overview
This model, oh-dcft-v3.1-SN-405B-hacky-qwen, is a fine-tuned variant of the Qwen/Qwen2.5-7B base model, developed by mlfoundations-dev. It retains the Qwen2.5 architecture, with 7.6 billion parameters and a 131,072-token context window.
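The snippet below is a minimal loading-and-generation sketch using the Hugging Face transformers API, assuming the model is published on the Hub under the repo id above and loads through the standard AutoModelForCausalLM path; the prompt is an arbitrary placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the Hub repo id matches the model name on this card.
model_id = "mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky-qwen"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package; torch_dtype="auto"
# picks the dtype stored in the checkpoint config.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Explain the difference between a list and a tuple in Python."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens and decode only the newly generated text.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

If the fine-tune was trained in a chat format, tokenizer.apply_chat_template may produce better-aligned prompts, but the card does not state whether a chat template is bundled with this checkpoint.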
Key Characteristics
- Base Model: Qwen/Qwen2.5-7B
- Fine-tuning Dataset: mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky
- Training Result: a final validation loss of 0.5737 after 3 epochs.
- Hyperparameters: learning rate of 5e-06, total batch size of 128, and the AdamW optimizer (see the configuration sketch after this list).
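As a rough illustration only, the reported hyperparameters map onto a transformers TrainingArguments object as sketched below. The per-device batch size, gradient-accumulation steps, and implied GPU count are assumptions chosen so their product equals the reported total batch size of 128; the actual training framework and distributed setup are not documented on this card.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported settings; not the
# authors' actual training configuration.
args = TrainingArguments(
    output_dir="oh-dcft-v3.1-SN-405B-hacky-qwen",
    learning_rate=5e-6,              # reported learning rate
    num_train_epochs=3,              # reported epoch count
    per_device_train_batch_size=8,   # assumed: 8 per device
    gradient_accumulation_steps=2,   # assumed: 8 * 2 * 8 GPUs = 128 total
    optim="adamw_torch",             # reported optimizer family (AdamW)
)
```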
Potential Use Cases
Given its fine-tuning on a single dataset, this model is likely best suited for applications that match the domain and style of mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky. Developers should evaluate it on tasks similar to that training data before deploying it; a minimal loss-measurement sketch follows.
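One hedged way to run such a check is to compute the model's causal-LM loss on a few domain-representative samples and compare it against the reported validation loss of 0.5737. The samples below are placeholders to be replaced with real task data.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky-qwen"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
model.eval()

# Placeholder samples; substitute texts drawn from your target task.
samples = [
    "Instruction: Summarize the following paragraph in one sentence. ...",
    "Instruction: Write a short Python function that reverses a string. ...",
]

losses = []
with torch.no_grad():
    for text in samples:
        enc = tokenizer(text, return_tensors="pt").to(model.device)
        # Passing input_ids as labels yields the mean causal-LM loss.
        out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())

print(f"mean loss over samples: {sum(losses) / len(losses):.4f}")
```

Note that losses on arbitrary text are only loosely comparable to the card's validation loss, which was measured on held-out data from the fine-tuning set.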