mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky-qwen

Hosted on Hugging Face · Text generation · Open weights

  • Model size: 7.6B parameters
  • Quantization: FP8
  • Context length (as served): 32k
  • License: apache-2.0
  • Architecture: Transformer

The mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky-qwen model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B. It was trained on the mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky dataset, reaching a final validation loss of 0.5737. This model is a specialized adaptation of the Qwen2.5 architecture, intended for tasks aligned with its fine-tuning dataset. Its 131,072-token context length supports long input sequences.


Model Overview

This model, oh-dcft-v3.1-SN-405B-hacky-qwen, is a fine-tuned variant of the Qwen/Qwen2.5-7B base model, developed by mlfoundations-dev. It leverages the Qwen2.5 architecture, featuring 7.6 billion parameters and a substantial context length of 131072 tokens.

Key Characteristics

  • Base Model: Qwen/Qwen2.5-7B
  • Fine-tuning Dataset: mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky
  • Training Result: final validation loss of 0.5737 after 3 epochs of fine-tuning.
  • Hyperparameters: learning rate of 5e-06, total batch size of 128, and the AdamW optimizer.
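The reported hyperparameters can be written out as a `TrainingArguments`-style configuration. Note that the card only reports the *total* batch size of 128; the device count and per-device/accumulation split below are assumptions for illustration.

```python
# Sketch of a training configuration matching the reported hyperparameters.
# The device count and micro-batch/accumulation split are ASSUMPTIONS; the
# model card only states a total batch size of 128.
num_devices = 8                  # assumed GPU count
per_device_batch_size = 4        # assumed micro-batch size per device
gradient_accumulation_steps = 4  # assumed accumulation steps

training_config = {
    "learning_rate": 5e-06,      # reported
    "num_train_epochs": 3,       # reported
    "optim": "adamw_torch",      # reported: AdamW
    "per_device_train_batch_size": per_device_batch_size,
    "gradient_accumulation_steps": gradient_accumulation_steps,
}

# The effective (total) batch size is the product of the three factors.
effective_batch_size = (
    num_devices * per_device_batch_size * gradient_accumulation_steps
)
print(effective_batch_size)  # 128, matching the reported total batch size
```

Any other factorization whose product is 128 (e.g. 16 devices × 8 micro-batch × 1 accumulation) would reproduce the same effective batch size.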

Potential Use Cases

Given its fine-tuning on a specific dataset, this model is likely best suited for applications that align with the characteristics and domain of the mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky dataset. Developers should evaluate its performance on tasks similar to its training data to determine suitability.
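A minimal inference sketch, assuming the model loads through `transformers` like its Qwen2.5-7B base and follows the ChatML-style prompt format used by Qwen2.5 models (both are assumptions based on the base model, not confirmed by this card). In practice, prefer `tokenizer.apply_chat_template` over manual prompt assembly:

```python
from typing import Dict, List

MODEL_ID = "mlfoundations-dev/oh-dcft-v3.1-SN-405B-hacky-qwen"


def format_chatml(messages: List[Dict[str, str]]) -> str:
    """Render messages in the ChatML-style layout used by Qwen2.5 models.

    Manual illustration only; in practice use tokenizer.apply_chat_template,
    which applies the template shipped with the model.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    parts.append("<|im_start|>assistant\n")  # generation prompt
    return "".join(parts)


def generate(messages: List[Dict[str, str]]) -> str:
    # Heavy path (downloads ~7.6B parameters of weights); not run here.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tok(format_chatml(messages), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    return tok.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what fine-tuning does."},
]
prompt = format_chatml(messages)
```

Evaluating generations from prompts like this against held-out examples from the fine-tuning dataset is a reasonable way to gauge task fit.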