mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini-qwen

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Dec 18, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini-qwen is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B. It was trained on the mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini dataset, reaching a validation loss of 0.6273, and is intended for general language tasks, leveraging its Qwen2.5 base architecture and 32,768-token context length.


Overview

This model, mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini-qwen, is a fine-tuned variant of the Qwen/Qwen2.5-7B base model, with 7.6 billion parameters and a 32,768-token context length. It was adapted using the mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini dataset.

Training Details

The model was trained for 3 epochs with a learning rate of 5e-06 and an effective batch size of 128 across 8 GPUs, reaching a final validation loss of 0.6273. Key hyperparameters included a constant learning-rate schedule and the adamw_torch optimizer; a hedged reconstruction of this configuration is sketched below.
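
The exact launch configuration is not published here, but the reported hyperparameters map naturally onto Hugging Face `TrainingArguments`. The sketch below is an assumption-laden reconstruction: the per-device batch size, gradient-accumulation split, and precision flag are not stated in the card and were chosen only so the effective batch size works out to 128 across 8 GPUs.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters.
# Only the totals are reported (3 epochs, lr 5e-06, constant schedule,
# adamw_torch, effective batch size 128 across 8 GPUs); the per-device
# split and bf16 flag below are assumptions.
training_args = TrainingArguments(
    output_dir="oh-dcft-v3.1-gpt-4o-mini-qwen",
    num_train_epochs=3,
    learning_rate=5e-6,
    lr_scheduler_type="constant",
    optim="adamw_torch",
    per_device_train_batch_size=8,   # 8 GPUs x 8 samples x 2 accumulation steps = 128
    gradient_accumulation_steps=2,
    bf16=True,                       # assumption: mixed-precision training
)
```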

Key Characteristics

  • Base Model: Qwen/Qwen2.5-7B
  • Parameter Count: 7.6 billion
  • Context Length: 32,768 tokens
  • Training Dataset: mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini
  • Validation Loss: 0.6273 on the evaluation set

Potential Use Cases

Given its fine-tuning on the mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini dataset, this model is likely best suited to tasks aligned with that dataset's characteristics. Developers should evaluate its performance on general language understanding and generation tasks where a 7.6B-parameter model with a substantial context window is beneficial; a minimal loading sketch follows.
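
For quick experimentation, the checkpoint should load like any other Qwen2.5-based causal LM via `transformers`. The snippet below is a minimal sketch under that assumption; the `bfloat16` dtype and the presence of a chat template are not confirmed by the model card, so adjust as needed (fall back to plain `tokenizer(...)` if no chat template is shipped).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini-qwen"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumption: bf16 weights; adjust if needed
    device_map="auto",
)

# Assumes the repo ships a chat template; otherwise tokenize a plain prompt.
messages = [{"role": "user", "content": "Summarize the benefits of a 32k context window."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```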