mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini-qwen

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Dec 18, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini-qwen is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B. It was trained on the mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini dataset, reaching a validation loss of 0.6273, and is intended for general language tasks, leveraging its Qwen2.5 base architecture and 32,768-token context length.


Overview

This model, mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini-qwen, is a fine-tuned variant of the Qwen/Qwen2.5-7B base model, with 7.6 billion parameters and a 32,768-token context length. It was adapted using the mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini dataset.

Training Details

The model was trained for 3 epochs with a learning rate of 5e-06 and an effective batch size of 128 across 8 GPUs, reaching a final validation loss of 0.6273. Key hyperparameters included a constant learning-rate schedule and the adamw_torch optimizer; a hedged reconstruction of this configuration is sketched below.
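
The exact launch configuration is not published here, but the reported hyperparameters map naturally onto Hugging Face `TrainingArguments`. The sketch below is an assumption-laden reconstruction: the per-device batch size, gradient-accumulation split, and precision flag are not stated in the card and were chosen only so the effective batch size works out to 128 across 8 GPUs.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters.
# Only the totals are reported (3 epochs, lr 5e-06, constant schedule,
# adamw_torch, effective batch size 128 across 8 GPUs); the per-device
# split and bf16 flag below are assumptions.
training_args = TrainingArguments(
    output_dir="oh-dcft-v3.1-gpt-4o-mini-qwen",
    num_train_epochs=3,
    learning_rate=5e-6,
    lr_scheduler_type="constant",
    optim="adamw_torch",
    per_device_train_batch_size=8,   # 8 GPUs x 8 samples x 2 accumulation steps = 128
    gradient_accumulation_steps=2,
    bf16=True,                       # assumption: mixed-precision training
)
```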

Key Characteristics

  • Base Model: Qwen/Qwen2.5-7B
  • Parameter Count: 7.6 billion
  • Context Length: 32,768 tokens
  • Training Dataset: mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini
  • Validation Loss: 0.6273 on the evaluation set

Potential Use Cases

Given its fine-tuning on the mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini dataset, this model is likely best suited to tasks aligned with that dataset's characteristics. Developers should evaluate its performance on general language understanding and generation tasks where a 7.6B-parameter model with a substantial context window is beneficial; a minimal loading sketch follows.
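
For quick experimentation, the checkpoint should load like any other Qwen2.5-based causal LM via `transformers`. The snippet below is a minimal sketch under that assumption; the `bfloat16` dtype and the presence of a chat template are not confirmed by the model card, so adjust as needed (fall back to plain `tokenizer(...)` if no chat template is shipped).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/oh-dcft-v3.1-gpt-4o-mini-qwen"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumption: bf16 weights; adjust if needed
    device_map="auto",
)

# Assumes the repo ships a chat template; otherwise tokenize a plain prompt.
messages = [{"role": "user", "content": "Summarize the benefits of a 32k context window."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```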