mlfoundations-dev/fasttext_mixing_domains_top_3_code
The mlfoundations-dev/fasttext_mixing_domains_top_3_code model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was trained on the mlfoundations-dev/fasttext_mixing_domains_top_3_code dataset, which suggests it is optimized for code-related and domain-specific text processing. The model supports a 131,072-token context length, making it suitable for handling extensive inputs in its specialized domain.
Model Overview
This model, fasttext_mixing_domains_top_3_code, is a fine-tuned variant of the Qwen/Qwen2.5-7B-Instruct base model. It has 7.6 billion parameters and supports a context length of 131,072 tokens, enabling it to process very long sequences of text.
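As with other Qwen2.5-based checkpoints on the Hub, the model should load through the standard transformers Auto classes. The snippet below is a minimal sketch; the dtype and device placement are illustrative choices, not settings documented in this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/fasttext_mixing_domains_top_3_code"

# Load the tokenizer and weights via the standard Auto classes.
# bfloat16 and device_map="auto" are illustrative, not from the card.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```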
Training Details
The model was fine-tuned on the mlfoundations-dev/fasttext_mixing_domains_top_3_code dataset. Key training hyperparameters, reconstructed in the configuration sketch after this list, included:
- Learning Rate: 1e-05
- Optimizer: ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08
- Batch Size: An effective training batch size of 96 (1 per device × 8 GPUs × 12 gradient accumulation steps)
- Epochs: 3.0
- LR Scheduler: Cosine with a 0.1 warmup ratio
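These hyperparameters map directly onto a Hugging Face TrainingArguments configuration. The sketch below reconstructs them for reference; the output directory is a placeholder, and any setting not listed above keeps its transformers default:

```python
from transformers import TrainingArguments

# Reconstruction of the reported hyperparameters; output_dir is a
# placeholder, not the directory used in the actual training run.
training_args = TrainingArguments(
    output_dir="fasttext_mixing_domains_top_3_code",
    learning_rate=1e-5,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    per_device_train_batch_size=1,   # 1 per device × 8 GPUs × 12 accumulation = 96
    gradient_accumulation_steps=12,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```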
Potential Use Cases
Given its fine-tuning on a dataset related to "mixing domains" and "code," this model is likely optimized for the following (a prompting sketch appears after the list):
- Processing and understanding code-related text.
- Tasks involving the integration or analysis of information from diverse domains, particularly those with a coding component.
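For code-oriented prompting, Qwen2.5-Instruct derivatives use a chat template. The example below continues from the loading sketch in the overview section; the prompt itself is a hypothetical illustration:

```python
# Assumes `tokenizer` and `model` are loaded as in the overview sketch above.
messages = [
    {"role": "user", "content": "Write a Python function that merges two sorted lists."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a completion and decode only the newly generated tokens.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```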
The original model card notes that further details on specific intended uses, limitations, and the training and evaluation data are not yet available.