Name: mlfoundations-dev/deepspeed_no_offload_liger_packing API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: mlfoundations-dev

Model Overview

The mlfoundations-dev/deepspeed_no_offload_liger_packing model is a 7.6 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-7B-Instruct base architecture. It was specifically trained on the mlfoundations-dev/wikipedia_seed_science dataset, suggesting a potential focus on scientific or encyclopedic knowledge domains.

Training Details

The model was trained using a learning rate of 1e-05, with a total effective batch size of 96 across 32 devices and a gradient accumulation of 3 steps. The optimizer used was AdamW with standard betas and epsilon, and a cosine learning rate scheduler with a 0.1 warmup ratio. Training was conducted for 3 epochs.

Key Characteristics

Base Model: Qwen/Qwen2.5-7B-Instruct
Parameter Count: 7.6 billion
Context Length: 131,072 tokens
Fine-tuning Dataset: mlfoundations-dev/wikipedia_seed_science
Training Frameworks: Transformers 4.46.0, Pytorch 2.6.0+cu126, Datasets 3.1.0, Tokenizers 0.20.3

Potential Use Cases

Given its fine-tuning on a Wikipedia-derived science dataset, this model may be particularly suitable for tasks requiring:

Information retrieval and summarization from scientific texts.
Generation of factual content related to scientific topics.
Question answering within scientific or academic domains.

Overview

Model Overview

Training Details

Key Characteristics

Potential Use Cases

Full Model Card (README)