mlfoundations-dev/deepspeed_no_offload_liger_packing

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · License: apache-2.0 · Architecture: Transformer · Open Weights

The mlfoundations-dev/deepspeed_no_offload_liger_packing model is a 7.6 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was trained on the mlfoundations-dev/wikipedia_seed_science dataset. The model is intended for general language generation tasks, combining the Qwen2.5 base architecture with domain-focused fine-tuning.


Model Overview

The mlfoundations-dev/deepspeed_no_offload_liger_packing model is a 7.6 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-7B-Instruct base architecture. It was specifically trained on the mlfoundations-dev/wikipedia_seed_science dataset, suggesting a potential focus on scientific or encyclopedic knowledge domains.

Training Details

The model was trained with a learning rate of 1e-05 and a total effective batch size of 96, obtained from 32 devices, a per-device batch size of 1, and 3 gradient accumulation steps. The optimizer was AdamW with standard betas and epsilon, paired with a cosine learning rate scheduler using a warmup ratio of 0.1. Training ran for 3 epochs.
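For reference, these hyperparameters map roughly onto a standard Hugging Face TrainingArguments configuration as sketched below. This is not the exact training script: the per-device batch size of 1 is derived from 96 / (32 × 3), and anything not stated in the card (output directory, precision, DeepSpeed configuration) is an assumption.

    from transformers import TrainingArguments

    # Hypothetical reconstruction of the reported hyperparameters using the
    # standard Hugging Face TrainingArguments interface. The per-device batch
    # size of 1 is derived from 96 / (32 devices * 3 accumulation steps);
    # values not stated in the card (output_dir, precision, DeepSpeed config)
    # are assumptions.
    training_args = TrainingArguments(
        output_dir="deepspeed_no_offload_liger_packing",  # placeholder
        learning_rate=1e-5,
        per_device_train_batch_size=1,   # 32 devices x 3 grad-accum steps x 1 = 96 effective
        gradient_accumulation_steps=3,
        num_train_epochs=3,
        optim="adamw_torch",             # AdamW with default betas and epsilon
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,
    )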

Key Characteristics

  • Base Model: Qwen/Qwen2.5-7B-Instruct
  • Parameter Count: 7.6 billion
  • Context Length: 131,072 tokens
  • Fine-tuning Dataset: mlfoundations-dev/wikipedia_seed_science
  • Training Frameworks: Transformers 4.46.0, PyTorch 2.6.0+cu126, Datasets 3.1.0, Tokenizers 0.20.3
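The checkpoint can be loaded with the standard transformers auto classes. A minimal sketch, assuming a recent transformers release and enough GPU memory for a 7.6B-parameter model:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mlfoundations-dev/deepspeed_no_offload_liger_packing"

    # Load the tokenizer and model weights from the Hugging Face Hub.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # spread layers across available GPUs
    )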

Potential Use Cases

Given its fine-tuning on a Wikipedia-derived science dataset, this model may be particularly well suited to tasks such as the following (a brief usage sketch appears after the list):

  • Information retrieval and summarization from scientific texts.
  • Generation of factual content related to scientific topics.
  • Question answering within scientific or academic domains.
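As an illustration of the question-answering use case, the snippet below builds on the loading sketch above and uses the chat template inherited from Qwen2.5-7B-Instruct; the prompt is a hypothetical example, not taken from the model card.

    # Build a chat-formatted prompt (the chat template comes from the
    # Qwen2.5-7B-Instruct base) and generate; the question is illustrative.
    messages = [
        {"role": "user",
         "content": "Summarize the process of photosynthesis in two sentences."},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=256)
    # Strip the prompt tokens before decoding the answer.
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))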