Model Overview
The mlfoundations-dev/deepspeed_no_offload_liger_packing model is a 7.6 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-7B-Instruct base architecture. It was specifically trained on the mlfoundations-dev/wikipedia_seed_science dataset, suggesting a potential focus on scientific or encyclopedic knowledge domains.
Training Details
The model was trained using a learning rate of 1e-05, with a total effective batch size of 96 across 32 devices and a gradient accumulation of 3 steps. The optimizer used was AdamW with standard betas and epsilon, and a cosine learning rate scheduler with a 0.1 warmup ratio. Training was conducted for 3 epochs.
Key Characteristics
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Parameter Count: 7.6 billion
- Context Length: 131,072 tokens
- Fine-tuning Dataset: mlfoundations-dev/wikipedia_seed_science
- Training Frameworks: Transformers 4.46.0, Pytorch 2.6.0+cu126, Datasets 3.1.0, Tokenizers 0.20.3
Potential Use Cases
Given its fine-tuning on a Wikipedia-derived science dataset, this model may be particularly suitable for tasks requiring:
- Information retrieval and summarization from scientific texts.
- Generation of factual content related to scientific topics.
- Question answering within scientific or academic domains.