Model Overview
mlfoundations-dev/openthoughts is a 7.6-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model. It was trained on the mlfoundations-dev/fig1_all_openthoughts dataset, so its behavior reflects that data distribution. The model supports a context length of 131,072 tokens, allowing it to process and generate long sequences of text.
Training Details
Fine-tuning used a learning rate of 8e-05 with an effective batch size of 512, obtained from a per-device train_batch_size of 1 and gradient_accumulation_steps of 16 across 32 GPUs. The optimizer was ADAMW_TORCH with standard betas and epsilon, and a cosine learning rate scheduler with a 0.1 warmup ratio was applied over 5 epochs.
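To make the schedule concrete, here is a minimal sketch of how these hyperparameters combine, assuming the standard linear-warmup-then-cosine-decay convention (the exact trainer implementation may differ in minor details such as a nonzero floor LR):

```python
import math

# Hyperparameters stated in this card.
LEARNING_RATE = 8e-05
WARMUP_RATIO = 0.1
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 16
NUM_GPUS = 32

# Effective batch size: 1 * 16 * 32 = 512, matching the card.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS

def lr_at_step(step: int, total_steps: int) -> float:
    """Cosine schedule with linear warmup over the first WARMUP_RATIO of steps."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return LEARNING_RATE * step / max(1, warmup_steps)
    # Cosine decay from the peak learning rate down to 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The learning rate peaks at 8e-05 when warmup ends (10% of training) and decays smoothly to zero by the final step.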
Key Characteristics
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Parameter Count: 7.6 billion
- Context Length: 131072 tokens
- Training Dataset: mlfoundations-dev/fig1_all_openthoughts
Intended Use Cases
While specific intended uses are not documented, its instruction-tuned base model and large context window suggest suitability for a wide range of natural language processing tasks, including text generation, summarization, question answering, and conversational AI, particularly where long-range dependencies in text matter.
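For such tasks, the model can presumably be loaded with the Hugging Face transformers library like any other causal LM. A hedged sketch (the model id is from this card; the prompt and generation settings are illustrative, and chat-template support is assumed from the instruction-tuned base):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/openthoughts"  # model id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative chat-style prompt; the chat template is inherited
# from the Qwen2.5-Instruct base (an assumption, not stated in the card).
messages = [{"role": "user", "content": "Summarize the causes of WWI in three sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Running this requires downloading the 7.6B-parameter weights, so a GPU with sufficient memory (or quantized loading) is advisable.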