mlfoundations-dev/openthoughts

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Apr 13, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

The mlfoundations-dev/openthoughts model is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct on the mlfoundations-dev/fig1_all_openthoughts dataset. It supports a context length of 131072 tokens and is designed for general language understanding and generation tasks, building on the capabilities of its instruction-tuned base.


Model Overview

mlfoundations-dev/openthoughts was fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model on the mlfoundations-dev/fig1_all_openthoughts dataset, so its behavior reflects that dataset's distribution. With a context length of 131072 tokens, it can process and generate long sequences of text.

Training Details

Fine-tuning used a learning rate of 8e-05 with an effective batch size of 512, achieved through a per-device train_batch_size of 1 and gradient_accumulation_steps of 16 across 32 GPUs (1 × 16 × 32 = 512). The optimizer was ADAMW_TORCH with standard betas and epsilon, and training ran for 5 epochs under a cosine learning-rate scheduler with a warmup ratio of 0.1.
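The batch-size arithmetic and schedule above can be sketched in plain Python. This is a minimal illustration of the reported hyperparameters, not the actual training code; the `lr_at` helper reproduces the standard cosine-with-linear-warmup shape (peak learning rate at the end of warmup, decaying toward zero), under the assumption that the run followed that common convention.

```python
import math

# Hyperparameters reported in the model card.
LEARNING_RATE = 8e-5
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 16
NUM_GPUS = 32
WARMUP_RATIO = 0.1

# Effective (global) batch size: 1 x 16 x 32 = 512.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS

def lr_at(step: int, total_steps: int) -> float:
    """Cosine schedule with linear warmup: ramps linearly to the peak
    learning rate over the first WARMUP_RATIO of training, then decays
    along a half-cosine to zero at total_steps."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch)    # 512
print(lr_at(100, 1000))   # peak LR at the end of warmup: 8e-05
```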

Key Characteristics

  • Base Model: Qwen/Qwen2.5-7B-Instruct
  • Parameter Count: 7.6 billion
  • Context Length: 131072 tokens
  • Training Dataset: mlfoundations-dev/fig1_all_openthoughts

Intended Use Cases

While specific intended uses are not documented, the instruction-tuned base model and large context window make it suitable for a wide range of natural language processing tasks, including text generation, summarization, question answering, and conversational AI, particularly where long-range dependencies in text matter.
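For conversational use, the Qwen2.5-Instruct family that this model derives from expects ChatML-formatted prompts. In practice you would call the tokenizer's `apply_chat_template` method, but the sketch below shows the raw format by hand so the structure is visible; the helper function and example messages are illustrative, not part of the model's published documentation.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt of the kind Qwen2.5-Instruct
    models are trained on: each message is wrapped in <|im_start|>role /
    <|im_end|> markers, ending with an open assistant turn for generation."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Summarize the attached report in three bullet points.",
)
print(prompt)
```

The completed assistant turn would then be generated by the model and terminated with its own `<|im_end|>` token.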