mlfoundations-dev/openthoughts3_10k

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · License: apache-2.0 · Architecture: Transformer · Open weights

The mlfoundations-dev/openthoughts3_10k model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was trained by mlfoundations-dev on the openthoughts3_10k dataset and supports a context length of 131,072 tokens. The model is optimized for tasks aligned with the data distribution of the openthoughts3_10k dataset.


Model Overview

mlfoundations-dev/openthoughts3_10k is a 7.6-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model. It was developed by mlfoundations-dev and trained on the mlfoundations-dev/openthoughts3_10k dataset. A key specification is its context length of 131,072 tokens, which allows the model to process very long inputs.
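
For orientation, the sketch below shows one way to load the checkpoint and run a chat-style generation with the Transformers library. The Hub repo id, dtype, and generation settings are assumptions for illustration and are not taken from the card; FP8-quantized serving would require additional tooling.

```python
# Minimal inference sketch (assumed usage, not the official example):
# load the checkpoint and generate a reply using the Qwen2.5 chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/openthoughts3_10k"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype for local inference
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain gradient accumulation in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```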

Training Details

The model was fine-tuned with the following key hyperparameters; a configuration sketch mirroring these values follows the list:

  • Learning Rate: 4e-05
  • Batch Size: 1 (train), 8 (eval)
  • Gradient Accumulation: 32 steps, giving a total effective train batch size of 128 (per-device batch of 1 × 32 accumulation steps, accumulated across multiple devices)
  • Optimizer: AdamW with default betas and epsilon
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio
  • Epochs: 5.0

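As a rough illustration of how these values map onto a Hugging Face Trainer run, the sketch below restates the reported hyperparameters as a TrainingArguments object. Only the numeric values come from the card; the output path, precision flag, and optimizer string are assumptions.

```python
# Configuration sketch mirroring the reported hyperparameters.
# Only the numeric values are taken from the card; everything else is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="openthoughts3_10k-sft",   # hypothetical output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=32,       # yields an effective batch of 128 across devices
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                  # AdamW with default betas and epsilon
    bf16=True,                            # assumed mixed-precision setting
)
```
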
This fine-tuning run used Transformers 4.46.1, PyTorch 2.6.0+cu124, Datasets 3.1.0, and Tokenizers 0.20.3. Specific intended uses and limitations are not detailed in the provided information, but training on a specialized dataset suggests the model's strengths lie in areas aligned with that data's characteristics.
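
For reproducibility, one way to confirm that a local environment matches the reported library versions is a quick check with importlib.metadata; the pinned strings below simply restate the versions from the card, and the check itself is an illustrative convenience rather than part of the original workflow.

```python
# Environment check against the library versions reported on the card.
from importlib.metadata import version

expected = {
    "transformers": "4.46.1",
    "torch": "2.6.0+cu124",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}

for package, wanted in expected.items():
    installed = version(package)
    status = "OK" if installed == wanted else f"differs (installed {installed})"
    print(f"{package}: expected {wanted} -> {status}")
```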