mlfoundations-dev/openthoughts3_10k
The mlfoundations-dev/openthoughts3_10k model is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was trained by mlfoundations-dev on the openthoughts3_10k dataset, featuring a notable context length of 131072 tokens. This model is optimized for tasks related to the specific data distribution of the openthoughts3_10k dataset.
Loading preview...
Model Overview
The mlfoundations-dev/openthoughts3_10k is a 7.6 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model. It was developed by mlfoundations-dev and trained specifically on the mlfoundations-dev/openthoughts3_10k dataset. A key technical specification is its substantial context length of 131072 tokens, allowing for processing very long inputs.
Training Details
The model underwent training with the following key hyperparameters:
- Learning Rate: 4e-05
- Batch Size: 1 (train), 8 (eval)
- Gradient Accumulation: 32, leading to a total effective batch size of 128
- Optimizer: AdamW with default betas and epsilon
- Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio
- Epochs: 5.0
This fine-tuning process utilized Transformers 4.46.1, Pytorch 2.6.0+cu124, Datasets 3.1.0, and Tokenizers 0.20.3. While specific intended uses and limitations are not detailed in the provided information, its training on a specialized dataset suggests potential strengths in areas aligned with that data's characteristics.