Overview
cjiao/OpenThinker3-1.5B-test is a 1.5 billion parameter language model fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It supports a context length of 32768 tokens, making it suitable for processing long inputs and generating coherent, extended responses. Fine-tuning used the open-thoughts/OpenThoughts-114k dataset, which suggests a focus on step-by-step reasoning and general thought-processing tasks.
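Because the model is fine-tuned from Qwen/Qwen2.5-1.5B-Instruct, it presumably inherits Qwen's ChatML-style chat template. As an illustrative sketch of that format (in practice, `tokenizer.apply_chat_template` is authoritative, and the exact special tokens are an assumption here), a prompt could be assembled like this:

```python
def build_chatml_prompt(messages: list[dict]) -> str:
    """Assemble a ChatML-style prompt of the kind Qwen2.5 models use.

    Illustrative only: prefer
    tokenizer.apply_chat_template(messages, tokenize=False,
    add_generation_prompt=True), which applies the model's own template.
    """
    parts = []
    for msg in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the benefits of long context windows."},
])
print(prompt)
```

The 32768-token context leaves ample room for long documents or multi-turn histories formatted this way.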
Training Details
The fine-tuning process for OpenThinker3-1.5B-test involved specific hyperparameters:
- Learning Rate: 0.00016
- Batch Size: 8 (train), 8 (eval)
- Gradient Accumulation Steps: 16
- Optimizer: AdamW with default betas and epsilon
- Scheduler: Cosine learning rate scheduler with 0.1 warmup ratio
- Training Steps: 10
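The hyperparameters above imply an effective batch size of 8 × 16 = 128 sequences per optimizer step. The schedule can be sketched as follows; this mirrors the common linear-warmup-plus-cosine-decay shape (as in transformers' `get_cosine_schedule_with_warmup`), though exact library behavior at the boundaries may differ slightly:

```python
import math

# Hyperparameters from the training configuration above.
LR_MAX = 0.00016        # peak learning rate
PER_DEVICE_BATCH = 8    # train batch size
GRAD_ACCUM_STEPS = 16   # gradient accumulation steps
TOTAL_STEPS = 10        # optimizer steps in this run
WARMUP_RATIO = 0.1      # fraction of steps spent warming up

# Effective (global) batch size per optimizer step: 8 * 16 = 128.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

def cosine_lr(step: int) -> float:
    """Linear warmup to LR_MAX, then cosine decay to zero."""
    warmup_steps = int(TOTAL_STEPS * WARMUP_RATIO)  # 1 step here
    if warmup_steps > 0 and step < warmup_steps:
        return LR_MAX * step / warmup_steps
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return LR_MAX * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch)   # 128
```

With only 10 steps, the warmup phase lasts a single step and the learning rate decays from its peak to zero over the remaining nine.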
The very small number of training steps (10), together with the "-test" suffix in the model name, suggests this run served to validate the training configuration rather than to complete a full fine-tune. The model was trained using Transformers 4.46.1 and PyTorch 2.5.1+cu121.
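To reproduce the reported environment, the library versions can be pinned as follows. The `+cu121` suffix implies a CUDA 12.1 PyTorch build; installing from the official cu121 wheel index is an assumption about how that build was obtained:

```shell
# Pin the versions reported in the model card.
pip install transformers==4.46.1
# +cu121 build of PyTorch, from the official CUDA 12.1 wheel index.
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu121
```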