laion/open-thoughts-4-code-qwen3-32b-annotated-32k_qwen3-8B_32k
The laion/open-thoughts-4-code-qwen3-32b-annotated-32k_qwen3-8B_32k model is an 8-billion-parameter language model with a 32k-token context length. It was trained from scratch, though specific dataset details are currently unknown. The model is intended for general language tasks; its training procedure used a learning rate of 4e-05 with a cosine learning rate scheduler.
Model Overview
This model, laion/open-thoughts-4-code-qwen3-32b-annotated-32k_qwen3-8B_32k, is an 8-billion-parameter language model with a 32,768-token context length. It was trained from scratch, though detailed information regarding its training dataset and primary differentiators is not yet available.
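For readers who want to try the checkpoint, here is a minimal loading sketch using Hugging Face `transformers`. It assumes the repository exposes a standard causal-LM checkpoint with a bundled tokenizer (the card does not document the exact format); adjust dtype and device placement to your hardware.

```python
# Hypothetical usage sketch; assumes a standard causal-LM checkpoint layout.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/open-thoughts-4-code-qwen3-32b-annotated-32k_qwen3-8B_32k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native precision
    device_map="auto",    # shard across available GPUs if needed
)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```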
Training Details
The training process utilized several key hyperparameters:
- Learning Rate: 4e-05
- Batch Size: A `train_batch_size` of 1 and `eval_batch_size` of 8, with a `total_train_batch_size` of 16 and `total_eval_batch_size` of 64, achieved through gradient accumulation.
- Optimizer: `ADAMW_TORCH_FUSED` with betas=(0.9, 0.98) and epsilon=1e-08.
- Scheduler: A cosine learning rate scheduler with a warmup ratio of 0.1.
- Epochs: Trained for 1.0 epoch.
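The reported batch sizes are mutually consistent under one plausible setup. Assuming 8 training devices (the device count is not stated in the card), a per-device train batch of 1 reaches the effective batch of 16 with 2 gradient-accumulation steps, while eval needs no accumulation:

```python
# Sketch of how the reported effective batch sizes could arise.
num_devices = 8              # assumption: not documented in the card

train_batch_size = 1         # per-device train batch size (from the card)
eval_batch_size = 8          # per-device eval batch size (from the card)
total_train_batch_size = 16  # reported effective train batch size
total_eval_batch_size = 64   # reported effective eval batch size

# Gradient-accumulation steps needed to reach the effective train batch:
grad_accum_steps = total_train_batch_size // (train_batch_size * num_devices)
print(grad_accum_steps)  # → 2

# Eval reaches its effective batch from devices alone: 8 * 8 = 64.
assert eval_batch_size * num_devices == total_eval_batch_size
```

Any device count that divides 16 would also fit the numbers; 8 devices is simply the choice that makes the eval figures line up without accumulation.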
Intended Uses & Limitations
Currently, more information is needed regarding the model's specific intended uses and known limitations. Developers should exercise caution and conduct thorough evaluations for their particular applications, as the training data and specific optimizations are not fully documented.