laion/open-thoughts-4-code-qwen3-32b-annotated-32k_qwen3-8B_32k

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Dec 19, 2025 · Architecture: Transformer

The laion/open-thoughts-4-code-qwen3-32b-annotated-32k_qwen3-8B_32k model is an 8 billion parameter language model with a 32k token context length. It was trained from scratch, though specific dataset details are currently unknown. The model is intended for general language tasks; its documented training procedure highlights hyperparameters such as a learning rate of 4e-05 and a cosine learning rate scheduler.


Model Overview

This model, laion/open-thoughts-4-code-qwen3-32b-annotated-32k_qwen3-8B_32k, is an 8 billion parameter language model with a substantial 32,768 token context length. It was trained from scratch; detailed information about its training dataset and primary differentiators has not yet been published.
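A minimal usage sketch, assuming the checkpoint exposes the standard Hugging Face Transformers causal-LM interface (the card does not confirm this; the prompt and generation settings below are illustrative only):

```python
# Hedged loading/inference sketch; assumes a standard Transformers causal LM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/open-thoughts-4-code-qwen3-32b-annotated-32k_qwen3-8B_32k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick up the checkpoint's native precision
    device_map="auto",    # requires `accelerate`
)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```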

Training Details

The training process used several key hyperparameters (a configuration sketch reproducing them follows this list):

  • Learning Rate: 4e-05
  • Batch Size: A train_batch_size of 1 and eval_batch_size of 8, with a total_train_batch_size of 16 and total_eval_batch_size of 64, achieved through gradient accumulation.
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08.
  • Scheduler: A cosine learning rate scheduler with a warmup ratio of 0.1.
  • Epochs: Trained for 1.0 epoch.
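A minimal sketch of how these values map onto the Hugging Face `TrainingArguments` API, assuming the run used the standard `Trainer`. The device count and gradient-accumulation split are assumptions: the card reports only the totals (16 train, 64 eval), which are consistent with, for example, 8 devices × per-device batch 1 × 2 accumulation steps.

```python
# Hedged sketch of the reported hyperparameters as Hugging Face
# TrainingArguments. The 8-device / 2-step accumulation split is an
# assumption inferred from the reported batch-size totals.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="open-thoughts-4-code-qwen3-8B-32k",  # hypothetical path
    learning_rate=4e-5,
    per_device_train_batch_size=1,   # total 16 = 8 devices x 1 x 2 accum (assumed)
    per_device_eval_batch_size=8,    # total 64 = 8 devices x 8 (assumed)
    gradient_accumulation_steps=2,   # assumed from the reported totals
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1.0,
)
```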

Intended Uses & Limitations

More information is needed about the model's specific intended uses and known limitations. Because the training data and specific optimizations are not fully documented, developers should evaluate the model thoroughly for their particular applications before relying on it.