krishnaTO/qwen3-finetuned
  • Task: Text generation
  • Concurrency Cost: 1
  • Model Size: 0.8B
  • Quantization: BF16
  • Context Length: 32k
  • Published: Mar 31, 2026
  • License: apache-2.0
  • Architecture: Transformer (open weights)

The krishnaTO/qwen3-finetuned model is a fine-tuned version of the Qwen/Qwen3-0.6B architecture, with 0.8 billion parameters and a context length of 32768 tokens. Developed by krishnaTO, it was fine-tuned for a single epoch, reaching a validation loss of 3.7385. Its specific applications and differentiators are not detailed in the available documentation, which suggests it may be a foundational or experimental fine-tune.


Model Overview

krishnaTO/qwen3-finetuned is built on the Qwen/Qwen3-0.6B base model. It has 0.8 billion parameters and supports a 32768-token context window. Fine-tuning ran for a single epoch and ended at a validation loss of 3.7385.
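
Assuming the repository follows the standard Hugging Face layout for Qwen3 checkpoints, a minimal loading-and-generation sketch might look like the following; the prompt and generation settings are illustrative, not part of the card:

```python
# Minimal sketch: load the checkpoint with Hugging Face transformers.
# Assumes a standard Qwen3 repo layout; adjust dtype/device for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "krishnaTO/qwen3-finetuned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

prompt = "Explain what a context window is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```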

Training Details

The model was trained with the following key hyperparameters; a configuration sketch follows the list:

  • Learning Rate: 2e-08
  • Batch Sizes: train_batch_size of 4 and eval_batch_size of 8, with gradient_accumulation_steps of 4, for an effective total_train_batch_size of 16
  • Optimizer: ADAMW_TORCH_FUSED with default betas and epsilon
  • LR Scheduler: linear
  • Epochs: 1
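
For readers who want to reproduce a comparable setup, here is a hedged sketch of how these hyperparameters map onto Hugging Face TrainingArguments. The output directory is hypothetical, and the dataset, model, and Trainer wiring are omitted because the card does not document them:

```python
# Sketch of TrainingArguments mirroring the hyperparameters reported above.
# Not the author's actual training script; the card only lists these values.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-finetuned",     # hypothetical output path
    learning_rate=2e-08,
    per_device_train_batch_size=4,    # train_batch_size
    per_device_eval_batch_size=8,     # eval_batch_size
    gradient_accumulation_steps=4,    # 4 x 4 = effective batch of 16
    num_train_epochs=1,
    lr_scheduler_type="linear",
    optim="adamw_torch_fused",        # ADAMW_TORCH_FUSED
)
```

Note that a learning rate of 2e-08 is several orders of magnitude below typical fine-tuning values, which is consistent with the relatively high final validation loss reported above.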

Limitations

The available documentation does not specify the dataset used for fine-tuning, nor does it detail the intended uses or known limitations of this particular fine-tuned version. Users should exercise caution and conduct further evaluation to determine its suitability for specific tasks.