laion/swesmith-3160__Qwen3-8B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Mar 26, 2026License:otherArchitecture:Transformer Warm

The laion/swesmith-3160__Qwen3-8B model is an 8 billion parameter language model, fine-tuned from Qwen/Qwen3-8B. It was trained on the laion/swesmith-unified-3160 dataset, featuring a context length of 32768 tokens. This model is a specialized iteration of the Qwen3 architecture, optimized through specific fine-tuning. Its primary application is for tasks benefiting from its particular training data and fine-tuning process.

Loading preview...

Model Overview

laion/swesmith-3160__Qwen3-8B is an 8 billion parameter language model, derived from the Qwen/Qwen3-8B architecture. This model has undergone a specific fine-tuning process using the /e/data1/datasets/playground/ot/hf_hub/datasets--laion--swesmith-unified-3160/snapshots/ebee68a1798546a79293aadc7ac631850f238447_thinking_preprocessed dataset. It supports a substantial context length of 32768 tokens, making it suitable for processing longer sequences of text.

Training Details

The fine-tuning procedure involved several key hyperparameters:

  • Learning Rate: 4e-05
  • Batch Sizes: A train_batch_size of 1 and eval_batch_size of 8, with a total_train_batch_size of 96 and total_eval_batch_size of 256, achieved through gradient accumulation.
  • Optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08.
  • Scheduler: Cosine learning rate scheduler with a 0.1 warmup ratio.
  • Epochs: Trained for 7.0 epochs across 32 devices.

Framework Versions

The training environment utilized:

  • Transformers 4.57.6
  • Pytorch 2.9.1+cu130
  • Datasets 4.7.0
  • Tokenizers 0.22.2

Intended Use Cases

Given its fine-tuning on a specific dataset, this model is best suited for applications that align with the characteristics and domain of the laion/swesmith-unified-3160 dataset. Developers should evaluate its performance on tasks related to the fine-tuning data to determine optimal utility.