laion/CoderForge-Preview-v3-316-axolotl__Qwen3-8B

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · Published: Apr 23, 2026 · Architecture: Transformer

laion/CoderForge-Preview-v3-316-axolotl__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained with axolotl on the laion/CoderForge-Preview-v3-316 dataset and supports a 32,768-token context length. The model is optimized for code-related tasks and was trained on a pre-tokenized dataset for efficiency.


Model Overview

laion/CoderForge-Preview-v3-316-axolotl__Qwen3-8B is an 8-billion-parameter language model fine-tuned from the Qwen/Qwen3-8B base model. It was trained with the axolotl framework (version 0.16.0.dev0) on a pre-tokenized dataset.
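
As a quick start, the model should load with the standard Hugging Face Transformers API. The snippet below is a minimal sketch assuming the repository follows the usual Qwen3 checkpoint layout; the dtype and device settings are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/CoderForge-Preview-v3-316-axolotl__Qwen3-8B"

# bf16 matches the precision reported for training; device_map="auto"
# places the 8B weights across whatever accelerators are available.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```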

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen3-8B.
  • Parameter Count: 8 billion parameters.
  • Context Length: Supports a 32,768-token context length, matching the truncation length used in the SERA v3 configuration.
  • Training Dataset: Trained on the laion/CoderForge-Preview-v3-316 dataset, which is pre-tokenized, allowing axolotl to bypass chat template rendering for efficiency.
  • Optimization: Training hyperparameters, including a learning rate of 1e-5, the adamw_torch optimizer, and a cosine LR scheduler, were configured to match upstream SERA settings for consistent comparisons (see the sketch after this list).
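
For illustration, here is a minimal sketch of how those optimization settings map onto standard PyTorch/Transformers primitives. It reuses the `model` object from the loading example above; the warmup value is an assumption, since the card does not report one:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Only lr=1e-5, AdamW, and the cosine schedule come from the model card.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,    # not reported in the card; assumed here
    num_training_steps=9,  # total steps from Training Details below
)
```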

Training Details

The model was trained for 9 steps with a total batch size of 32 (micro batch size 1, gradient accumulation steps 8, across 4 GPUs), using bf16 precision and flash attention.
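
The total batch size follows directly from the per-device settings and the GPU count:

```python
micro_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 4

# 1 sample per GPU per forward pass, accumulated over 8 passes, on 4 GPUs.
total_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
assert total_batch_size == 32
```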

Intended Use

While the upstream README does not detail intended uses and limitations, training on the CoderForge dataset suggests a strong focus on code generation, code understanding, and related programming tasks. The 32k context window makes the model suitable for working with large codebases or complex programming problems.
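
A generation sketch for a coding prompt, assuming the model inherits Qwen3's chat template. It reuses the `model` and `tokenizer` from the loading example; the prompt and sampling settings are illustrative:

```python
messages = [
    {"role": "user", "content": "Write a Python function that reverses a linked list."}
]

# apply_chat_template renders the chat format and tokenizes it in one step.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```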