laion/CoderForge-Preview-v3-316-axolotl__Qwen3-8B
laion/CoderForge-Preview-v3-316-axolotl__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was trained with axolotl on the laion/CoderForge-Preview-v3-316 dataset and supports a 32768-token context length. The model is optimized for code-related tasks and leverages a pre-tokenized dataset for efficient training.
Model Overview
laion/CoderForge-Preview-v3-316-axolotl__Qwen3-8B is an 8-billion-parameter language model fine-tuned from the base Qwen/Qwen3-8B architecture. It was developed with the axolotl framework (version 0.16.0.dev0) and trained on a pre-tokenized dataset.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen3-8B.
- Parameter Count: 8 billion parameters.
- Context Length: Supports a substantial context length of 32768 tokens, matching the truncation length used in the SERA v3 configuration.
- Training Dataset: Trained on the `laion/CoderForge-Preview-v3-316` dataset, which is pre-tokenized, allowing axolotl to bypass chat template rendering for efficiency.
- Optimization: Training hyperparameters, including a learning rate of `1e-5` and the `adamw_torch` optimizer with a `cosine` LR scheduler, were configured to match upstream SERA settings for consistent comparisons.
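A minimal loading sketch with Hugging Face transformers is given below, assuming a standard causal-LM setup. The model id comes from this card; the bf16 dtype and flash-attention choice mirror the training configuration described in the next section, and `flash_attention_2` additionally assumes the flash-attn package and a compatible GPU are available.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/CoderForge-Preview-v3-316-axolotl__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# bf16 and flash attention mirror the training setup; flash_attention_2
# requires the flash-attn package and a supported GPU (assumption).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```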
Training Details
The model was trained for 9 steps with a total batch size of 32 (micro batch size 1, gradient accumulation steps 8, across 4 GPUs). Training used bf16 precision and flash attention for optimized performance.
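The reported total batch size follows directly from the per-GPU settings; a quick sanity check:

```python
micro_batch_size = 1        # per-GPU micro batch size
grad_accum_steps = 8        # gradient accumulation steps
num_gpus = 4                # GPUs used for training
total_batch_size = micro_batch_size * grad_accum_steps * num_gpus
assert total_batch_size == 32  # matches the figure reported above
```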
Intended Use
Specific intended uses and limitations are not explicitly documented. However, training on the CoderForge dataset suggests a strong focus on code generation, code understanding, and related programming tasks, and the 32768-token context window makes the model suitable for handling extensive codebases or complex programming problems.
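As an illustration only, a code-completion call might look like the following; `model` and `tokenizer` are the objects from the loading sketch above, and the prompt is a placeholder rather than an official example.

```python
# Placeholder prompt; model and tokenizer come from the loading sketch above.
prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```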