Hyeongwon/P2-split2_prob_Qwen3-8B-Base_0325-05-bs128-epoch6

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quantization: FP8 · Context Length: 32k · Published: Mar 26, 2026 · Architecture: Transformer

Hyeongwon/P2-split2_prob_Qwen3-8B-Base_0325-05-bs128-epoch6 is an 8 billion parameter language model, fine-tuned from ChuGyouk/Qwen3-8B-Base using the TRL framework. The model was trained with Supervised Fine-Tuning (SFT) and is designed for general text generation tasks. Its 32,768-token context length makes it suitable for applications that need to process longer inputs.


Model Overview

Hyeongwon/P2-split2_prob_Qwen3-8B-Base_0325-05-bs128-epoch6 is an 8 billion parameter language model derived from the Qwen3-8B-Base architecture. It underwent Supervised Fine-Tuning (SFT) with the TRL framework, specifically version 0.25.1. As the model's name indicates, training used a batch size of 128 over 6 epochs.

Key Capabilities

  • Text Generation: Capable of generating coherent and contextually relevant text based on provided prompts.
  • Fine-tuned Performance: Benefits from SFT, which typically enhances performance on specific tasks or improves instruction following compared to base models.
  • Extended Context Window: Supports a context length of 32768 tokens, allowing for processing and generating longer sequences of text.
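The capabilities above can be tried directly with the Hugging Face `transformers` library. The sketch below is a minimal, hedged example: the model ID and the 32,768-token context length come from this card, while the generation settings and the prompt are illustrative. The heavy model load is kept behind the `__main__` guard since the 8B weights require substantial GPU memory.

```python
# Minimal sketch: text generation with this model via transformers.
# MODEL_ID and MAX_CONTEXT are taken from the model card; everything
# else (prompt, max_new_tokens) is illustrative.
MODEL_ID = "Hyeongwon/P2-split2_prob_Qwen3-8B-Base_0325-05-bs128-epoch6"
MAX_CONTEXT = 32768  # context length stated on the card


def fits_in_context(token_count: int) -> bool:
    """Return True if a prompt of `token_count` tokens fits the 32k window."""
    return token_count <= MAX_CONTEXT


if __name__ == "__main__":
    # Requires `pip install transformers accelerate` and enough GPU
    # memory for an 8B model (FP8/bf16 recommended).
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID, device_map="auto")
    out = generator(
        "Explain supervised fine-tuning in one sentence.",
        max_new_tokens=128,
    )
    print(out[0]["generated_text"])
```

For long-document use cases, `fits_in_context` is a cheap pre-check before sending a tokenized prompt to the model.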

Training Details

The model was fine-tuned from ChuGyouk/Qwen3-8B-Base using the TRL library. The training run details are available for visualization on Weights & Biases. The framework versions used include Transformers 4.57.3, PyTorch 2.6.0, Datasets 3.6.0, and Tokenizers 0.22.2.
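A comparable SFT run could be set up with TRL's `SFTTrainer` roughly as sketched below. This is an assumption-laden reconstruction, not the author's actual script: the dataset name is a placeholder, and the effective batch size of 128 (read off the model's name, along with the 6 epochs) is assumed to be reached via gradient accumulation.

```python
# Hedged sketch of an SFT setup comparable to this model's training run.
# Batch size (128) and epoch count (6) come from the model name; the
# per-device/accumulation split and the dataset are illustrative.
BASE_MODEL = "ChuGyouk/Qwen3-8B-Base"
HPARAMS = {
    "per_device_train_batch_size": 8,   # illustrative split of the
    "gradient_accumulation_steps": 16,  # effective batch size of 128
    "num_train_epochs": 6,
}


def effective_batch_size(h: dict) -> int:
    """Effective batch size = per-device batch * accumulation steps."""
    return h["per_device_train_batch_size"] * h["gradient_accumulation_steps"]


if __name__ == "__main__":
    # Requires `pip install trl datasets`; imports are kept here so the
    # sketch stays importable without the full training stack installed.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("your-org/your-sft-dataset", split="train")  # placeholder
    config = SFTConfig(output_dir="sft-qwen3-8b", **HPARAMS)
    trainer = SFTTrainer(model=BASE_MODEL, args=config, train_dataset=dataset)
    trainer.train()
```

Splitting the batch across devices with gradient accumulation is a common way to hit a large effective batch size like 128 without exceeding GPU memory on an 8B model.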

Good For

  • General-purpose text generation tasks.
  • Applications requiring an 8-billion-parameter model with a substantial context window.
  • Further experimentation or fine-tuning for specific downstream applications.