Hyeongwon/PH_prob_sft_FC_swap_labewise_data_oversampling_bf16_lr0.00002_context_12k-Qwen3-8B-Base

Text generation · Model size: 8B · Quant: FP8 · Context length: 32K · Concurrency cost: 1 · Architecture: Transformer · Published: Feb 25, 2026

Hyeongwon/PH_prob_sft_FC_swap_labewise_data_oversampling_bf16_lr0.00002_context_12k-Qwen3-8B-Base is an 8-billion-parameter language model fine-tuned from ChuGyouk/Qwen3-8B-Base using Supervised Fine-Tuning (SFT) with TRL. It targets general text generation and supports a 32K context window at inference. As the model name indicates, training used label-wise data oversampling, bf16 precision, a learning rate of 2e-5, and a 12K training context length, suggesting an optimization for robust performance in conversational AI and similar applications.


Model Overview

This model, PH_prob_sft_FC_swap_labewise_data_oversampling_bf16_lr0.00002_context_12k-Qwen3-8B-Base, is an 8 billion parameter language model developed by Hyeongwon. It is a fine-tuned version of the ChuGyouk/Qwen3-8B-Base model, utilizing Supervised Fine-Tuning (SFT) techniques implemented with the TRL library.

Key Characteristics

  • Base Model: Fine-tuned from ChuGyouk/Qwen3-8B-Base.
  • Training Method: Employs Supervised Fine-Tuning (SFT) for specialized performance.
  • Context Length: Supports a substantial context window of 32,768 tokens, enabling processing of longer inputs and generating more coherent, extended responses.
  • Training Details: Training used bf16 precision, a learning rate of 2e-5, a 12K training context length, and label-wise data oversampling, likely intended to balance the label distribution and stabilize fine-tuning on the target data.
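The label-wise data oversampling referenced in the model name can be sketched as below. This is a minimal illustration only: the actual grouping key, sampling strategy, and pipeline are not published, and `oversample_labelwise` is a hypothetical helper.

```python
import random
from collections import defaultdict

def oversample_labelwise(examples, label_key="label", seed=0):
    """Duplicate minority-label examples (sampling with replacement)
    until every label matches the majority label's count."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[label_key]].append(ex)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Top up smaller label groups by sampling with replacement.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Toy dataset: label "A" is three times as frequent as "B".
data = [{"label": "A"}] * 3 + [{"label": "B"}]
balanced = oversample_labelwise(data)  # 3 "A" + 3 "B" examples
```

After oversampling, each label appears equally often, so gradient updates during SFT are not dominated by the majority label.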

Use Cases

This model is suitable for a variety of text generation tasks where a robust, fine-tuned language model with a large context window is beneficial. Potential applications include:

  • Conversational AI: Generating detailed and contextually relevant responses in chatbots or virtual assistants.
  • Content Creation: Assisting with writing longer-form content, articles, or creative narratives.
  • Question Answering: Providing comprehensive answers by processing extensive background information.

Technical Details

The model was trained using the following framework versions:

  • TRL: 0.25.1
  • Transformers: 4.57.3
  • PyTorch: 2.6.0
  • Datasets: 3.6.0
  • Tokenizers: 0.22.2
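To reproduce this environment, the pins above can be captured in a requirements file (a sketch using standard PyPI package names; the PyTorch pin may additionally need an index URL matching your CUDA build):

```text
trl==0.25.1
transformers==4.57.3
torch==2.6.0
datasets==3.6.0
tokenizers==0.22.2
```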