StephYang/dpsk_v3_2_cc_plus_t2

Text Generation · Concurrency Cost: 2 · Model Size: 32B · Quant: FP8 · Ctx Length: 32k · Published: Apr 23, 2026 · License: other · Architecture: Transformer

StephYang/dpsk_v3_2_cc_plus_t2 is a 32-billion-parameter language model fine-tuned from Qwen3-32B on the dpsk_v3_2_oracle_sft and dpsk_sft_full datasets. It is intended for general language understanding and generation tasks.


Model Overview

StephYang/dpsk_v3_2_cc_plus_t2 is a 32-billion-parameter language model fine-tuned from the Qwen3-32B base model. It was further trained on two supervised fine-tuning datasets, dpsk_v3_2_oracle_sft and dpsk_sft_full, with the goal of improving performance across a broad range of language tasks.

Training Details

The model was trained with a learning rate of 1e-05 for 3 epochs at a total batch size of 32 across 8 GPUs, using the AdamW optimizer and a cosine learning-rate scheduler with a 0.05 warmup ratio. The training stack was Transformers 4.57.1, PyTorch 2.10.0+cu128, Datasets 4.0.0, and Tokenizers 0.22.2.
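
A minimal sketch of how these reported hyperparameters could be expressed as Hugging Face TrainingArguments. Values not stated on the card (per-device batch size, gradient accumulation, AdamW betas, precision flags) are assumptions, not the authors' actual configuration.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dpsk_v3_2_cc_plus_t2",
    learning_rate=1e-5,             # reported learning rate
    num_train_epochs=3,             # reported number of epochs
    per_device_train_batch_size=4,  # assumption: 4 per device x 8 GPUs = total batch size 32
    gradient_accumulation_steps=1,  # assumption: no gradient accumulation
    lr_scheduler_type="cosine",     # reported cosine schedule
    warmup_ratio=0.05,              # reported warmup ratio
    optim="adamw_torch",            # reported AdamW optimizer; beta values not listed on the card
    bf16=True,                      # assumption: mixed precision is not stated on the card
    save_strategy="epoch",
    logging_steps=10,
)
```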

Key Characteristics

  • Base Model: Qwen3-32B, providing a strong foundation for language understanding.
  • Parameter Count: 32 billion parameters, enabling complex reasoning and generation.
  • Fine-tuning: Specialized training on dpsk_v3_2_oracle_sft and dpsk_sft_full datasets, suggesting optimization for specific, though unspecified, domains or tasks.

Intended Use

The card does not list specific intended uses, but the Qwen3-32B foundation and 32-billion-parameter size suggest suitability for advanced natural language processing applications such as text generation, summarization, and question answering, particularly within the domains covered by its fine-tuning datasets.
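
A minimal usage sketch with the Transformers library, assuming the checkpoint loads with the standard causal-LM classes and that the tokenizer ships a chat template, as Qwen3-based checkpoints typically do. The prompt and generation settings are illustrative, not values recommended by the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "StephYang/dpsk_v3_2_cc_plus_t2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # load in the checkpoint's native precision
    device_map="auto",   # spread the 32B weights across available GPUs
)

messages = [{"role": "user", "content": "Summarize the benefits of supervised fine-tuning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```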