StephYang/dpsk_v3_2_cc_plus_t2
StephYang/dpsk_v3_2_cc_plus_t2 is a 32 billion parameter language model fine-tuned from Qwen3-32B on the dpsk_v3_2_oracle_sft and dpsk_sft_full datasets. It is intended for general language understanding and generation tasks.
Model Overview
StephYang/dpsk_v3_2_cc_plus_t2 is a 32 billion parameter language model fine-tuned from the Qwen3-32B base model. It was further trained on two datasets, dpsk_v3_2_oracle_sft and dpsk_sft_full, with the aim of improving its performance on a broad range of language tasks.
Training Details
The model was trained for 3 epochs with a learning rate of 1e-05 and an effective batch size of 32 across 8 GPUs. Training used the AdamW optimizer and a cosine learning-rate scheduler with a warmup ratio of 0.05. The software stack comprised Transformers 4.57.1, PyTorch 2.10.0+cu128, Datasets 4.0.0, and Tokenizers 0.22.2.
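The cosine schedule with warmup described above can be sketched in plain Python. This is an illustrative reconstruction only: the peak learning rate (1e-05) and warmup ratio (0.05) come from the card, while the total step count below is hypothetical, since the card does not state the dataset size or number of optimizer steps.

```python
import math

PEAK_LR = 1e-05      # peak learning rate from the card
WARMUP_RATIO = 0.05  # warmup ratio from the card

def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward zero."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 to PEAK_LR over the warmup phase.
        return PEAK_LR * step / max(1, warmup_steps)
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Hypothetical 1000-step run: the first 50 steps (5%) are warmup.
schedule = [lr_at(s, 1000) for s in range(1000)]
```

With these numbers, the learning rate climbs linearly to 1e-05 by step 50, then follows a half-cosine down toward zero at the final step.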
Key Characteristics
- Base Model: Qwen3-32B, providing a strong foundation for language understanding.
- Parameter Count: 32 billion parameters, enabling complex reasoning and generation.
- Fine-tuning: Specialized training on the dpsk_v3_2_oracle_sft and dpsk_sft_full datasets, suggesting optimization for specific, though unspecified, domains or tasks.
Intended Use
While the model card does not detail specific intended uses, its Qwen3-32B foundation and 32 billion parameters suggest suitability for advanced natural language processing applications such as text generation, summarization, and question answering, particularly within whatever domains its fine-tuning datasets cover.