laion/exp_rpt_stack-csharp_10k_glm_4-7_traces_jupiter__Qwen3-8B

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Mar 11, 2026 · License: other · Architecture: Transformer

This model is a fine-tuned variant of Qwen's 8-billion-parameter Qwen3-8B with a 32K context length, trained on the exp_rpt_stack-csharp_10k_glm_4.7_traces_jupiter dataset. Given that training data, its primary application is likely C#-related tasks such as code generation and analysis.


Model Overview

This model is a fine-tuned version of the Qwen3-8B architecture, developed by Qwen, featuring 8 billion parameters and a 32K token context length. It has undergone specialized training on the /e/data1/datasets/playground/ot/hf_hub/datasets--DCAgent--exp_rpt_stack-csharp_10k_glm_4.7_traces_jupiter/snapshots/64d34090e91f43b51345645bd11e79ec107a2a60_thinking_preprocessed dataset.
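Below is a minimal loading sketch using Hugging Face Transformers. It assumes the Hub repository id matches this card's title; the id, dtype handling, and device placement are assumptions for illustration, not confirmed by the card.

```python
# Minimal loading sketch (assumes the Hub repo id matches this card's title).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/exp_rpt_stack-csharp_10k_glm_4-7_traces_jupiter__Qwen3-8B"  # assumed id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # take the dtype from the checkpoint config
    device_map="auto",    # requires `accelerate`; spreads weights across available GPUs
)
```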

Training Details

The fine-tuning process used a learning rate of 4e-05, a total training batch size of 96 (a per-device batch size of 1 with 3 gradient accumulation steps across 32 devices, since 1 × 3 × 32 = 96), and 7 epochs. The optimizer was ADAMW_TORCH_FUSED with betas=(0.9, 0.98) and epsilon=1e-08, paired with a cosine learning-rate scheduler and a 0.1 warmup ratio. Training was conducted with Transformers 4.57.6, PyTorch 2.9.1+cu130, Datasets 4.4.1, and Tokenizers 0.22.2.
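For reference, here is a hedged reconstruction of those hyperparameters as Transformers TrainingArguments. This is not the original training script: the output_dir is a placeholder, and the per-device batch size of 1 is inferred from 96 = 1 × 3 × 32 rather than stated in the card.

```python
# Reconstruction of the stated hyperparameters as TrainingArguments (a sketch,
# not the actual training script used for this model).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3-8b-csharp-ft",   # placeholder output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,     # inferred: 96 total = 1 x 3 grad-accum x 32 devices
    gradient_accumulation_steps=3,
    num_train_epochs=7,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)
```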

Potential Use Cases

Given its fine-tuning on a C#-specific dataset, this model is likely best suited for tasks involving C# code, such as code generation, completion, analysis, or understanding within a C# development context.
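Continuing from the loading sketch above, the following shows a hypothetical C# generation request via the standard Qwen3 chat template; the prompt and generation settings are illustrative and not taken from the card.

```python
# Illustrative C# generation request (prompt and settings are hypothetical).
messages = [
    {"role": "user",
     "content": "Write a C# extension method that chunks an IEnumerable<T> into batches of n."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```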