yil384/CodeV-R1-Distill-Qwen3-0.6b

0.8B parameters · BF16 · 40960-token context · Jan 6, 2026 · License: other
Overview

yil384/CodeV-R1-Distill-Qwen3-0.6b is a fine-tuned variant of Qwen3-0.6B published by yil384. With approximately 0.8 billion parameters and a 40960-token context length, it is small enough for efficient inference while still accommodating long sequences. It was fine-tuned on the codev_r1_sft dataset, which suggests a specialization in code-related tasks or structured data processing.

Key Characteristics

  • Base Model: Qwen/Qwen3-0.6B
  • Parameter Count: 0.8 billion
  • Context Length: 40960 tokens
  • Fine-tuning Dataset: codev_r1_sft
  • Training Details: Trained for 6 epochs with a learning rate of 1e-05 on a multi-GPU setup with the AdamW optimizer (a hypothetical reconstruction of this recipe is sketched below).
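
The training details above map onto a standard supervised fine-tuning configuration. The sketch below is a hypothetical reconstruction using the Hugging Face transformers TrainingArguments API; the actual training script is not published, and the batch size and output path here are assumptions, not values from the card.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported recipe: 6 epochs,
# learning rate 1e-05, AdamW, multi-GPU training, BF16 precision.
args = TrainingArguments(
    output_dir="codev-r1-sft-output",   # hypothetical path
    num_train_epochs=6,
    learning_rate=1e-5,
    optim="adamw_torch",                # AdamW, as stated in the card
    bf16=True,                          # matches the checkpoint precision
    per_device_train_batch_size=8,      # assumption; not reported
)
# Multi-GPU runs would typically be launched with
# `torchrun --nproc_per_node=<N> train.py` or `accelerate launch train.py`.
```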

Intended Use Cases

The developer has not documented specific intended uses or limitations, but fine-tuning on a code-related dataset suggests potential applications in:

  • Code generation or completion
  • Code analysis or summarization
  • Tasks involving structured data or domain-specific language processing where the codev_r1_sft dataset is relevant.

Published benchmarks and evaluation details would be needed to characterize its capabilities and optimal applications more precisely.
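
For quick experimentation, the checkpoint can presumably be loaded like any other Qwen3-based causal language model. This is a minimal sketch assuming the standard transformers AutoModelForCausalLM interface; the card does not specify a prompt format or chat template, so the generic code-completion prompt below is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yil384/CodeV-R1-Distill-Qwen3-0.6b"

# Standard causal-LM loading; assumes the repo ships the usual
# tokenizer and weight files for a Qwen3-style checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "def fibonacci(n):"  # generic code-completion prompt (assumption)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```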