XinnanZhang/Qwen3-1.7B-Base-Openthought400K-SFT

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Context Length: 32K · Published: Apr 17, 2026 · License: other · Architecture: Transformer

XinnanZhang/Qwen3-1.7B-Base-Openthought400K-SFT is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained on the XinnanZhang/openthoughts3-math-50k8 dataset, indicating an optimization for mathematical reasoning tasks, and it supports a 32K context length, making it suitable for longer mathematical problems and complex logical sequences.


Model Overview

This model, XinnanZhang/Qwen3-1.7B-Base-Openthought400K-SFT, is a fine-tuned version of Qwen3-1.7B-Base, trained with a focus on enhancing its mathematical reasoning capabilities.

Key Characteristics

  • Base Model: Derived from Qwen/Qwen3-1.7B-Base.
  • Parameter Count: 1.7 billion parameters.
  • Context Length: Supports a context window of 32,768 tokens.
  • Specialized Training: Fine-tuned on the XinnanZhang/openthoughts3-math-50k8 dataset, which suggests an emphasis on mathematical problem-solving and logical thought processes.
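Since the model inherits the Qwen3 causal-LM architecture, it should load with the standard Hugging Face transformers API. The snippet below is a minimal sketch, assuming the repository ships BF16 weights (as the card's metadata suggests) and a standard tokenizer; the prompt is just an illustrative math question.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XinnanZhang/Qwen3-1.7B-Base-Openthought400K-SFT"

# Load tokenizer and weights in BF16, matching the quantization listed on the card.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A simple step-by-step math prompt, in line with the model's SFT focus.
prompt = "Solve step by step: If 3x + 7 = 22, what is x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```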

Training Details

The model was trained using the following hyperparameters:

  • Learning Rate: 8e-05
  • Optimizer: ADAMW_TORCH_FUSED
  • Epochs: 1.0
  • Batch Size: A total training batch size of 512 (distributed across 8 devices).
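These hyperparameters map fairly directly onto a transformers TrainingArguments configuration. The sketch below is an illustrative reconstruction, not the author's actual training script: in particular, the per-device batch size and gradient-accumulation split is an assumption, since only the total batch size of 512 across 8 devices is reported.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the reported hyperparameters.
# The batch split is assumed: 8 devices x 8 per device x 8 accumulation
# steps = 512, matching the reported total training batch size.
training_args = TrainingArguments(
    output_dir="qwen3-1.7b-openthought-sft",  # hypothetical output path
    learning_rate=8e-5,                       # reported learning rate
    optim="adamw_torch_fused",                # reported optimizer
    num_train_epochs=1.0,                     # reported epochs
    per_device_train_batch_size=8,            # assumed split
    gradient_accumulation_steps=8,            # assumed split
    bf16=True,                                # matches the BF16 quant on the card
)
```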

Potential Use Cases

Given its specialized training, this model is likely well-suited for:

  • Mathematical Problem Solving: Assisting with or solving complex math problems.
  • Logical Reasoning: Tasks requiring structured thought and step-by-step deduction.
  • Educational Tools: Applications in tutoring or generating explanations for mathematical concepts.