Divij/Qwen2.5-3B-Instruct-sft-with-thoughts

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 13, 2026 · License: other · Architecture: Transformer

Divij/Qwen2.5-3B-Instruct-sft-with-thoughts is a 3.1 billion parameter instruction-tuned causal language model, a supervised fine-tune of Qwen/Qwen2.5-3B-Instruct. Developed by Divij, this model is specifically trained to generate scientific research methodologies by interleaving explicit reasoning steps (<Thought_i>) with actions (<Step_i>). It is optimized for structured scientific planning, learning to produce reasoning before each action, and was trained with a maximum sequence length of 6144 tokens.


Model Overview

Divij/Qwen2.5-3B-Instruct-sft-with-thoughts is a 3.1 billion parameter instruction-tuned model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. Its core distinction lies in its training methodology: it's a "with-thoughts" variant specifically designed to generate scientific research plans by explicitly interleaving reasoning traces (<Thought_i>) with corresponding actions (<Step_i>). This approach aims to produce stronger scientific methodology generators by teaching the model to articulate its thought process before each step.
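Since this is a standard causal LM fine-tuned from Qwen2.5-3B-Instruct, it can be driven through the usual Hugging Face transformers chat workflow. The sketch below is illustrative, not taken from the card: the research-goal wording and generation settings are assumptions, and `generate_plan` is a hypothetical helper name.

```python
def build_messages(research_goal: str) -> list[dict]:
    """Wrap a research goal in the chat messages format an instruct model expects."""
    return [{"role": "user", "content": research_goal}]

def generate_plan(
    research_goal: str,
    model_id: str = "Divij/Qwen2.5-3B-Instruct-sft-with-thoughts",
) -> str:
    """Generate a step-by-step methodology. Settings here are illustrative."""
    # Imported lazily so the lightweight prompt helper above works
    # even without the heavy dependency installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(research_goal), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=1024)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Keep generation budgets generous (e.g. around 1024 new tokens) since each plan interleaves a reasoning trace with every step.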

Key Capabilities

  • Structured Scientific Planning: Generates step-by-step research methodologies for given research goals and constraints.
  • Explicit Reasoning: Learns to produce a reasoning step (<Thought_i>) before each action step (<Step_i>), making its process transparent.
  • Context Length: Trained with a max_seq_length of 6144, supporting longer, detailed methodological sequences.
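Because the reasoning and action steps are tagged, downstream code can split a generated plan back into its parts. The parser below is a sketch that assumes the model emits literal paired tags such as <Thought_1>…</Thought_1> followed by <Step_1>…</Step_1>; the exact serialization is an assumption, so adapt the pattern to the model's actual output.

```python
import re

# Matches <Thought_N>...</Thought_N> and <Step_N>...</Step_N> pairs;
# backreferences \1 and \2 ensure the closing tag mirrors the opening one.
TAG_RE = re.compile(r"<(Thought|Step)_(\d+)>(.*?)</\1_\2>", re.DOTALL)

def parse_plan(text: str) -> list[tuple[str, int, str]]:
    """Return (kind, index, content) triples in document order."""
    return [
        (kind, int(idx), body.strip())
        for kind, idx, body in TAG_RE.findall(text)
    ]
```

This makes it easy to, for example, display only the Step_i actions while logging the Thought_i traces for inspection.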

Training Details

The model was supervised fine-tuned on 4,990 messages-format examples from the verl_scientific_discovery dataset, focusing on research methodology generation. Training utilized open-instruct, bf16 mixed precision, FlashAttention-2, and gradient checkpointing on NVIDIA H100 GPUs. A sibling model, Divij/Qwen2.5-3B-Instruct-sft-without-thoughts, exists for comparison, trained on the same data but without explicit reasoning traces.
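The "messages format" mentioned above is the generic chat-style SFT schema. The record below is a hypothetical sketch of what one such example might look like; the actual contents of the verl_scientific_discovery dataset are not shown in this card, so every field value here is an illustrative placeholder.

```python
# Hypothetical messages-format record for SFT on methodology generation.
# Field contents are placeholders, not real dataset rows.
example = {
    "messages": [
        {
            "role": "user",
            "content": "Research goal and constraints for the methodology...",
        },
        {
            "role": "assistant",
            # Target output interleaves reasoning tags with action tags.
            "content": "<Thought_1>...</Thought_1><Step_1>...</Step_1>",
        },
    ]
}
```

The without-thoughts sibling would use the same user turn but an assistant turn containing only the <Step_i> actions, which is what makes the pair a controlled comparison.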

Intended Use

This model is a research artifact primarily intended for generating structured scientific research plans. It is not aligned for general-purpose chat or safety-critical applications, focusing instead on its specialized task of producing detailed, reasoned scientific methodologies.