Divij/Qwen2.5-3B-Instruct-sft-without-thoughts

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 13, 2026 · License: other · Architecture: Transformer

Divij/Qwen2.5-3B-Instruct-sft-without-thoughts is a 3.1-billion-parameter instruction-tuned model, supervised fine-tuned by Divij from the Qwen2.5-3B-Instruct base. This variant was trained on a scientific-methodology dataset to generate step-by-step research plans while excluding explicit reasoning traces from its outputs. It is optimized for structured scientific research planning; although the base architecture supports a 32k context, fine-tuning used a maximum sequence length of 2048 tokens.


Overview

This model, Divij/Qwen2.5-3B-Instruct-sft-without-thoughts, is a supervised fine-tune of the Qwen/Qwen2.5-3B-Instruct base model. It has been specialized to generate step-by-step research methodologies in response to scientific research goals and constraints. The key differentiator for this variant is its training on a dataset where assistant responses consist solely of action steps, deliberately excluding intermediate thought processes or reasoning traces.

Key Capabilities

  • Structured Methodology Generation: Produces clear, step-by-step research plans for scientific problems.
  • Specialized Fine-tuning: Trained on 4,990 examples in the messages format from a scientific-methodology dataset.
  • Structured Output Format: Emits responses as numbered <Step_1>...</Step_1> tags, with no internal thought traces.
  • Context Length: Trained with a max_seq_length of 2048 tokens, which should be matched during inference.
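The <Step_1>...</Step_1> output convention above lends itself to a small parsing helper. This is a minimal sketch: the tag names follow the card, but the example response string is invented for illustration.

```python
import re

def parse_steps(response: str) -> list[str]:
    """Extract the ordered action steps from a <Step_N>...</Step_N> response."""
    # Backreference \1 ensures the closing tag number matches the opening tag.
    matches = re.findall(r"<Step_(\d+)>(.*?)</Step_\1>", response, flags=re.DOTALL)
    # Sort numerically in case steps arrive out of order, then keep the text.
    return [text.strip() for num, text in sorted(matches, key=lambda m: int(m[0]))]

# Invented example response in the trained format:
response = (
    "<Step_1>Define the hypothesis and measurable outcomes.</Step_1>"
    "<Step_2>Select the dataset and sampling strategy.</Step_2>"
)
steps = parse_steps(response)
```

A helper like this makes it easy to feed each step into downstream tooling, or to validate that the model kept to its trained output format.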

Training Details

The model was trained using the open-instruct framework with bf16 mixed precision and FlashAttention-2. It underwent 3 training epochs with an effective batch size of 16 and a learning rate of 2e-5, achieving a final training loss of 2.054. Labels were masked for system and user turns, focusing loss calculation solely on the assistant's response.
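The assistant-only loss masking described above can be sketched as follows. This is an illustrative sketch, not the actual open-instruct implementation; the turn structure and token IDs are made up, and IGNORE_INDEX follows the common PyTorch convention for CrossEntropyLoss's ignore_index.

```python
# Tokens labeled IGNORE_INDEX are skipped by the loss, so only the
# assistant's response contributes to the training signal.
IGNORE_INDEX = -100

def build_labels(turns: list[tuple[str, list[int]]]) -> list[int]:
    """turns: list of (role, token_ids). Mask everything except assistant turns."""
    labels: list[int] = []
    for role, token_ids in turns:
        if role == "assistant":
            labels.extend(token_ids)                        # loss computed here
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # masked out
    return labels

# Toy example with made-up token IDs:
turns = [("system", [1, 2]), ("user", [3, 4, 5]), ("assistant", [6, 7])]
labels = build_labels(turns)
```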

Intended Use

This model is a research artifact primarily intended for generating structured scientific research plans. It is not designed for general-purpose chat or safety-critical applications. A sibling model, Divij/Qwen2.5-3B-Instruct-sft-with-thoughts, which includes reasoning traces in its training data, exists for comparison.