BillyWang1/qwen2.5-3b-base-tool-n1-sft

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 3.1B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 13, 2026 | License: other | Architecture: Transformer | Cold

BillyWang1/qwen2.5-3b-base-tool-n1-sft is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B on the 'tool_sft' dataset, which indicates an optimization for tool use. It is designed for applications that interact with external tools or APIs, and it keeps the 32K context length of its Qwen2.5 base.

Model Overview

BillyWang1/qwen2.5-3b-base-tool-n1-sft is a fine-tuned version of the Qwen/Qwen2.5-3B base model. It has 3.1 billion parameters and a 32K context length, and it inherits the Qwen2.5 architecture unchanged.

Key Capabilities

  • Tool-Use Optimization: The model was fine-tuned on the tool_sft dataset, which suggests improved performance in scenarios that require function calling or other interaction with external tools.
  • Qwen2.5 Foundation: Benefits from the underlying Qwen2.5 architecture, known for its general language understanding and generation abilities.

Training Details

The model was trained using the following hyperparameters:

  • Learning Rate: 1e-05
  • Batch Size: 4 (train), 8 (eval)
  • Optimizer: ADAMW_TORCH
  • Epochs: 3.0
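
As a rough illustration of how these values fit together, the sketch below writes them out under the Hugging Face `TrainingArguments` field names they most plausibly correspond to, and estimates the number of optimizer steps for a run. The field-name mapping, the per-device reading of the batch sizes, the dataset size, and the helper `total_optimizer_steps` are all assumptions made for illustration; the model card does not state the dataset size, device count, or gradient accumulation setting.

```python
import math

# Hyperparameters from the list above, keyed by the Hugging Face
# TrainingArguments field names they most plausibly map to (an assumption).
training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 8,
    "optim": "adamw_torch",
    "num_train_epochs": 3.0,
}


def total_optimizer_steps(num_examples, num_devices=1, grad_accum_steps=1):
    """Roughly estimate optimizer steps for a full run.

    num_examples, num_devices and grad_accum_steps are hypothetical inputs;
    none of them is given in the model card.
    """
    effective_batch = (
        training_kwargs["per_device_train_batch_size"]
        * num_devices
        * grad_accum_steps
    )
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return int(steps_per_epoch * training_kwargs["num_train_epochs"])


# Example with a made-up dataset of 1,000 examples on a single device.
print(total_optimizer_steps(1000))
```

With the assumed 1,000 examples, one device, and no gradient accumulation, each epoch takes 250 steps, so three epochs come to 750 steps. Real runs may differ slightly depending on how the trainer handles the last partial batch.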

Intended Use Cases

This model is suited to applications in which a language model must call external tools, APIs, or functions. Its fine-tuning on a tool-specific dataset aims to improve how well it interprets tool descriptions and generates well-formed tool calls and responses.
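
On the application side, a tool-calling model's text output has to be parsed and routed to real functions. The sketch below shows one way this might look, assuming the model emits calls in the `<tool_call>{...}</tool_call>` JSON format used by the Qwen2.5 instruct chat template. Whether this fine-tune follows that format is not stated in the model card, so treat the format as an assumption. The `get_weather` tool, the `TOOLS` registry, and the helper names are hypothetical.

```python
import json
import re

# Assumed output format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return a list of {"name", "arguments"} dicts found in model output."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(raw.strip())
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of raising
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    return calls


def get_weather(city):
    """Hypothetical tool; a real one would query a weather API."""
    return f"(stub) weather for {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(call):
    """Run one parsed tool call, or report an unknown tool name."""
    func = TOOLS.get(call["name"])
    if func is None:
        return f"unknown tool: {call['name']}"
    return func(**call["arguments"])


sample_output = (
    "Let me check.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
)
for call in extract_tool_calls(sample_output):
    print(dispatch(call))
```

Skipping malformed JSON rather than raising keeps a single bad generation from crashing the loop; an application that needs stricter guarantees could instead return the parse error to the model and ask it to retry.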