BillyWang1/qwen2.5-3b-base-tool-n1-sft
BillyWang1/qwen2.5-3b-base-tool-n1-sft is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B on the 'tool_sft' dataset, which indicates a focus on tool use. It targets applications that interact with external tools or APIs and inherits the 32K context length of its Qwen2.5 base.
Model Overview
This model, BillyWang1/qwen2.5-3b-base-tool-n1-sft, is a fine-tuned version of the Qwen/Qwen2.5-3B base model. It has 3.1 billion parameters and a 32K context length, both carried over from the Qwen2.5 architecture it builds on.
Key Capabilities
- Tool-Use Optimization: The model has undergone specific fine-tuning on the tool_sft dataset, suggesting enhanced performance in scenarios requiring interaction with external tools or function calling.
- Qwen2.5 Foundation: Benefits from the underlying Qwen2.5 architecture, known for its general language understanding and generation abilities.
Training Details
The model was trained using the following hyperparameters:
- Learning Rate: 1e-05
- Batch Size: 4 (train), 8 (eval)
- Optimizer: ADAMW_TORCH
- Epochs: 3.0
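As a rough illustration, the hyperparameters above can be collected into a configuration dictionary. This is a minimal sketch, not the author's training script: the key names follow the Hugging Face TrainingArguments naming convention, and reading the batch sizes as per-device values is an assumption the card does not confirm. The helper total_optimizer_steps and its num_devices and grad_accum parameters are hypothetical, added only to show how these values interact with dataset size.

```python
import math

# Values taken from the card. Key names mirror Hugging Face TrainingArguments
# fields; treating the batch sizes as per-device is an assumption.
HPARAMS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 8,
    "optim": "adamw_torch",
    "num_train_epochs": 3.0,
}


def total_optimizer_steps(num_examples, hparams=HPARAMS, num_devices=1, grad_accum=1):
    """Rough estimate of optimizer steps for a dataset of num_examples.

    Hypothetical helper: the card does not state the dataset size, the
    device count, or whether gradient accumulation was used.
    """
    effective_batch = hparams["per_device_train_batch_size"] * num_devices * grad_accum
    # The last partial batch of each epoch still counts as one step.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return int(steps_per_epoch * hparams["num_train_epochs"])
```

Under these assumptions, a hypothetical dataset of 1,000 examples on one device would take 250 steps per epoch, or 750 optimizer steps over the three epochs.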
Intended Use Cases
This model is suited to applications where a language model must call or otherwise make use of external tools, APIs, or functions. Fine-tuning on a tool-specific dataset is intended to improve its ability to understand tool-related instructions and generate tool-related responses.
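To make the tool-interaction use case concrete, the sketch below shows one way an application might parse and dispatch tool calls found in generated text. The card does not document this model's output format, so the format here is an assumption: a JSON object with "name" and "arguments" keys wrapped in <tool_call> tags, similar to the convention used by Qwen2.5 chat templates. The get_weather tool, the TOOLS registry, and both helper functions are hypothetical.

```python
import json
import re

# Hypothetical tool, for illustration only.
def get_weather(city):
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}

# Assumed format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return every well-formed tool-call payload found in text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        if isinstance(payload, dict) and "name" in payload:
            calls.append(payload)
    return calls


def run_tool_calls(text):
    """Dispatch each extracted call to the registry and collect the results."""
    results = []
    for call in extract_tool_calls(text):
        func = TOOLS.get(call["name"])
        if func is None:
            results.append(f"unknown tool: {call['name']}")
            continue
        results.append(func(**call.get("arguments", {})))
    return results
```

In a real application, each result would typically be appended to the conversation and passed back to the model for a final answer. Malformed calls are skipped here so that one bad generation does not abort the turn.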