fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_4bit
The fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_4bit is a 20 billion parameter Mixture-of-Experts model, a 4-bit MXFP4 quantized finetune of OpenAI's gpt-oss-20b. Developed by Fesal Fayed, this model is specifically optimized for reliable tool-use, function-calling adherence, and long multi-turn agent loops within the Hermes-Agent framework. It maintains a 32768 token context length and is designed to fit within 16 GB of VRAM, making it suitable for local agent applications.
Loading preview...
Model Overview
This model, fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_4bit, is a 4-bit MXFP4 quantized version of OpenAI's gpt-oss-20b (a 21B-parameter Mixture-of-Experts model). Developed by Fesal Fayed, it is specifically fine-tuned for robust tool-use within the Hermes-Agent local agent framework. The finetune preserves the Harmony chat template and reasoning-effort knob, while significantly enhancing agentic capabilities.
Key Capabilities & Enhancements
- Function-calling adherence: Improved reliability in generating correct JSON for tool calls without extraneous commentary.
- Long agent loops: Excels in extended multi-turn interactions (10+ turns of tool → observe → plan).
- System-prompt fidelity: Better adherence to role boundaries and refusal/allow-list rules defined in the system prompt.
- Resource efficiency: The 4-bit MXFP4 quantization allows the model to fit within approximately 14-16 GB of VRAM, making it runnable on GPUs like a Colab T4 while retaining its full 8k context length.
Training Details
The model was trained using LoRA SFT (rank 64, alpha 16) on a single H100 GPU, utilizing ~42k tool-use traces from Hermes-Agent sessions. The training focused on successful tool calls and clean JSON, with an 8192 token length and assistant-only loss masking.
Limitations
- Reasoning & Code: Math and code-generation capabilities are inherited from the base model and are not specifically optimized by this finetune.
- Tool Over-calling: May over-call tools with vague instructions; users can mitigate this by adding specific instructions to the system prompt.
- Language: English-only, as other languages were not included in the training data.
- Safety: Safety-tuning is limited to what the base
gpt-oss-20bprovides.
Recommended Usage
This model is recommended for use with Unsloth or Transformers + bitsandbytes for optimal performance. It is particularly well-suited for developers building local agent applications that require reliable tool interaction and complex multi-step planning.