JayZenith/glyph-sft-v1

Text generation | Model size: 4B | Quant: BF16 | Context length: 32k | Published: May 6, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights

JayZenith/glyph-sft-v1 is a 4-billion-parameter language model fine-tuned from Qwen3-4B-Base for structured task execution. It generates detailed plans, to-do lists, and tool calls in a custom 'TASK trace format'. The model targets agentic workflows and shows lower perplexity and better format adherence than its base model.


JayZenith/glyph-sft-v1: Agentic Task Trace Model

JayZenith/glyph-sft-v1 is a 4-billion-parameter model fine-tuned from Qwen/Qwen3-4B-Base to generate structured task traces: detailed plans, to-do items, and tool calls, together with satisfaction markers () and response blocks. This output structure makes it well suited to agentic applications.

Key Capabilities & Performance

This model was fine-tuned with LoRA on the attention and MLP layers, with lm_head also trained to improve learning of the termination tokens. Compared with its base model, it shows the following improvements:

  • Perplexity Reduction: Achieved roughly 27% lower perplexity (2.64 vs. 3.60) on a held-out test set; put the other way, the base model's perplexity is about 36% higher.
  • Format Adherence: In a 5-prompt generation evaluation, it produced 4/5 valid traces (vs. 0/5 for the base model). The valid traces included a plan, ended with a response block, and showed no repetition or truncation.
  • Tool Usage: When tools were provided (4 prompts), it used them every time; the base model showed no tool use in the same evaluation.
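
Perplexity is the exponential of the mean per-token negative log-likelihood (NLL), so the two reported values can be compared directly. The short calculation below uses only the 2.64 and 3.60 figures from this card. It shows that the fine-tuned value is about 26.7% below the base value (equivalently, the base perplexity is about 36% higher) and converts both to mean loss in nats per token.

```python
import math

BASE_PPL, SFT_PPL = 3.60, 2.64  # held-out perplexities reported in the card


def relative_reduction(new: float, old: float) -> float:
    """Fraction by which `new` lies below `old`."""
    return (old - new) / old


# Perplexity = exp(mean per-token NLL), so log(perplexity) recovers that loss.
base_nll, sft_nll = math.log(BASE_PPL), math.log(SFT_PPL)

print(f"SFT below base: {relative_reduction(SFT_PPL, BASE_PPL):.1%}")  # 26.7%
print(f"base above SFT: {BASE_PPL / SFT_PPL - 1:.1%}")                 # 36.4%
print(f"mean NLL: {base_nll:.3f} -> {sft_nll:.3f} nats/token")        # 1.281 -> 0.971
```

The two percentages differ only in which model is used as the denominator; a drop of about 0.31 nats per token is the same improvement expressed in loss terms.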

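The card lists what made a generated trace valid (it includes a plan, ends with a response, and is neither repetitive nor truncated) but does not publish the TASK trace syntax. The sketch below turns those criteria into a checker. The tag names `<plan>` and `<response>` and the rule that a line repeated more than three times counts as repetition are hypothetical choices for illustration, not the evaluation actually used for this model.

```python
from collections import Counter
from dataclasses import dataclass

# HYPOTHETICAL tag names: the real TASK trace syntax is not given in the card.
PLAN_OPEN = "<plan>"
RESPONSE_OPEN, RESPONSE_CLOSE = "<response>", "</response>"


@dataclass
class TraceCheck:
    has_plan: bool
    ends_with_response: bool  # a closed final block also rules out truncation
    repeats: bool

    @property
    def valid(self) -> bool:
        return self.has_plan and self.ends_with_response and not self.repeats


def check_trace(text: str, max_repeats: int = 3) -> TraceCheck:
    """Apply the card's validity criteria to one generated trace."""
    body = text.rstrip()
    ends = body.endswith(RESPONSE_CLOSE) and RESPONSE_OPEN in body
    # Count identical non-empty lines; a runaway loop shows up as many copies.
    lines = Counter(ln.strip() for ln in body.splitlines() if ln.strip())
    repeats = any(n > max_repeats for n in lines.values())
    return TraceCheck(PLAN_OPEN in body, ends, repeats)


good = "<plan>\n1. read the file\n</plan>\n<response>\nDone.\n</response>"
truncated = "<plan>\n1. read the file\n</plan>\n<response>\nDo"
looping = "<plan>\n" + "1. read the file\n" * 5 + "</plan>\n<response>ok</response>"

print([check_trace(t).valid for t in (good, truncated, looping)])  # [True, False, False]
```

Checking for a closed response block at the very end covers both "ends with a response" and "not truncated" in one test; a tool-call check could be added the same way once the real trace syntax is known.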
Training Details

The model was trained for 3 epochs (330 steps) with assistant-only loss masking, so only assistant tokens contributed to the loss. About 521M of the 4.54 billion total parameters (roughly 11.5%) were trainable. Training used a private dataset of 1,098 custom task traces.
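
Assistant-only loss masking keeps prompt and other non-assistant tokens in the input but excludes them from the loss. A minimal sketch follows. It assumes each token already carries a role label (a hypothetical representation; real pipelines usually derive this from the chat template) and uses -100, the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, as the mask value. It illustrates the technique and is not the training code used for this model.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def assistant_only_labels(token_ids: list[int], roles: list[str]) -> list[int]:
    """Copy token ids into labels, masking every non-assistant position."""
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(token_ids, roles)]


def masked_mean_loss(token_losses: list[float], labels: list[int]) -> float:
    """Average per-token loss over unmasked (assistant) positions only."""
    kept = [loss for loss, lab in zip(token_losses, labels) if lab != IGNORE_INDEX]
    if not kept:
        raise ValueError("no assistant tokens to train on")
    return sum(kept) / len(kept)


# Hypothetical toy example: two prompt tokens followed by three assistant tokens.
ids = [11, 12, 21, 22, 23]
roles = ["user", "user", "assistant", "assistant", "assistant"]
labels = assistant_only_labels(ids, roles)
print(labels)                                               # [-100, -100, 21, 22, 23]
print(masked_mean_loss([5.0, 4.0, 1.0, 2.0, 3.0], labels))  # 2.0
```

The high per-token losses on the two prompt positions (5.0 and 4.0) do not affect the result, which is the point of the masking: the model is trained only on what it is expected to generate.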

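The card does not state the LoRA rank, how lm_head was made trainable, or the batch size. As a hypothetical back-of-envelope check, the sketch below assumes Qwen3-4B's public configuration (hidden size 2560, 36 layers, 32 query and 8 key/value heads of dimension 128, MLP width 9728, vocabulary 151,936, tied embeddings), LoRA rank 64 on all seven attention and MLP projections, a fully trainable untied copy of lm_head, all 1,098 traces in the training split, and no sequence packing. Under those assumptions it prints 521M trainable parameters, 4.54B total, an 11.5% share, and an effective batch size of 10 traces per step. Matching numbers are suggestive, not confirmation of the actual settings.

```python
import math

# ASSUMED Qwen3-4B dimensions (from its public config, not from this card)
HIDDEN, INTER, LAYERS = 2560, 9728, 36
Q_DIM, KV_DIM, HEAD_DIM = 32 * 128, 8 * 128, 128
VOCAB = 151_936

# (in_features, out_features) of the seven projections LoRA is ASSUMED to wrap
PROJ_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTER),
    "up_proj": (HIDDEN, INTER),
    "down_proj": (INTER, HIDDEN),
}


def base_params() -> int:
    """Frozen base weights, with input and output embeddings tied."""
    per_layer = sum(i * o for i, o in PROJ_SHAPES.values())
    per_layer += 2 * HIDDEN + 2 * HEAD_DIM  # two layer RMSNorms + q_norm/k_norm
    return VOCAB * HIDDEN + LAYERS * per_layer + HIDDEN  # + final RMSNorm


def lora_params(rank: int) -> int:
    """Each wrapped projection gains A (in x rank) and B (rank x out)."""
    return LAYERS * sum(rank * (i + o) for i, o in PROJ_SHAPES.values())


def batch_sizes_matching(n_examples: int, epochs: int, total_steps: int) -> list[int]:
    """Effective batch sizes giving exactly total_steps (last partial batch kept)."""
    return [b for b in range(1, n_examples + 1)
            if math.ceil(n_examples / b) * epochs == total_steps]


RANK = 64                 # ASSUMED: the card does not state the rank
LM_HEAD = VOCAB * HIDDEN  # ASSUMED: fully trainable, untied copy of lm_head

trainable = lora_params(RANK) + LM_HEAD
total = base_params() + trainable
print(f"trainable={trainable / 1e6:.0f}M total={total / 1e9:.2f}B "
      f"share={trainable / total:.1%}")
print("batch sizes giving 330 steps:", batch_sizes_matching(1098, 3, 330))
```

If the held-out test set was carved out of the 1,098 traces, the training split is smaller and the implied batch size changes accordingly; the parameter arithmetic does not depend on that assumption.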
Current Status

Note that glyph-sft-v1 is a supervised fine-tuning (SFT) checkpoint intended as the starting point for a reinforcement learning (RL) run; it is not yet a final chat model. Its primary purpose is to serve as a foundation for further RL training in agentic settings.