JayZenith/GLYPH-SFT-V2

Text generation · Model size: 4B · Quant: BF16 · Context length: 32k · Published: May 20, 2026 · License: apache-2.0 · Architecture: Transformer · Concurrency cost: 1 · Open weights

JayZenith/GLYPH-SFT-V2 is a 4-billion-parameter language model fine-tuned from Qwen3-4B-Base to generate structured GLYPH-style traces. Each output follows an explicit sequence of planning, action, optional tool use, and a final response, which suits applications that need a rigid, interpretable reasoning flow. It is a supervised fine-tuning (SFT) checkpoint intended as the starting point for further reinforcement learning with verifiable rewards (RLVR).
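To make the "rigid" ordering concrete, here is a minimal sketch of a checker for the section sequence described above. It is not the GLYPH format itself: the card does not show the surface syntax, so the trace is assumed to be a list of (section, text) pairs, the labels "plan", "act", "tool", and "final" are hypothetical, and the order (one plan, one act, zero or more tool turns, one final) is an assumption read off the description.

```python
import re
from typing import List, Tuple

# Hypothetical representation: a trace is a list of (section_name, text) pairs.
# The names "plan", "act", "tool", "final" are illustrative labels only,
# not the actual GLYPH surface syntax.
Section = Tuple[str, str]

# Assumed ordering: one plan, one act, zero or more tool turns, one final.
_ORDER = re.compile(r"plan act( tool)* final")


def follows_glyph_order(trace: List[Section]) -> bool:
    """Return True if the sequence of section names matches the assumed order."""
    names = " ".join(name for name, _ in trace)
    return _ORDER.fullmatch(names) is not None


if __name__ == "__main__":
    good = [("plan", "..."), ("act", "..."), ("tool", "..."), ("final", "...")]
    bad = [("act", "..."), ("final", "...")]
    print(follows_glyph_order(good))  # True
    print(follows_glyph_order(bad))   # False (no plan section)
```

A regular expression over the joined section names keeps the assumed grammar in one place, so a different real ordering (for example, alternating act and tool turns) would only require changing the pattern.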


GLYPH-SFT-V2: Structured Trace Generation

JayZenith/GLYPH-SFT-V2 is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B-Base and specialized in generating highly structured GLYPH-style traces. It is a supervised fine-tuning (SFT) checkpoint, intended as the starting point for subsequent RLVR training.

Key Capabilities

  • Structured Output: Generates responses following a rigid GLYPH format, including plan, act, optional tool turns, and a final response.
  • Explicit Referencing: Includes explicit references (refs) in its output and is trained to satisfy the todo items declared within its output structure.
  • Base Model: Fine-tuned from Qwen3-4B-Base.
  • Training Data: Fine-tuned using the JayZenith/GLYPH_SFT_DATASET.
  • Performance: On held-out data, weighted loss fell from 2.2446 to 0.3300 and perplexity from 9.44 to 1.39; the model also scored 97/100 on a formal evaluation.
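The loss and perplexity figures above are mutually consistent if perplexity is taken as the exponential of the weighted mean per-token cross-entropy in nats. That is the usual convention, but the card does not state it, so treat it as an assumption. A quick check:

```python
import math


def perplexity(mean_nll: float) -> float:
    """Perplexity as exp of the mean per-token negative log-likelihood (natural log)."""
    return math.exp(mean_nll)


# Weighted held-out losses reported on the card, before and after SFT.
for label, loss in [("before", 2.2446), ("after", 0.3300)]:
    print(f"{label}: loss={loss:.4f} perplexity={perplexity(loss):.2f}")
# before: loss=2.2446 perplexity=9.44
# after: loss=0.3300 perplexity=1.39
```

Because perplexity is exponential in the loss, the drop from 9.44 to 1.39 corresponds to a reduction of about 1.91 nats per token.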

Use Cases

This model is particularly well-suited for applications requiring:

  • Interpretable AI Reasoning: Generating transparent, step-by-step reasoning processes.
  • Automated Planning: Creating structured plans and actions for agents.
  • Tool-Use Scenarios: Integrating optional tool calls within a defined workflow.
  • Foundation for RLVR: Serving as the starting checkpoint for reinforcement learning with verifiable rewards, to further refine trace generation.
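In RLVR, the reward comes from a programmatic check rather than a learned reward model. The card does not describe the planned RLVR setup, so the following is only a hypothetical sketch of a binary verifiable reward for this kind of trace. It assumes the same illustrative (section, text) representation and section labels as above, treats a trace as well-formed if it starts with "plan", ends with "final", and contains at least one "act", and checks correctness by exact match against a reference answer.

```python
from typing import List, Tuple

# Hypothetical trace representation: (section_name, text) pairs.
# Section labels are illustrative, not the real GLYPH syntax.
Section = Tuple[str, str]


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so formatting noise does not affect matching."""
    return " ".join(text.split())


def verifiable_reward(trace: List[Section], reference: str) -> float:
    """Binary reward: 1.0 only if the trace is well-formed and its final answer matches.

    Assumptions (illustrative only):
      * well-formed means: first section "plan", last section "final",
        and at least one "act" section in between;
      * correctness is an exact match after whitespace normalization.
    """
    if len(trace) < 2:
        return 0.0
    names = [name for name, _ in trace]
    if names[0] != "plan" or names[-1] != "final" or "act" not in names:
        return 0.0
    final_text = trace[-1][1]
    return 1.0 if _normalize(final_text) == _normalize(reference) else 0.0


if __name__ == "__main__":
    trace = [("plan", "add the numbers"), ("act", "compute 40 + 2"), ("final", " 42 ")]
    print(verifiable_reward(trace, "42"))  # 1.0
    print(verifiable_reward(trace, "43"))  # 0.0
```

Gating correctness on structure means a policy cannot earn reward by emitting a bare answer, which is one way an SFT checkpoint's format could be preserved during RL; real setups often use graded or partial rewards instead.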