JayZenith/GLYPH-SFT-V2
JayZenith/GLYPH-SFT-V2 is a 4 billion parameter language model, fine-tuned from Qwen3-4B-Base, specifically designed to generate structured GLYPH-style traces. This model excels at producing outputs with explicit planning, action, optional tool use, and final responses, making it suitable for applications requiring rigid, interpretable AI reasoning flows. It serves as an SFT checkpoint intended for further RLVR development.
GLYPH-SFT-V2: Structured Trace Generation
JayZenith/GLYPH-SFT-V2 is a 4 billion parameter language model, fine-tuned from Qwen/Qwen3-4B-Base and specialized in generating highly structured GLYPH-style traces. It is a supervised fine-tuning (SFT) checkpoint, intended as the starting point for a subsequent RLVR (reinforcement learning with verifiable rewards) stage.
Key Capabilities
- Structured Output: Generates responses following a rigid GLYPH format, including plan, act, optional tool turns, and a final response.
- Explicit Referencing: Incorporates explicit references (refs) and ensures todo satisfaction within its output structure.
- Base Model: Built upon the robust Qwen3-4B-Base architecture.
- Training Data: Fine-tuned using the JayZenith/GLYPH_SFT_DATASET.
- Performance: Achieved significant loss reduction (weighted loss from 2.2446 to 0.3300) and perplexity improvement (9.44 to 1.39) on held-out data, with a formal evaluation score of 97/100.
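The reported loss and perplexity figures appear to be two views of the same measurement. Assuming the weighted loss is a mean per-token cross-entropy in nats (the card does not state this explicitly), perplexity is its exponential. A minimal sketch of that arithmetic, where the helper name `perplexity` is illustrative only:

```python
import math


def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll)


# Held-out losses quoted in the card, before and after fine-tuning.
before = perplexity(2.2446)
after = perplexity(0.3300)

print(f"{before:.2f} -> {after:.2f}")  # prints "9.44 -> 1.39"
```

Under that assumption, both reported perplexities follow directly from the reported losses.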
Use Cases
This model is particularly well-suited for applications requiring:
- Interpretable AI Reasoning: Generating transparent, step-by-step reasoning processes.
- Automated Planning: Creating structured plans and actions for agents.
- Tool-Use Scenarios: Integrating optional tool calls within a defined workflow.
- Foundation for RLVR: Serving as a strong starting point for advanced reinforcement learning techniques to further refine trace generation.
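For trying the checkpoint in any of these settings, the following is a minimal loading sketch using the Hugging Face `transformers` library. It assumes the repository ships standard causal-LM weights and a tokenizer. The expected prompt format is not documented here, so the plain-text prompt is an assumption, and the helper name `generate_trace` and its default token budget are hypothetical.

```python
MODEL_ID = "JayZenith/GLYPH-SFT-V2"


def generate_trace(prompt: str, max_new_tokens: int = 512) -> str:
    """Generate a continuation for `prompt` (plain-text prompt is an assumption)."""
    # Imported lazily so the heavy dependencies load only when generation runs.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_trace("...")` downloads the weights on first use. Whether the output follows the GLYPH structure will depend on matching the prompt format used during fine-tuning.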