perfectPresentation/rcrc-chat-v5-gemma-1b-cpt-sft

Text Generation · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · Published: May 2, 2026 · License: gemma · Architecture: Transformer

perfectPresentation/rcrc-chat-v5-gemma-1b-cpt-sft is a 1 billion parameter Gemma-based closed-book chatbot developed by perfectPresentation and fine-tuned specifically for the Royal Commission for Riyadh City (RCRC). It answers questions from baked-in knowledge, having undergone continued pre-training on RCRC and Hanifa raw text followed by instruction fine-tuning on Qwen-synthesized QA pairs. The model is designed for scenarios that require a small, specialized chatbot with no retrieval at inference, particularly Arabic and English inquiries about RCRC services and projects.


Overview

This model, rcrc-chat-v5-gemma-1b-cpt-sft, is a 1 billion parameter Gemma-based closed-book chatbot developed by perfectPresentation. It is specifically designed for the Royal Commission for Riyadh City (RCRC) and operates by answering questions using its internal, "baked-in" knowledge without requiring external retrieval during inference. The model supports both Arabic and English.
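Because the model is closed-book, inference is a plain single-turn generation call with no retrieval step. The sketch below builds a single-turn prompt; the `<start_of_turn>`/`<end_of_turn>` markers are an assumption based on the base Gemma family's chat template, so in practice prefer the tokenizer's own `apply_chat_template`.

```python
def build_gemma_prompt(question: str) -> str:
    """Format a single-turn question for a Gemma-family chat model.

    The turn markers below are assumed from the Gemma chat template;
    the model tokenizer's apply_chat_template should be preferred
    when available.
    """
    return (
        "<start_of_turn>user\n"
        f"{question}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

# Works equally for Arabic and English single-turn questions.
prompt = build_gemma_prompt("What services does RCRC provide?")
```

The resulting string can then be tokenized and passed to the model (e.g. via a `transformers`-style causal-LM API) with no retrieval context, since all RCRC knowledge is baked into the weights.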

Training Details

The model's development followed a two-stage process:

  • Continued Pre-Training (CPT): It was initially pre-trained for 3 epochs on cleaned RCRC and Hanifa raw text, building upon google/gemma-3-1b-pt to create perfectPresentation/rcrc-gemma-1b-cpt.
  • Supervised Fine-Tuning (SFT): Subsequently, it underwent 3 epochs of chat SFT using the perfectPresentation/rcrc-qa-v5 dataset. This dataset comprises 16,761 single-turn QA pairs synthesized by Qwen/Qwen3-235B-A22B-Instruct-2507 from RCRC website and Hanifa Urban Code chunks.
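The SFT stage consumes single-turn QA pairs as chat-formatted training records. A minimal sketch of that conversion, assuming the common messages schema (field names here are illustrative, not the actual rcrc-qa-v5 layout):

```python
def qa_to_chat_record(question: str, answer: str) -> dict:
    """Wrap one synthesized QA pair as a single-turn chat example.

    Single-turn only: one user message, one assistant reply, matching
    the model card's note that no multi-turn SFT was performed.
    """
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }

# Hypothetical example pair; the real pairs come from rcrc-qa-v5.
record = qa_to_chat_record(
    "What is the Hanifa Urban Code?",
    "It is a set of urban design regulations for Wadi Hanifa.",
)
```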

Performance and Limitations

An internal evaluation compared this closed-book model against a RAG pipeline (v3-rag + index v2) on a 50-question set. The RAG pipeline generally outperforms it in accuracy, relevance, and clarity, but the closed-book model is competitive for dialect responses (Najdi/Hijazi) and free-form, opinion-style queries. Its 1B-scale closed-book recall can be brittle for factual specifics such as numbers or exact procedural steps, and it is limited to single-turn QA, since no multi-turn conversational SFT was performed.
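A comparison like the one above reduces to per-criterion averages over judged scores. A minimal sketch with hypothetical scores (the real 50-question set and its judgments are internal and not reproduced here):

```python
from statistics import mean

def criterion_means(scores: list) -> dict:
    """Average each rubric criterion (accuracy/relevance/clarity) over a question set."""
    criteria = scores[0].keys()
    return {c: mean(s[c] for s in scores) for c in criteria}

# Hypothetical judged scores for two questions, on a 1-5 scale.
closed_book = [
    {"accuracy": 3, "relevance": 4, "clarity": 4},
    {"accuracy": 2, "relevance": 3, "clarity": 4},
]
rag = [
    {"accuracy": 5, "relevance": 5, "clarity": 4},
    {"accuracy": 4, "relevance": 4, "clarity": 5},
]
summary = {
    "closed_book": criterion_means(closed_book),
    "rag": criterion_means(rag),
}
```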

Use Cases

This model is suitable for:

  • Offline or no-retrieval scenarios where a compact, specialized chatbot is needed.
  • Initial conversational interfaces for RCRC-related inquiries, especially those involving general knowledge or opinion-style questions.
  • Exploration of closed-book LLM capabilities within a specific domain, despite its limitations compared to RAG for high-accuracy factual recall.