Markshot/gemma-3-1b-it-Math-SFT-RS-DPO

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1BQuant:BF16Ctx Length:32kPublished:Mar 26, 2026Architecture:Transformer Warm

Markshot/gemma-3-1b-it-Math-SFT-RS-DPO is a 1 billion parameter instruction-tuned language model developed by Markshot, based on the Gemma architecture. This model is fine-tuned for mathematical tasks, leveraging Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) with Direct Preference Optimization (DPO). It is designed to excel in mathematical reasoning and problem-solving, offering a compact solution for math-intensive applications.

Loading preview...

Model Overview

Markshot/gemma-3-1b-it-Math-SFT-RS-DPO is a 1 billion parameter instruction-tuned model built upon the Gemma architecture. Developed by Markshot, this model has undergone specialized fine-tuning using Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) with Direct Preference Optimization (DPO).

Key Characteristics

  • Architecture: Based on the Gemma family of models.
  • Parameter Count: Features 1 billion parameters, offering a balance between performance and computational efficiency.
  • Context Length: Supports a substantial context window of 32768 tokens.
  • Fine-tuning: Utilizes SFT, RS (Reinforcement Learning from Human Feedback), and DPO for enhanced instruction following and performance.

Primary Focus

This model is specifically optimized for mathematical tasks and reasoning. Its training methodology aims to improve its ability to understand and solve complex mathematical problems, making it a suitable choice for applications requiring strong numerical and logical processing capabilities.

Limitations

The model card indicates that more information is needed regarding its specific biases, risks, and detailed performance metrics. Users should exercise caution and conduct thorough evaluations for their specific use cases, especially given the lack of detailed training data and evaluation results in the provided documentation.