jayshah5696/gemma4-e2b-humanize-unsloth-merged

VISION | Concurrency Cost: 1 | Model Size: 5.1B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: May 24, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

jayshah5696/gemma4-e2b-humanize-unsloth-merged is a 5.1-billion-parameter language model based on the Gemma 4 E2B architecture, with a 32K context length. It merges the weights of unsloth/gemma-4-E2B-it with the Humanize-RL SFT LoRA adapter, which targets human-like interaction, and it is designed as a starting policy for downstream GRPO/DAPO RL training on the humanize-rl rubric.


What is jayshah5696/gemma4-e2b-humanize-unsloth-merged?

This model is a 5.1 billion parameter language model built upon the Gemma 4 E2B architecture, enhanced by merging the unsloth/gemma-4-E2B-it base model with a Humanize-RL SFT LoRA adapter (jayshah5696/gemma4-e2b-humanize-unsloth-lora). It is primarily intended as an initial policy for subsequent GRPO / DAPO Reinforcement Learning (RL) training focused on the humanize-rl rubric.
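To make "merged" concrete, here is a toy sketch of the usual LoRA merge rule, W' = W + (alpha / r) * B A, applied to one small matrix. The shapes, values, alpha, and rank below are invented for illustration; the card does not state the adapter's actual rank or scaling, nor the exact tooling used to produce the merge.

```python
# Toy illustration of folding a LoRA update into a base weight matrix.
# All numbers are made up; a real checkpoint holds many such matrices.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the standard LoRA merge rule."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # same shape as W
    return [[wv + scale * dv for wv, dv in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# 2x2 base weight with a rank-1 adapter: B is 2x1, A is 1x2.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]
B = [[2.0], [4.0]]

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why a merged checkpoint can be loaded like an ordinary model.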

Key Capabilities & Features

  • Gemma 4 E2B Base: Leverages the robust Gemma 4 E2B architecture, including its unique shared KV layers (layers 15-34).
  • Humanize-RL Fine-tuning: Incorporates a Supervised Fine-Tuning (SFT) LoRA adapter specifically designed for human-like interaction and RL applications.
  • High Context Length: Supports a context window of 32,768 tokens.
  • Multimodal Compatibility: While primarily text-focused, the underlying architecture supports vision/audio encoders, which are transparently skipped for text-only use.
  • Verified Integrity: The model has undergone end-to-end verification, ensuring correct loading, LoRA adapter functionality, and preservation of the chat eos_token (<turn|>).
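The integrity checks listed above can be reproduced roughly as follows. This is a sketch, assuming the checkpoint loads through the Hugging Face transformers Auto classes in a transformers release that supports Gemma 4. The helper names (eos_is_preserved, fits_context, load_text_only) are illustrative and do not come from the model card.

```python
MODEL_ID = "jayshah5696/gemma4-e2b-humanize-unsloth-merged"
EXPECTED_EOS = "<turn|>"   # chat end-of-turn token named in this card
CONTEXT_TOKENS = 32_768    # context window stated in this card


def eos_is_preserved(tokenizer, expected=EXPECTED_EOS):
    """Return True if the tokenizer's eos_token matches the expected chat EOS."""
    return tokenizer.eos_token == expected


def fits_context(n_tokens, limit=CONTEXT_TOKENS):
    """Return True if a prompt of n_tokens fits in the stated context window."""
    return n_tokens <= limit


def load_text_only(model_id=MODEL_ID):
    """Load tokenizer and model for text-only use (downloads the weights).

    Assumption: AutoModelForCausalLM resolves to a text-only Gemma 4 class,
    so the vision/audio encoders are not needed here.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

A typical check would call load_text_only(), pass the returned tokenizer to eos_is_preserved, and compare the tokenized prompt length against fits_context before generating.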

Should I use this for my use case?

  • Yes, if you are:
    • Developing or experimenting with reinforcement-learning fine-tuning, particularly GRPO/DAPO training on the humanize-rl rubric.
    • Looking for a strong base model with a human-like interaction SFT layer for further customization.
    • Working with applications that require a large context window and robust language understanding.
  • Consider alternatives if:
    • Your primary need is a general-purpose instruction-tuned model without specific RL training goals.
    • You require a model with explicit vision or audio capabilities out-of-the-box, as these are present but not the primary focus of this merged checkpoint's fine-tuning.
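For readers weighing the GRPO/DAPO use case, the core of GRPO is scoring a group of completions sampled for the same prompt and normalizing each reward against its group. The sketch below shows that step only. The scoring function is a hypothetical placeholder and not the actual humanize-rl rubric, which this card does not describe; population standard deviation is used here, and implementations differ on that detail.

```python
from statistics import mean, pstdev


def placeholder_rubric_score(completion: str) -> float:
    """Hypothetical stand-in for a rubric score (NOT the real humanize-rl rubric)."""
    # Toy rule: penalize one stock phrase, reward everything else.
    return 0.0 if "as an ai" in completion.lower() else 1.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four made-up completions sampled for a single prompt.
completions = [
    "Sure, happy to help with that.",
    "As an AI, I cannot have opinions.",
    "Honestly, I'd go with the second option.",
    "Good question. Here is what I would try first.",
]
rewards = [placeholder_rubric_score(c) for c in completions]
advantages = group_relative_advantages(rewards)
print(advantages)
```

A group whose rewards are all equal yields all-zero advantages and therefore no learning signal, which is one reason a starting policy that already produces varied, reasonable outputs is useful.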