jayshah5696/gemma4-e2b-humanize-unsloth-merged
jayshah5696/gemma4-e2b-humanize-unsloth-merged is a 5.1-billion-parameter language model based on the Gemma 4 E2B architecture, with a 32K-token context length. It merges the weights of unsloth/gemma-4-E2B-it with the Humanize-RL SFT LoRA adapter, which targets human-like interaction, and it is intended as a starting policy for downstream GRPO/DAPO RL training on the humanize-rl rubric.
What is jayshah5696/gemma4-e2b-humanize-unsloth-merged?
This model is a 5.1 billion parameter language model built upon the Gemma 4 E2B architecture, enhanced by merging the unsloth/gemma-4-E2B-it base model with a Humanize-RL SFT LoRA adapter (jayshah5696/gemma4-e2b-humanize-unsloth-lora). It is primarily intended as an initial policy for subsequent GRPO / DAPO Reinforcement Learning (RL) training focused on the humanize-rl rubric.
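For readers unfamiliar with what "merging" a LoRA adapter means, the following is a minimal sketch of the standard LoRA merge rule, W' = W + (alpha / r) * B A, on a toy 2x2 matrix. The function names, the toy values, and the `alpha` and `rank` settings are illustrative assumptions; the card does not state the adapter's actual rank or scaling, and this is not the script used to produce the checkpoint.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Fold a LoRA update into a base weight: W' = W + (alpha / rank) * (B @ A).

    w:      base weight, shape (out, in)
    lora_a: LoRA "A" matrix, shape (rank, in)
    lora_b: LoRA "B" matrix, shape (out, rank)
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy values only (hypothetical, not taken from the real adapter).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # shape (1, 2): rank 1
lora_b = [[0.5], [0.0]]    # shape (2, 1)
merged = merge_lora(w, lora_a, lora_b, alpha=2.0, rank=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why a merged checkpoint can be loaded like an ordinary model.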
Key Capabilities & Features
- Gemma 4 E2B Base: Uses the Gemma 4 E2B architecture, including its shared KV layers (layers 15-34).
- Humanize-RL Fine-tuning: Incorporates a Supervised Fine-Tuning (SFT) LoRA adapter specifically designed for human-like interaction and RL applications.
- High Context Length: Supports a context window of 32,768 tokens.
- Multimodal Compatibility: While primarily text-focused, the underlying architecture supports vision/audio encoders, which are transparently skipped for text-only use.
- Verified Integrity: The model has undergone end-to-end verification, ensuring correct loading, LoRA adapter functionality, and preservation of the chat eos_token (<turn|>).
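The checks described in the list above can be approximated locally. The sketch below assumes the checkpoint loads through the generic `AutoTokenizer` and `AutoModelForCausalLM` classes from Hugging Face `transformers`; the card does not confirm which classes or library version are required, so treat that as an assumption. The helper names `check_tokenizer`, `fits_context`, and `load_model_and_tokenizer` are hypothetical, and the stand-in tokenizer at the end exists only so the helpers can be exercised without downloading weights.

```python
from types import SimpleNamespace

MODEL_ID = "jayshah5696/gemma4-e2b-humanize-unsloth-merged"
EXPECTED_EOS = "<turn|>"   # chat eos_token as stated in the model card
MAX_CONTEXT = 32768        # context window as stated in the model card


def check_tokenizer(tokenizer, expected_eos=EXPECTED_EOS):
    """Return True if the tokenizer's eos_token matches the documented one."""
    return getattr(tokenizer, "eos_token", None) == expected_eos


def fits_context(num_tokens, max_context=MAX_CONTEXT):
    """Return True if a prompt of num_tokens fits in the context window."""
    return 0 <= num_tokens <= max_context


def load_model_and_tokenizer(model_id=MODEL_ID):
    """Load the merged checkpoint (assumes the generic Auto classes support it)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return model, tokenizer


# Offline demonstration with a stand-in object instead of a real tokenizer.
fake_tokenizer = SimpleNamespace(eos_token="<turn|>")
eos_ok = check_tokenizer(fake_tokenizer)
print(eos_ok, fits_context(40000))
```

With network access and enough memory, `model, tokenizer = load_model_and_tokenizer()` followed by `check_tokenizer(tokenizer)` would run the same check against the real tokenizer.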
Should I use this for my use case?
- Yes, if you are:
  - Developing or experimenting with reinforcement learning for language models, particularly GRPO/DAPO training.
  - Looking for a base model with a human-like interaction SFT layer for further customization.
  - Working with applications that need a context window of up to 32,768 tokens.
- Consider alternatives if:
  - Your primary need is a general-purpose instruction-tuned model without specific RL training goals.
  - You require explicit vision or audio capabilities out of the box. The encoders are present in the architecture, but they were not the focus of this merged checkpoint's fine-tuning.
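To make the GRPO use case more concrete, here is a small sketch of the group-relative advantage that GRPO computes from several sampled completions for one prompt: each reward is normalized by the mean and standard deviation of its group. The function name, the `eps` value, the choice of population standard deviation, and the example scores are assumptions for illustration; they are not taken from the humanize-rl rubric or any training script, and DAPO changes other parts of the objective that are not shown here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize per-completion rewards within one prompt's sample group.

    advantage_i = (reward_i - mean(rewards)) / (std(rewards) + eps)
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric scores for four completions of the same prompt.
rubric_scores = [0.2, 0.5, 0.8, 0.5]
advantages = group_relative_advantages(rubric_scores)
print(advantages)
```

Completions scoring above the group mean receive positive advantages and are reinforced, while those below the mean are discouraged; a group with identical scores yields all-zero advantages and contributes no learning signal.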