Name: jayshah5696/gemma4-e2b-humanize-unsloth-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jayshah5696

What is jayshah5696/gemma4-e2b-humanize-unsloth-merged?

This model is a 5.1 billion parameter language model built upon the Gemma 4 E2B architecture, enhanced by merging the unsloth/gemma-4-E2B-it base model with a Humanize-RL SFT LoRA adapter (jayshah5696/gemma4-e2b-humanize-unsloth-lora). It is primarily intended as an initial policy for subsequent GRPO / DAPO Reinforcement Learning (RL) training focused on the humanize-rl rubric.

Key Capabilities & Features

Gemma 4 E2B Base: Leverages the robust Gemma 4 E2B architecture, including its unique shared KV layers (layers 15-34).
Humanize-RL Fine-tuning: Incorporates a Supervised Fine-Tuning (SFT) LoRA adapter specifically designed for human-like interaction and RL applications.
High Context Length: Supports a context window of 32,768 tokens.
Multimodal Compatibility: While primarily text-focused, the underlying architecture supports vision/audio encoders, which are transparently skipped for text-only use.
Verified Integrity: The model has undergone end-to-end verification, ensuring correct loading, LoRA adapter functionality, and preservation of the chat eos_token (<turn|>).

Should I use this for my use case?

Yes, if you are:
- Developing or experimenting with Reinforcement Learning from Human Feedback (RLHF), particularly for GRPO/DAPO training.
- Looking for a strong base model with a human-like interaction SFT layer for further customization.
- Working with applications that require a large context window and robust language understanding.
Consider alternatives if:
- Your primary need is a general-purpose instruction-tuned model without specific RL training goals.
- You require a model with explicit vision or audio capabilities out-of-the-box, as these are present but not the primary focus of this merged checkpoint's fine-tuning.

Overview

What is jayshah5696/gemma4-e2b-humanize-unsloth-merged?

Key Capabilities & Features

Should I use this for my use case?

Full Model Card (README)