kazuyamaa/alfworld-lambda-grpo-v004

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Mar 1, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The kazuyamaa/alfworld-lambda-grpo-v004 is a 4 billion parameter Qwen3 model developed by kazuyamaa. This model was finetuned from kazuyamaa/alfworld-lambda-grpo-v002-hull and optimized for training speed using Unsloth and Huggingface's TRL library. It is designed for tasks related to the ALFWorld environment, leveraging its specialized finetuning for improved performance in interactive text-based games.

Loading preview...

Model Overview

The kazuyamaa/alfworld-lambda-grpo-v004 is a 4 billion parameter Qwen3 model developed by kazuyamaa. It is a finetuned version of the kazuyamaa/alfworld-lambda-grpo-v002-hull model, specifically optimized for efficiency during training.

Key Capabilities

  • Efficient Finetuning: This model was trained significantly faster using Unsloth and Huggingface's TRL library, indicating a focus on rapid iteration and development.
  • ALFWorld Specialization: As indicated by its lineage and naming convention, the model is likely specialized for tasks within the ALFWorld environment, which involves interactive text-based games requiring reasoning and action generation.

Good For

  • ALFWorld Research: Ideal for researchers and developers working on agents for the ALFWorld environment, particularly those interested in models finetuned for this specific domain.
  • Efficient Model Development: Demonstrates the application of tools like Unsloth for accelerating the finetuning process of large language models.