AgPerry/SWE-Lego-Qwen3-4B-posttrain

Text generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Apr 13, 2026 · License: other · Architecture: Transformer

AgPerry/SWE-Lego-Qwen3-4B-posttrain is a 4-billion-parameter Qwen3-based causal language model fine-tuned by AgPerry. It is optimized for software engineering tasks, drawing on real and synthetic resolved trajectories from the SWE-Lego dataset, with the goal of improving code-related problem-solving and generation.


AgPerry/SWE-Lego-Qwen3-4B-posttrain Overview

This model is a specialized fine-tuned version of the Qwen3-4B architecture, developed by AgPerry. It was adapted for software engineering applications through post-training on the SWE-Lego dataset, which comprises both real and synthetic resolved trajectories. Training used a turn_mask, which in multi-turn trajectory training typically restricts the loss to the model's own turns so that user or environment messages do not contribute gradient signal.

Key Capabilities

  • Software Engineering Focus: Optimized for tasks related to software development, likely including code generation, debugging, and problem-solving within a coding context.
  • Qwen3 Architecture: Benefits from the foundational capabilities of the Qwen3 model family.
  • Dataset Specificity: Leverages the unique characteristics of the SWE-Lego dataset, which includes resolved trajectories, suggesting an ability to understand and process sequences of actions or solutions in software development.

Training Details

The model was trained with a learning rate of 1e-4 and a total batch size of 64, achieved with 8 devices and 8 gradient accumulation steps (implying a per-device micro-batch size of 1), for 4 epochs. The optimizer was ADAMW_TORCH with a cosine learning rate schedule.
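The reported total batch size follows from the per-device micro-batch size, the device count, and the accumulation steps. A quick sanity check of that arithmetic (the per-device size of 1 is inferred from the stated numbers, not given explicitly in the card):

```python
# Sanity-check the effective batch size reported in the training details.
devices = 8               # stated: 8 devices
grad_accum_steps = 8      # stated: 8 gradient accumulation steps
per_device_batch = 1      # inferred: 64 / (8 * 8); not stated in the card

effective_batch = devices * grad_accum_steps * per_device_batch
print(effective_batch)  # → 64
```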

Intended Use Cases

While specific intended uses are not fully detailed, its fine-tuning on software engineering datasets suggests applicability in:

  • Assisting developers with code-related queries.
  • Generating code snippets or solutions.
  • Understanding and processing software development workflows.
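As a concrete illustration of the first two use cases, a minimal prompt-construction helper for an SWE-style bug-fix request might look like the sketch below. The function name and prompt layout are hypothetical, chosen for illustration; the actual SWE-Lego trajectory format is not documented in this card.

```python
def build_swe_prompt(issue_title: str, issue_body: str, file_hint: str = "") -> str:
    """Format a software-engineering issue into a single prompt string.

    Hypothetical helper for illustration only: the real trajectory/prompt
    format used to train this model is not documented in the model card.
    """
    parts = [
        "You are a software engineering assistant.",
        f"Issue: {issue_title}",
        f"Details: {issue_body}",
    ]
    if file_hint:
        parts.append(f"Likely relevant file: {file_hint}")
    parts.append("Propose a fix as a unified diff.")
    return "\n\n".join(parts)


prompt = build_swe_prompt(
    "Crash on empty input",
    "`parse()` raises IndexError when called with an empty string.",
    file_hint="src/parser.py",
)
print(prompt)
```

The resulting string could then be sent to the model through any standard text-generation interface; keeping the issue title, body, and file hint as separate arguments makes it easy to adapt the template to other workflows.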