AgPerry/SWE-Lego-Qwen3-4B-posttrain

Text generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Apr 13, 2026 · License: other · Architecture: Transformer

AgPerry/SWE-Lego-Qwen3-4B-posttrain is a 4-billion-parameter Qwen3-based causal language model fine-tuned by AgPerry. It is optimized for software engineering tasks, drawing on real and synthetic resolved trajectories from the SWE-Lego dataset, with the goal of improving code-related problem-solving and generation.


AgPerry/SWE-Lego-Qwen3-4B-posttrain Overview

This model is a specialized fine-tuned version of the Qwen3-4B architecture, developed by AgPerry. It was adapted for software engineering applications through post-training on the SWE-Lego dataset, which comprises both real and synthetic resolved trajectories. Training used a turn_mask, which in multi-turn trajectory training typically restricts the loss to the model's own turns so that user or environment messages do not contribute gradient signal.

Key Capabilities

  • Software Engineering Focus: Optimized for tasks related to software development, likely including code generation, debugging, and problem-solving within a coding context.
  • Qwen3 Architecture: Benefits from the foundational capabilities of the Qwen3 model family.
  • Dataset Specificity: Leverages the unique characteristics of the SWE-Lego dataset, which includes resolved trajectories, suggesting an ability to understand and process sequences of actions or solutions in software development.

Training Details

The model was trained with a learning rate of 1e-4 and a total batch size of 64, achieved with 8 devices and 8 gradient accumulation steps (implying a per-device micro-batch size of 1), for 4 epochs. The optimizer was ADAMW_TORCH with a cosine learning rate schedule.
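The reported total batch size follows from the per-device micro-batch size, the device count, and the accumulation steps. A quick sanity check of that arithmetic (the per-device size of 1 is inferred from the stated numbers, not given explicitly in the card):

```python
# Sanity-check the effective batch size reported in the training details.
devices = 8               # stated: 8 devices
grad_accum_steps = 8      # stated: 8 gradient accumulation steps
per_device_batch = 1      # inferred: 64 / (8 * 8); not stated in the card

effective_batch = devices * grad_accum_steps * per_device_batch
print(effective_batch)  # → 64
```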

Intended Use Cases

While specific intended uses are not fully detailed, its fine-tuning on software engineering datasets suggests applicability in:

  • Assisting developers with code-related queries.
  • Generating code snippets or solutions.
  • Understanding and processing software development workflows.
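As a concrete illustration of the first two use cases, a minimal prompt-construction helper for an SWE-style bug-fix request might look like the sketch below. The function name and prompt layout are hypothetical, chosen for illustration; the actual SWE-Lego trajectory format is not documented in this card.

```python
def build_swe_prompt(issue_title: str, issue_body: str, file_hint: str = "") -> str:
    """Format a software-engineering issue into a single prompt string.

    Hypothetical helper for illustration only: the real trajectory/prompt
    format used to train this model is not documented in the model card.
    """
    parts = [
        "You are a software engineering assistant.",
        f"Issue: {issue_title}",
        f"Details: {issue_body}",
    ]
    if file_hint:
        parts.append(f"Likely relevant file: {file_hint}")
    parts.append("Propose a fix as a unified diff.")
    return "\n\n".join(parts)


prompt = build_swe_prompt(
    "Crash on empty input",
    "`parse()` raises IndexError when called with an empty string.",
    file_hint="src/parser.py",
)
print(prompt)
```

The resulting string could then be sent to the model through any standard text-generation interface; keeping the issue title, body, and file hint as separate arguments makes it easy to adapt the template to other workflows.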