AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Apr 8, 2026 · License: other · Architecture: Transformer

AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain is an 8-billion-parameter Qwen3-based language model, fine-tuned from a Qwen3-8B-fim-v2v3pt base. It specializes in tasks drawn from the swe_lego_real_data_resolved_trajectories and swe_lego_synthetic_data_resolved_trajectories datasets and is optimized for applications in the SWE-Lego domain, with a 32K-token context window for processing long sequences.

Overview

AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain is an 8-billion-parameter model built on the Qwen3 architecture. It is a fine-tuned iteration of the /mmu-vcg-hdd/multimodal/models/Qwen3-8B-fim-v2v3pt base model, adapted through post-training on the specialized trajectory datasets named above. Training used a learning rate of 1e-4, a total batch size of 48, and a cosine learning-rate scheduler over 4 epochs.
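
For concreteness, here is a minimal sketch of that post-training configuration using the Hugging Face transformers TrainingArguments API. Only the learning rate, effective batch size of 48, cosine scheduler, and 4 epochs come from this card; the per-device batch size, gradient accumulation, output directory, and bf16 setting are illustrative assumptions.

```python
# Hypothetical reconstruction of the reported post-training hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-swe-lego-posttrain",  # assumption: illustrative path
    learning_rate=1e-4,                 # reported: 0.0001
    num_train_epochs=4,                 # reported: 4 epochs
    lr_scheduler_type="cosine",         # reported: cosine scheduler
    per_device_train_batch_size=6,      # assumption: 6 per device x 8 devices = 48 total
    gradient_accumulation_steps=1,      # assumption
    bf16=True,                          # assumption: mixed-precision post-training
)
```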

Key Capabilities

  • Specialized Fine-tuning: Targeted post-training on the swe_lego_real_data_resolved_trajectories and swe_lego_synthetic_data_resolved_trajectories datasets focuses the model on the trajectory-style tasks those datasets capture.
  • Qwen3 Architecture: Benefits from the underlying capabilities of the Qwen3 model family.
  • Context Window: Supports a context length of 32,768 tokens, allowing relatively long inputs to be processed in a single pass (see the usage sketch after this list).
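
A minimal generation sketch with the transformers library, assuming the checkpoint is hosted under the repo id in this card's title and ships a standard Qwen3 chat template. The prompt text is purely illustrative.

```python
# Load the model and run a short generation; long inputs up to 32,768 tokens
# fit within the context window.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # place layers on available devices
)

messages = [{"role": "user", "content": "Summarize the failing tests in this repository."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```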

Good For

  • SWE-Lego Domain Tasks: Ideal for applications that require understanding or generating content in the swe_lego data contexts it was trained on.
  • Research and Development: Suitable for researchers exploring the impact of post-training on specialized datasets using a Qwen3 base.