AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain
AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain is an 8 billion parameter Qwen3-based language model, fine-tuned from a Qwen3-8B-fim-v2v3pt base. It specializes in tasks related to the `swe_lego_real_data_resolved_trajectories` and `swe_lego_synthetic_data_resolved_trajectories` datasets and is optimized for applications in the SWE-Lego domain, leveraging its 32K-token context window to process long sequences.
Overview
AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain is an 8 billion parameter model built on the Qwen3 architecture. It is a fine-tuned iteration of the /mmu-vcg-hdd/multimodal/models/Qwen3-8B-fim-v2v3pt base model, adapted through post-training on specialized datasets. Training used a learning rate of 1e-4, a total batch size of 48, and a cosine learning rate scheduler over 4 epochs.
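For reference, the reported hyperparameters map onto a training configuration roughly like the sketch below. This is a hypothetical reconstruction assuming a Hugging Face `Trainer`-style setup; the output directory and the per-device split of the total batch size of 48 are assumptions, and the actual training script is not published.

```python
# Hypothetical reconstruction of the reported post-training hyperparameters.
# Assumes Hugging Face transformers' Trainer; the real pipeline may differ.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-8b-swe-lego-posttrain",  # placeholder path
    learning_rate=1e-4,                  # reported: 0.0001
    per_device_train_batch_size=6,       # assumption: 6 per device x 8 devices = 48 total
    gradient_accumulation_steps=1,
    num_train_epochs=4,                  # reported: 4 epochs
    lr_scheduler_type="cosine",          # reported: cosine scheduler
    bf16=True,                           # assumption: common for 8B-scale fine-tuning
)
```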
Key Capabilities
- Specialized Fine-tuning: The model has undergone targeted fine-tuning on the `swe_lego_real_data_resolved_trajectories` and `swe_lego_synthetic_data_resolved_trajectories` datasets, indicating a focus on tasks involving these trajectory data types.
- Qwen3 Architecture: Benefits from the underlying capabilities of the Qwen3 model family.
- Context Window: Supports a context length of 32,768 tokens, allowing it to process relatively long sequences (see the loading sketch below).
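A minimal loading and generation sketch with Hugging Face `transformers` is shown below, assuming the model ships a standard Qwen3 chat template; the prompt, dtype, and generation settings are illustrative, not prescribed by the model card.

```python
# Minimal inference sketch; settings are assumptions, not official recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 on a recent GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the failing test in this repository."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The 32,768-token context window bounds prompt + generated tokens combined.
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```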
Good For
- SWE-Lego Domain Tasks: Ideal for applications requiring understanding or generation within the specific `swe_lego` data contexts it was trained on.
- Research and Development: Suitable for researchers exploring the impact of post-training on specialized datasets using a Qwen3 base.