AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain-v2
AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain-v2 is an 8-billion-parameter, Qwen3-based causal language model. It is derived from Qwen3-8B-fim-v2v3pt and further fine-tuned on the swe_lego_real_data_resolved_trajectories and swe_lego_synthetic_data_resolved_trajectories datasets, and it supports a 32768-token context length. It is intended for applications whose inputs resemble those fine-tuning datasets.
Model Overview
AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain-v2 is an 8-billion-parameter language model built on the Qwen3 architecture. It is a fine-tuned iteration of the base model /mmu-vcg-hdd/multimodal/models/Qwen3-8B-fim-v2v3pt.
Key Characteristics
- Base Model: Qwen3-8B-fim-v2v3pt
- Parameter Count: 8 billion parameters
- Context Length: Supports a context window of 32768 tokens
- Fine-tuning Datasets: The model has undergone further training on two datasets:
  - swe_lego_real_data_resolved_trajectories
  - swe_lego_synthetic_data_resolved_trajectories
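A minimal loading sketch follows, assuming the repository uses the standard Hugging Face transformers checkpoint layout; the max_position_embeddings attribute is where Qwen-family configs usually record the context window, but that is an assumption about this particular checkpoint.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain-v2"

# Inspect the advertised context window before loading the full weights.
config = AutoConfig.from_pretrained(model_id)
print(config.max_position_embeddings)  # expected to report 32768

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision (e.g. bfloat16)
    device_map="auto",    # requires `accelerate`; places/shards weights automatically
)
```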
Training Details
The fine-tuning process used the following key hyperparameters (see the configuration sketch after the list):
- Learning Rate: 0.0001
- Optimizer: ADAMW_TORCH
- Epochs: 4.0
- Batch Size: Effective training batch size of 48 (per-device batch size of 1 × 8 gradient accumulation steps × 6 devices)
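The snippet below is a hypothetical reconstruction of these hyperparameters as Hugging Face TrainingArguments; the actual training script and full configuration are not published with this card, and the output_dir name is a placeholder.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters; the real
# training configuration is not included in this model card.
training_args = TrainingArguments(
    output_dir="qwen3-8b-fim-swe-lego-posttrain-v2",  # placeholder output path
    learning_rate=1e-4,             # reported learning rate (0.0001)
    optim="adamw_torch",            # reported optimizer: ADAMW_TORCH
    num_train_epochs=4.0,           # reported epochs
    per_device_train_batch_size=1,  # 1 sample per device
    gradient_accumulation_steps=8,  # 1 x 8 steps x 6 devices = effective batch of 48
)
```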
Potential Use Cases
Given its specialized fine-tuning on swe_lego_real_data_resolved_trajectories and swe_lego_synthetic_data_resolved_trajectories (names that suggest resolved software-engineering task trajectories), this model is likely best suited to applications that involve or benefit from data similar to those datasets. Developers should weigh this specific training data when evaluating the model's suitability for understanding or generation tasks in related domains.
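For reference, a minimal inference sketch is shown below, assuming the checkpoint works through the standard transformers causal-LM interface. The prompt is purely illustrative: the exact prompt or trajectory format used during fine-tuning is not documented in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AgPerry/Qwen3-8B-fim-v2v3pt-swe-lego-posttrain-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Hypothetical software-engineering-style prompt; treat this as plain
# causal-LM completion rather than the model's canonical input format.
prompt = "Repository issue: the unit tests fail after the latest refactor. Propose a fix."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Print only the newly generated continuation.
new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```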