laion/nemosci-tasrep-a1mfc-dev1-maxeps-swes-r2eg-32b__Qwen3-32B
This model, laion/nemosci-tasrep-a1mfc-dev1-maxeps-swes-r2eg-32b__Qwen3-32B, is a 32-billion-parameter language model fine-tuned from Qwen/Qwen3-32B. It was trained on a mixed collection of datasets covering scientific computing, repetition-penalty experiment traces, multi-file composition, and agent task traces. The fine-tuning targets applications that require advanced reasoning and problem-solving in complex computational environments, particularly agent-based tasks and scientific workflows, and its primary strength is handling intricate technical and scientific queries.
Model Overview
This model, laion/nemosci-tasrep-a1mfc-dev1-maxeps-swes-r2eg-32b__Qwen3-32B, is a 32-billion-parameter language model derived from the Qwen3-32B architecture. It has been fine-tuned on a specialized collection of datasets oriented toward agentic, terminal, and scientific-computing workloads, described in more detail below.
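The checkpoint should load like any other Qwen3-based causal language model. The snippet below is a minimal inference sketch, assuming the standard transformers loading path applies to this repository; the prompt and generation settings are illustrative only.

```python
# Minimal inference sketch (assumes the standard transformers causal-LM path
# works for this Qwen3-based checkpoint; not verified against the repository).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/nemosci-tasrep-a1mfc-dev1-maxeps-swes-r2eg-32b__Qwen3-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 32B parameters; bf16 keeps memory manageable
    device_map="auto",           # shard across available GPUs
)

messages = [
    {"role": "user", "content": "Summarize the output of `ls -la` for a Python project."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```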
Key Fine-tuning Datasets
The model was fine-tuned using several distinct datasets, suggesting an optimization for specific types of tasks:
- nemotron-terminal-scientific_computing: implies a focus on scientific computing and terminal interactions.
- exp_tas_repetition_penalty_1.05_traces: suggests training to manage and reduce repetition in generated outputs.
- a1_multifile_composition: indicates capabilities in handling and composing information from multiple files.
- exp_tas_max_episodes_512_traces: points to training on agent-based task traces with a focus on episode management.
- dev_set_part1_10k_glm_4.7_traces_jupiter: further reinforces training on agent traces, potentially from a development set.
- swesmith-sandboxes-with_tests-gpt-5-mini-passed_glm_4.7_traces: suggests exposure to code sandboxes, tests, and potentially code generation or analysis.
- Kimi-2.5-r2egym_sandboxes-maxeps-32k: reinforces training within sandbox environments, possibly for reinforcement learning or complex task execution.
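How these corpora were combined is not published. The sketch below is a hypothetical illustration of assembling such a supervised fine-tuning mixture; the dataset identifiers, splits, and equal-weight concatenation are placeholders, not the actual recipe.

```python
# Hypothetical sketch of assembling an SFT mixture from several trace datasets.
# The real dataset identifiers, splits, and mixing weights used for this model
# are not documented here; the names below are illustrative placeholders.
from datasets import load_dataset, concatenate_datasets

trace_sources = [
    "your-org/nemotron-terminal-scientific_computing",  # assumed identifier
    "your-org/a1_multifile_composition",                # assumed identifier
    "your-org/exp_tas_max_episodes_512_traces",         # assumed identifier
]

parts = [load_dataset(name, split="train") for name in trace_sources]
mixture = concatenate_datasets(parts).shuffle(seed=42)
print(mixture)
```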
Training Configuration
The fine-tuning run used a learning rate of 4e-05, a per-device train batch size of 1, and a total train batch size of 96 across 96 devices. The optimizer was ADAMW_TORCH_FUSED with a cosine learning rate scheduler, a 0.1 warmup ratio, and 7 training epochs. These settings adapt the base Qwen3-32B model to the specialized datasets listed above.
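For reference, the reported hyperparameters map onto Hugging Face TrainingArguments as sketched below. This is a hedged reconstruction: the actual training framework, precision, and any settings not listed above are assumptions.

```python
# Hedged reconstruction of the reported hyperparameters using TrainingArguments;
# the training framework and unlisted settings (e.g. precision) are assumptions.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="nemosci-tasrep-32b-sft",  # placeholder output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,        # 1 per device x 96 devices = total 96
    gradient_accumulation_steps=1,
    num_train_epochs=7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    bf16=True,                            # assumed precision; not stated in the card
)
```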