laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps-swes-r2eg-32b-3pct__Qwen3-32B

Text generation

  • Concurrency cost: 2
  • Model size: 32B
  • Quantization: FP8
  • Context length: 32k
  • Published: Apr 21, 2026
  • License: other
  • Architecture: Transformer

The laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps-swes-r2eg-32b-3pct__Qwen3-32B model is a 32 billion parameter language model fine-tuned from Qwen/Qwen3-32B. It was trained on a diverse collection of specialized datasets, including scientific computing data, agent traces generated with a repetition penalty, multifile code composition, and repository scaffold generation. The fine-tuned model targets complex reasoning and generation tasks in these domains.


Model Overview

This model, laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps-swes-r2eg-32b-3pct__Qwen3-32B, is a 32 billion parameter language model derived from the Qwen3-32B architecture. It has undergone extensive fine-tuning on a unique combination of datasets, indicating a specialization in complex, multi-domain tasks.

Key Training Datasets

The model's training involved several distinct datasets, suggesting a focus on diverse and intricate problem-solving:

  • Scientific Computing: nemotron-terminal-scientific_computing-3pct
  • Agent Traces: exp_tas_repetition_penalty_1.05_traces-3pct, exp_tas_max_episodes_512_traces-3pct
  • Code & Composition: a1_multifile_composition-3pct, exp-gfi-staqc-embedding-mean-filtered-10K_glm_4.7_traces_jupiter-3pct, a1_repo_scaffold-3pct, swesmith-sandboxes-with_tests-gpt-5-mini-passed_glm_4.7_traces-3pct
  • R2E Gym Sandboxes: Kimi-2.5-r2egym_sandboxes-maxeps-32k-3pct
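The recurring `-3pct` suffix on the dataset names above suggests that each corpus was subsampled to roughly 3% of its examples before mixing. A minimal sketch of such deterministic subsampling, assuming a simple per-example Bernoulli draw (the actual mixing procedure is not documented in this card):

```python
import random

def subsample(dataset, fraction=0.03, seed=0):
    """Keep ~`fraction` of examples, deterministically for a given seed.

    Hypothetical reading of the `-3pct` dataset suffix; the real pipeline
    may subsample differently (e.g. fixed counts or stratified splits).
    """
    rng = random.Random(seed)
    return [ex for ex in dataset if rng.random() < fraction]
```

Because the random stream is seeded, the same 3% slice is reproduced on every run, which keeps dataset mixtures stable across training restarts.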

Training Configuration

Training was conducted with a learning rate of 4e-05 over 7 epochs, distributed across 96 GPUs. The optimizer was ADAMW_TORCH_FUSED with a cosine learning rate scheduler and a warmup ratio of 0.1, a common recipe for stable large-scale fine-tuning runs of this size.
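The schedule above (linear warmup for the first 10% of steps, then cosine decay to zero) can be sketched as a pure function of the step index; this mirrors the shape of the standard warmup-plus-cosine scheduler, though the exact implementation used in training is not shown in this card:

```python
import math

def lr_at_step(step, total_steps, base_lr=4e-5, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    Warmup occupies `warmup_ratio` of all steps; afterwards the rate
    follows half a cosine wave from `base_lr` down to 0.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With 1,000 total steps, the rate ramps to 4e-05 by step 100 and decays smoothly to zero at step 1,000.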