laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps-swes-r2eg__Qwen3-8B

Text Generation · Model Size: 8B · Quantization: FP8 · Context Length: 32k · Published: Apr 18, 2026 · License: other · Architecture: Transformer

laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps-swes-r2eg__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was specialized through training on a diverse collection of scientific computing, multi-file composition, and agent-trace datasets, and is designed for tasks requiring advanced reasoning and problem-solving in complex computational environments, leveraging its 32,768-token context length.


Overview

This model, laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps-swes-r2eg__Qwen3-8B, is an 8-billion-parameter language model derived from the Qwen3-8B base architecture. It has been fine-tuned on a specialized collection of datasets covering scientific computing, multi-file code composition, and agent interaction traces. Its 32,768-token context window allows it to process long, complex inputs such as full documents or multi-file codebases.
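A minimal inference sketch with the Hugging Face transformers library is shown below. The model ID follows this card's title; the prompt and generation settings are illustrative assumptions, not recommended defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps-swes-r2eg__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative prompt in the model's specialty area (scientific computing).
messages = [
    {"role": "user", "content": "Write a NumPy function that integrates f(x) = x**2 over [0, 1] using the trapezoidal rule."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```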

Key Capabilities

  • Specialized Fine-tuning: Trained on datasets such as nemotron-terminal-scientific_computing, exp_tas_repetition_penalty_1.05_traces, a1_multifile_composition, exp-gfi-staqc-embedding-mean-filtered-10K_glm_4.7_traces_jupiter, exp_tas_max_episodes_512_traces, dev_set_part1_10k_glm_4.7_traces_jupiter, a1_repo_scaffold, swesmith-sandboxes-with_tests-gpt-5-mini-passed_glm_4.7_traces, and Kimi-2.5-r2egym_sandboxes-maxeps-32k.
  • Extended Context Window: Features a 32,768-token context length, suitable for tasks requiring deep understanding of long documents or codebases (see the serving sketch after this list).
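To exercise the full 32k window at inference time, a serving engine can pin the maximum sequence length explicitly. Below is a minimal sketch using vLLM, assuming the checkpoint is hosted under the ID from this card; the prompt and sampling settings are illustrative.

```python
from vllm import LLM, SamplingParams

# Pin the engine to the model's full 32,768-token context window.
llm = LLM(
    model="laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps-swes-r2eg__Qwen3-8B",
    max_model_len=32768,
)

params = SamplingParams(temperature=0.7, max_tokens=1024)  # illustrative settings
outputs = llm.generate(["Summarize the following repository layout:\n..."], params)
print(outputs[0].outputs[0].text)
```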

Training Details

The model was trained for 5 epochs with a learning rate of 4e-05, a total batch size of 96 (1 sample per device × 3 gradient accumulation steps × 32 GPUs), and a cosine learning rate scheduler with a 0.1 warmup ratio. The optimizer was ADAMW_TORCH_FUSED with default betas and epsilon.
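For reference, these hyperparameters map onto Hugging Face TrainingArguments roughly as follows. This is a minimal sketch under the assumption of a standard transformers training setup; the card does not specify the actual training stack, and the output path and precision flag are hypothetical.

```python
from transformers import TrainingArguments

# Per-device batch size of 1 follows from the reported totals:
# 96 = 1 sample/device x 3 gradient accumulation steps x 32 GPUs.
training_args = TrainingArguments(
    output_dir="qwen3-8b-finetune",   # hypothetical output path
    learning_rate=4e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=3,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",        # AdamW with default betas and epsilon
    bf16=True,                        # assumption: mixed-precision training
)
```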

Good For

  • Scientific Computing: Tasks involving scientific data analysis, simulation, and problem-solving.
  • Complex Code Generation & Analysis: Handling multi-file projects, repository scaffolding, and understanding intricate code structures.
  • Agent-based Reasoning: Applications requiring the model to process and learn from agent interaction traces and decision-making processes.