laion/nemosci-tasrep-a1mfc-gfistaqc-scaff-maxeps-swes-r2eg-32b__Qwen3-32B

Text generation · Concurrency cost: 2 · Model size: 32B · Quantization: FP8 · Context length: 32k · Published: Apr 21, 2026 · License: other · Architecture: Transformer

The laion/nemosci-tasrep-a1mfc-gfistaqc-scaff-maxeps-swes-r2eg-32b__Qwen3-32B model is a 32-billion-parameter Qwen3-based language model fine-tuned by laion. It was trained on a diverse collection of datasets covering scientific computing, repetition-penalty agent traces, multi-file code composition, and other code-related tasks. The model is specialized for complex code generation, scientific problem-solving, and agentic reasoning within a 32,768-token context window.


Model Overview

This model, laion/nemosci-tasrep-a1mfc-gfistaqc-scaff-maxeps-swes-r2eg-32b__Qwen3-32B, is a fine-tuned variant of the Qwen3-32B architecture, adapted on a set of datasets focused on advanced computational and reasoning tasks.
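
A minimal loading-and-generation sketch with Hugging Face transformers is shown below. It assumes the checkpoint ships a standard tokenizer and chat template and that your transformers version supports Qwen3; the prompt, dtype, and device settings are illustrative only.

```python
# Minimal inference sketch (assumed: stock tokenizer + chat template,
# transformers with Qwen3 support). Settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/nemosci-tasrep-a1mfc-gfistaqc-scaff-maxeps-swes-r2eg-32b__Qwen3-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # shard the 32B model across available GPUs
)

messages = [{"role": "user", "content": "Write a NumPy function that solves a tridiagonal linear system."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```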

Key Fine-tuning Datasets

The model's training incorporated several specialized datasets, indicating an optimization for complex problem-solving and code-related applications (a data-mixing sketch follows the list):

  • Scientific Computing: From laion/nemotron-terminal-scientific_computing.
  • Repetition Penalty Traces: From DCAgent/exp_tas_repetition_penalty_1.05_traces.
  • Multifile Composition: From DCAgent/a1_multifile_composition.
  • GFI STAQC Embeddings: From DCAgent/exp-gfi-staqc-embedding-mean-filtered-10K_glm_4.7_traces_jupiter.
  • Max Episodes Traces: From DCAgent/exp_tas_max_episodes_512_traces.
  • Repo Scaffold: From DCAgent/a1_repo_scaffold.
  • SWESmith Sandboxes: From DCAgent/swesmith-sandboxes-with_tests-gpt-5-mini-passed_glm_4.7_traces.
  • Kimi-2.5-r2egym Sandboxes: From penfever/Kimi-2.5-r2egym_sandboxes-maxeps-32k.
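
As referenced above, the sketch below shows one way such a mixture could be assembled with the datasets library. The split names, uniform interleaving, and schema handling are illustrative assumptions, not laion's actual training recipe.

```python
# Illustrative data-mixture sketch (assumed splits and uniform weights;
# not the actual recipe used to train this model).
from datasets import interleave_datasets, load_dataset

dataset_ids = [
    "laion/nemotron-terminal-scientific_computing",
    "DCAgent/exp_tas_repetition_penalty_1.05_traces",
    "DCAgent/a1_multifile_composition",
    "DCAgent/exp-gfi-staqc-embedding-mean-filtered-10K_glm_4.7_traces_jupiter",
    "DCAgent/exp_tas_max_episodes_512_traces",
    "DCAgent/a1_repo_scaffold",
    "DCAgent/swesmith-sandboxes-with_tests-gpt-5-mini-passed_glm_4.7_traces",
    "penfever/Kimi-2.5-r2egym_sandboxes-maxeps-32k",
]

# Stream each corpus and interleave uniformly; a real run would weight the
# sources and normalize every record to a single chat/trace schema first.
streams = [load_dataset(d, split="train", streaming=True) for d in dataset_ids]
mixture = interleave_datasets(streams, stopping_strategy="all_exhausted")

for example in mixture.take(3):
    print(sorted(example.keys()))
```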

Training Configuration

The model was trained for 7 epochs with a learning rate of 4e-05, the AdamW optimizer, and a cosine learning-rate schedule with a warmup ratio of 0.1. Training was distributed across 96 devices, for a total batch size of 96.
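
As a rough illustration, these hyperparameters map onto a Hugging Face TrainingArguments configuration like the one below. The per-device batch size and precision are assumptions inferred from the totals above, not documented values.

```python
# Hyperparameter sketch mirroring the stated configuration.
# per_device_train_batch_size=1 is inferred from 96 devices x 1 = 96 total;
# bf16 is an assumption, as training precision is not stated.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3-32b-finetune",
    learning_rate=4e-5,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=7,
    per_device_train_batch_size=1,  # 96 devices x 1 = total batch size 96
    bf16=True,                      # assumed precision
)
```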

Potential Use Cases

Given its specialized training, this model is likely well-suited for applications requiring advanced reasoning, scientific computation, complex code generation, and handling multi-file programming contexts.
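
For serving within the full 32,768-token window, one possible deployment path is vLLM's offline API, sketched below. The tensor-parallel degree and sampling settings are illustrative, and FP8 execution depends on your hardware and vLLM build.

```python
# Long-context inference sketch with vLLM (illustrative settings; FP8
# support depends on GPU generation and vLLM version).
from vllm import LLM, SamplingParams

llm = LLM(
    model="laion/nemosci-tasrep-a1mfc-gfistaqc-scaff-maxeps-swes-r2eg-32b__Qwen3-32B",
    max_model_len=32768,     # matches the model's context window
    tensor_parallel_size=2,  # assumed sharding for a 32B model
)

params = SamplingParams(temperature=0.6, max_tokens=1024)
outputs = llm.generate(
    ["Sketch a three-file Python package layout for a sparse linear-algebra utility library."],
    params,
)
print(outputs[0].outputs[0].text)
```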