laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps-swes-r2eg__Qwen3-8B
laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps-swes-r2eg__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. It was specialized through training on a collection of scientific computing, multi-file composition, and agent-trace datasets, and is intended for tasks that require advanced reasoning and problem-solving in complex computational environments, with a 32,768-token context length.
Overview
This model is an 8-billion-parameter causal language model derived from the Qwen3-8B base architecture. It was fine-tuned on a specialized collection of datasets covering scientific computing, multi-file code composition, and agent interaction traces. It is configured with a 32,768-token context length, allowing it to process long, complex inputs such as full documents or codebases.
Key Capabilities
- Specialized Fine-tuning: Trained on datasets such as `nemotron-terminal-scientific_computing`, `exp_tas_repetition_penalty_1.05_traces`, `a1_multifile_composition`, `exp-gfi-staqc-embedding-mean-filtered-10K_glm_4.7_traces_jupiter`, `exp_tas_max_episodes_512_traces`, `dev_set_part1_10k_glm_4.7_traces_jupiter`, `a1_repo_scaffold`, `swesmith-sandboxes-with_tests-gpt-5-mini-passed_glm_4.7_traces`, and `Kimi-2.5-r2egym_sandboxes-maxeps-32k`.
- Extended Context Window: Features a 32,768-token context length, suitable for tasks requiring deep understanding of long documents or codebases (see the loading sketch below).
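The following is a minimal loading sketch using the Hugging Face `transformers` library. It assumes the checkpoint follows the standard Qwen3 causal-LM format and is published under the repository id shown above; adjust dtype and device settings for your hardware.

```python
# Minimal loading sketch (assumes a standard Qwen3-style causal-LM checkpoint).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps-swes-r2eg__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # spread the 8B weights across available devices
)

# The card lists a 32,768-token context; long inputs such as whole files or
# multi-file diffs can be passed directly, subject to GPU memory.
print(model.config.max_position_embeddings)
```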
Training Details
The model was trained with a learning rate of 4e-05 and a total batch size of 96 (per-device batch size 1 with 3 gradient accumulation steps across 32 GPUs), using a cosine learning rate scheduler with a 0.1 warmup ratio over 5 epochs. The optimizer was ADAMW_TORCH_FUSED with default betas and epsilon.
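For reference, here is a hypothetical sketch of how these hyperparameters might map onto Hugging Face `TrainingArguments`. The card does not state which training framework was used, so treat this as illustrative rather than the actual training configuration; the output path and precision setting are assumptions.

```python
# Hypothetical mapping of the reported hyperparameters onto
# transformers.TrainingArguments; the actual training stack is not stated.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nemosci-qwen3-8b-sft",  # placeholder output path
    learning_rate=4e-5,
    num_train_epochs=5,
    per_device_train_batch_size=1,      # 32 GPUs x 1 x 3 accumulation steps = 96
    gradient_accumulation_steps=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",          # default betas and epsilon
    bf16=True,                          # assumed precision, not stated on the card
)
```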
Good For
- Scientific Computing: Tasks involving scientific data analysis, simulation, and problem-solving.
- Complex Code Generation & Analysis: Handling multi-file projects, repository scaffolding, and understanding intricate code structures.
- Agent-based Reasoning: Applications requiring the model to process and learn from agent interaction traces and decision-making processes.
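As a simple usage sketch for the tasks listed above, the snippet below continues the loading example and prompts the model with a scientific-computing question via the tokenizer's chat template. The prompt and generation settings are illustrative, not recommendations from the model card.

```python
# Illustrative prompt for a scientific-computing task, reusing `tokenizer`
# and `model` from the loading sketch above.
messages = [
    {"role": "user", "content": "Write a NumPy function that integrates "
                                "f(x) = exp(-x**2) over [0, 2] with the trapezoidal rule."}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```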