laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps__Qwen3-8B

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Apr 17, 2026 · License: other · Architecture: Transformer

laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on a mix of scientific-computing, agent-trace, multi-file composition, and code-related datasets. It is specialized for complex code generation, scientific problem-solving, and structured-data handling, and its 32,768-token context window supports substantial technical inputs and detailed outputs.


Model Overview

This model, laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps__Qwen3-8B, is an 8-billion-parameter language model fine-tuned from the base Qwen3-8B architecture. It was adapted through training on a combination of datasets covering scientific computing, repetition-penalty agent traces, multi-file composition, GFI-STAQC embedding-filtered traces, and development-set traces.
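
A minimal inference sketch using the Hugging Face transformers API follows, assuming the checkpoint resolves on the Hub under the repo id above and ships a standard Qwen3 chat template; depending on how the FP8 weights are packaged, a quantization-aware loader may be needed instead of the plain `torch_dtype="auto"` shown here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # assumption: the stored dtype loads directly
    device_map="auto",   # requires the `accelerate` package
)

messages = [
    {"role": "user", "content": "Write a NumPy function that integrates "
                                "f(x) = x**2 over [0, 1] with the trapezoidal rule."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```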

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen3-8B.
  • Parameter Count: 8 billion parameters.
  • Context Length: Supports a context window of 32,768 tokens.
  • Specialized Training Data: Trained on a diverse set of datasets (see the loading sketch after this list), including:
    • laion/nemotron-terminal-scientific_computing
    • DCAgent/exp_tas_repetition_penalty_1.05_traces
    • DCAgent/a1_multifile_composition
    • DCAgent/exp-gfi-staqc-embedding-mean-filtered-10K_glm_4.7_traces_jupiter
    • DCAgent/exp_tas_max_episodes_512_traces
    • DCAgent/dev_set_part1_10k_glm_4.7_traces_jupiter
    • DCAgent/a1_repo_scaffold
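
The exact preprocessing and mixing recipe is not documented here, but a hypothetical sketch of assembling such a mixture with the `datasets` library could look like the following; the split names, column schemas, and shuffle seed are all assumptions.

```python
from datasets import concatenate_datasets, load_dataset

dataset_ids = [
    "laion/nemotron-terminal-scientific_computing",
    "DCAgent/exp_tas_repetition_penalty_1.05_traces",
    "DCAgent/a1_multifile_composition",
    "DCAgent/exp-gfi-staqc-embedding-mean-filtered-10K_glm_4.7_traces_jupiter",
    "DCAgent/exp_tas_max_episodes_512_traces",
    "DCAgent/dev_set_part1_10k_glm_4.7_traces_jupiter",
    "DCAgent/a1_repo_scaffold",
]

# Assumes each dataset exposes a "train" split and that the parts share a
# compatible schema; in practice each source may need its own mapping step.
parts = [load_dataset(d, split="train") for d in dataset_ids]
mixture = concatenate_datasets(parts).shuffle(seed=42)
print(mixture)
```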

Training Details

The model was trained with a learning rate of 4e-05 and a per-device batch size of 1 across 32 GPUs; with gradient accumulation over 3 steps, this yields the reported effective batch size of 96 (1 × 32 × 3). Training used the fused AdamW optimizer (adamw_torch_fused) with a cosine learning-rate schedule over 5 epochs.
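
As a hedged reconstruction, the reported hyperparameters map onto Hugging Face `TrainingArguments` as below; the actual training framework, output path, and any options beyond those listed are assumptions.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="nemosci-qwen3-8b-ft",  # hypothetical path
    learning_rate=4e-5,
    per_device_train_batch_size=1,     # 1 per device on 32 GPUs
    gradient_accumulation_steps=3,     # 1 * 32 * 3 = 96 effective batch
    num_train_epochs=5,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
)
```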

Potential Use Cases

Given its specialized training, this model is likely well-suited for applications requiring:

  • Scientific Computing: Tasks involving scientific data analysis, simulation, or problem-solving.
  • Code Generation and Analysis: Handling complex code structures, multi-file projects, and potentially code refactoring or debugging.
  • Structured Data Processing: Tasks that benefit from understanding and generating structured outputs based on specific patterns found in its training data.
  • Agentic Workflows: Potentially useful in agent-based systems that require iterative refinement or structured responses, as suggested by the 'repetition penalty' and 'max episodes' datasets; see the sampling sketch after this list.
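
For agent-style generation, a minimal sketch with the high-level `pipeline` API is shown below; the `repetition_penalty` value of 1.05 is an assumption drawn from the training-data name, not a documented recommendation, so tune it for your workload.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps__Qwen3-8B",
    device_map="auto",
)

result = generator(
    [{"role": "user", "content": "Refactor a monolithic script into a "
                                 "multi-file Python package layout."}],
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.05,  # assumption drawn from the dataset name
)
# Chat-mode pipelines return the full conversation; print the new reply.
print(result[0]["generated_text"][-1]["content"])
```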