laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps__Qwen3-8B

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Apr 17, 2026 · License: other · Architecture: Transformer

laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B on a mix of scientific-computing, agent-trace, multi-file composition, and code-related datasets. It is specialized for complex code generation, scientific problem-solving, and structured-data handling, and its 32,768-token context window supports substantial technical inputs and detailed outputs.


Model Overview

This model, laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps__Qwen3-8B, is an 8-billion-parameter language model fine-tuned from the base Qwen3-8B architecture. It was adapted through training on a combination of datasets covering scientific computing, repetition-penalty agent traces, multi-file composition, GFI-STAQC embedding-filtered traces, and development-set traces.
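
A minimal inference sketch using the Hugging Face transformers API follows, assuming the checkpoint resolves on the Hub under the repo id above and ships a standard Qwen3 chat template; depending on how the FP8 weights are packaged, a quantization-aware loader may be needed instead of the plain `torch_dtype="auto"` shown here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps__Qwen3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # assumption: the stored dtype loads directly
    device_map="auto",   # requires the `accelerate` package
)

messages = [
    {"role": "user", "content": "Write a NumPy function that integrates "
                                "f(x) = x**2 over [0, 1] with the trapezoidal rule."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```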

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen3-8B.
  • Parameter Count: 8 billion parameters.
  • Context Length: Supports a context window of 32,768 tokens.
  • Specialized Training Data: Trained on a diverse set of datasets (see the loading sketch after this list), including:
    • laion/nemotron-terminal-scientific_computing
    • DCAgent/exp_tas_repetition_penalty_1.05_traces
    • DCAgent/a1_multifile_composition
    • DCAgent/exp-gfi-staqc-embedding-mean-filtered-10K_glm_4.7_traces_jupiter
    • DCAgent/exp_tas_max_episodes_512_traces
    • DCAgent/dev_set_part1_10k_glm_4.7_traces_jupiter
    • DCAgent/a1_repo_scaffold
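
The exact preprocessing and mixing recipe is not documented here, but a hypothetical sketch of assembling such a mixture with the `datasets` library could look like the following; the split names, column schemas, and shuffle seed are all assumptions.

```python
from datasets import concatenate_datasets, load_dataset

dataset_ids = [
    "laion/nemotron-terminal-scientific_computing",
    "DCAgent/exp_tas_repetition_penalty_1.05_traces",
    "DCAgent/a1_multifile_composition",
    "DCAgent/exp-gfi-staqc-embedding-mean-filtered-10K_glm_4.7_traces_jupiter",
    "DCAgent/exp_tas_max_episodes_512_traces",
    "DCAgent/dev_set_part1_10k_glm_4.7_traces_jupiter",
    "DCAgent/a1_repo_scaffold",
]

# Assumes each dataset exposes a "train" split and that the parts share a
# compatible schema; in practice each source may need its own mapping step.
parts = [load_dataset(d, split="train") for d in dataset_ids]
mixture = concatenate_datasets(parts).shuffle(seed=42)
print(mixture)
```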

Training Details

The model was trained with a learning rate of 4e-05 and a per-device batch size of 1 across 32 GPUs; with gradient accumulation over 3 steps, this yields the reported effective batch size of 96 (1 × 32 × 3). Training used the fused AdamW optimizer (adamw_torch_fused) with a cosine learning-rate schedule over 5 epochs.
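
As a hedged reconstruction, the reported hyperparameters map onto Hugging Face `TrainingArguments` as below; the actual training framework, output path, and any options beyond those listed are assumptions.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="nemosci-qwen3-8b-ft",  # hypothetical path
    learning_rate=4e-5,
    per_device_train_batch_size=1,     # 1 per device on 32 GPUs
    gradient_accumulation_steps=3,     # 1 * 32 * 3 = 96 effective batch
    num_train_epochs=5,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
)
```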

Potential Use Cases

Given its specialized training, this model is likely well-suited for applications requiring:

  • Scientific Computing: Tasks involving scientific data analysis, simulation, or problem-solving.
  • Code Generation and Analysis: Handling complex code structures, multi-file projects, and potentially code refactoring or debugging.
  • Structured Data Processing: Tasks that benefit from understanding and generating structured outputs based on specific patterns found in its training data.
  • Agentic Workflows: Potentially useful in agent-based systems that require iterative refinement or structured responses, as suggested by the 'repetition penalty' and 'max episodes' datasets; see the sampling sketch after this list.
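
For agent-style generation, a minimal sketch with the high-level `pipeline` API is shown below; the `repetition_penalty` value of 1.05 is an assumption drawn from the training-data name, not a documented recommendation, so tune it for your workload.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="laion/nemosci-tasrep-a1mfc-gfistaqc-dev1-scaff-maxeps__Qwen3-8B",
    device_map="auto",
)

result = generator(
    [{"role": "user", "content": "Refactor a monolithic script into a "
                                 "multi-file Python package layout."}],
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.05,  # assumption drawn from the dataset name
)
# Chat-mode pipelines return the full conversation; print the new reply.
print(result[0]["generated_text"][-1]["content"])
```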