Name: laion/100k_warmup0.05__Qwen3-8B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: laion

Model Overview

laion/100k_warmup0.05__Qwen3-8B is an 8-billion-parameter language model fine-tuned from Qwen/Qwen3-8B. The fine-tuning data is a collection of datasets made up mainly of 'thinking_preprocessed' traces and 'Toolscale-tasks-traces'. This training mix suggests an emphasis on reasoning, problem-solving, and possibly tool use.

Key Characteristics

Base Model: Qwen3-8B, a robust foundation for language understanding and generation.
Fine-tuning Data: Trained on multiple datasets, including swesmith-sandboxes-with_tests-gpt-5-mini-passed_glm_4.7_traces, exp-uns-r2egym-16_8x_glm_4.7_traces_jupiter_cleaned, exp-syh-r2egym-askllm-hardened_glm_4.7_traces_jupiter, exp_tas_optimal_combined_traces, and glm46-Toolscale-tasks-traces. The dataset names suggest a focus on multi-step task execution, logical reasoning, and interaction with external tools or environments.
Context Length: 32768-token context window, which lets the model take long inputs such as large code files or extended agent transcripts.
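
To make the context length concrete, here is a minimal sketch of a token-budget check. It assumes the 32768-token window is shared between the prompt and the generated tokens, which is typical for decoder-only models but is not stated in the listing. The helper names are hypothetical, and token counts are taken as plain integers so no particular tokenizer is implied.

```python
# Context window size taken from the listing above.
CONTEXT_WINDOW = 32768


def remaining_tokens(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens are left for generation after the prompt.

    Assumption: prompt and generated tokens share one window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, context_window - prompt_tokens)


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """Check whether a prompt plus the requested generation length fits."""
    return prompt_tokens + max_new_tokens <= context_window


if __name__ == "__main__":
    # A 30000-token prompt leaves 2768 tokens for the response.
    print(remaining_tokens(30000))
    print(fits_in_context(30000, 4096))
```

A check like this is useful before sending long agent transcripts, since a request that exceeds the window would otherwise need truncation.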

Training Details

Fine-tuning used a learning rate of 4e-05, a cosine learning rate scheduler with a warmup ratio of 0.05, and the AdamW optimizer. Training ran on 128 devices for 7 epochs, with a total effective batch size of 128.
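
The hyperparameters above can be illustrated with a short sketch. It assumes the common form of a warmup-plus-cosine schedule (linear warmup from zero to the peak rate, then cosine decay to zero); the listing does not say which implementation was used. The total step count of 1000 is a hypothetical value chosen for illustration. The batch-size helper uses the standard relation effective = per-device × devices × accumulation steps; a per-device batch of 1 with no gradient accumulation is one setting consistent with the stated numbers, not a confirmed detail.

```python
import math

PEAK_LR = 4e-5        # learning rate from the listing
WARMUP_RATIO = 0.05   # warmup ratio from the listing


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to 0.

    Assumption: this mirrors the usual cosine-with-warmup shape; the exact
    scheduler used for this model is not documented in the listing.
    """
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def effective_batch_size(per_device: int, num_devices: int,
                         grad_accum_steps: int = 1) -> int:
    """Standard relation between per-device and effective batch size."""
    return per_device * num_devices * grad_accum_steps


if __name__ == "__main__":
    total = 1000  # hypothetical total step count
    for s in (0, 25, 50, 525, 1000):
        print(s, lr_at_step(s, total))
    # 128 devices with an assumed per-device batch of 1 and no accumulation.
    print(effective_batch_size(1, 128, 1))
```

With 1000 total steps, the warmup covers the first 50 steps, the rate peaks at 4e-05 at step 50, and it falls to half the peak midway through the decay phase.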

Potential Use Cases

Given its training data, this model is likely well suited to applications that require:

Complex problem-solving and logical deduction.
Automated reasoning and decision-making systems.
Tasks involving tool integration or agent-like behaviors.
Processing and generating responses based on extensive contextual information.
